I think you should start with the definition of software safety in the health domain. The Therac-25 accident, for example, is a good place to begin. Then look at the current scientific articles and standards on software safety in the medical domain. After that, think about how your algorithm will be tested.
You are thinking of deep RL algorithms as a black box, but in the end they are software. If deep RL algorithms are going to be used in hospitals, they will have to be tested, and the benchmarks, conditions, and restrictions that apply to normal software must apply to RL algorithms too.