How to Build Safe Clinical AI Systems

7 takeaways you can apply today

Jun 07, 2026


This is Clinical Product Thinking 🧠, a weekly newsletter featuring practical tips, frameworks and strategies from the frontlines of clinical product.

Welcome, friends. This is issue No. 041 of Clinical Product Thinking. This week we're talking about what it takes to build clinical AI that patients, clinicians and health systems can trust.

You can have a clinically accurate model and still make care worse. That gap, between a model that works and a product that helps, ran through this week’s panel.

On Tuesday, Dani and I ran the second event in our panel series, this time on building safe, effective clinical AI. We covered where AI belongs in care pathways and where it fails, how to think about clinical and behavioural safety, what “human in the loop” actually solves, and why so much of the hard work happens outside the model.

Thanks to Dani for being my fabulous co-host and Dr Lucinda and Dr Paul for sharing their incredible wisdom.

Here are the seven things I’m still thinking about. 👇

1. Stop looking for places to put AI

The panel made it clear: the wrong starting question is “Where can we use AI?” It may sound reasonable, but the better question is “Where is the pathway currently failing?”

The failure might be clinical or operational, or it might sit in patient experience, safety or capacity. Patients might be dropping off before finishing onboarding. Clinicians may be spending too long reviewing low-risk cases. High-risk patients might not be escalated early enough. Useful information may sit buried in free text. Or patients may be asking the same question again and again because the product never made the next step clear.

Once you understand the problem, you can choose the intervention. Sometimes that’s AI. Often it’s a rule, better content, a workflow change, or a person. Don’t become a solution looking for a problem. It usually doesn’t end well!

👉 Practical step: Use this in product review

Ask the team to describe the problem without mentioning AI. “We need an AI coach” is weak. “Patients drop off after side effects because they don’t know what’s normal or when to contact us” is the actual problem. If you can’t describe the problem without naming the solution, the use case needs more work.

2. The best first use cases are usually unglamorous

As the late Grandad Rix used to say, where there’s muck there’s brass. Translation: look at the unsexy use cases first, as there’s often a lot of value to be created there.

Think about what clinicians spend time on that they were never really trained to do, or don't enjoy doing. Not the complex judgement, the high-risk decision or the sensitive conversation, but the admin around all of it: the repeated explanations, the summarising, the routing, the chasing, the hunt for missing information. The low-risk, repetitive work that keeps clinicians from spending time where their judgement and skills matter the most.

That admin is where AI helps first. The tasks are bounded, easy to check, and don’t carry the clinical decision themselves.

👉 Practical step: Start where mistakes are cheap

Pick tasks where, if the AI gets it wrong, someone notices quickly and little harm is done. A drafted message a clinician checks before sending is safe to start with. An AI that closes patient cases on its own is not.

3. A useful output still needs a pathway that can act on it

Dr Lucinda shared an example that stuck with me: an AI system that reviews chest X-rays to flag people who may be at risk of lung cancer. The model can find the signal, but the patient still needs the next step. Someone has to review the result, contact the patient, arrange the CT, track whether it happened and act if it didn’t.

A clinically correct model is not the same as better care.

The LungIMPACT trial (Nature Medicine, 2026) showed this at scale. Across five NHS trusts and more than 90,000 chest X-rays, using AI to flag and prioritise abnormal films cut reporting time from 47 hours to 34, but made no difference to time to CT, diagnosis, treatment or stage at diagnosis. The faster report had nowhere to go, because the CT slots and capacity around it never changed. The authors concluded that prioritisation alone shouldn’t be deployed without redesigning the pathway around it.

It’s like adding a spiral staircase without considering you live in a bungalow. It might look nice until you realise there’s nowhere for it to go.

👉 Practical step: Map the next five steps before shipping

Who sees the output, what are they meant to do, how fast, what happens if they do nothing, and who checks it happened? A prediction needs an owner, a flag needs a route, a risk signal needs a response time. A useful signal with no reliable route to action means the product isn’t finished.
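For teams that like to write this down in a structured way, here is a minimal Python sketch of the five questions as a checklist. The `OutputRoute` class, its field names and the example values are all hypothetical, made up for illustration rather than taken from the panel or any real system.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class OutputRoute:
    """Hypothetical record of what happens after an AI output is produced."""
    owner: Optional[str] = None                  # who sees the output
    expected_action: Optional[str] = None        # what they are meant to do
    response_time_hours: Optional[float] = None  # how fast
    fallback: Optional[str] = None               # what happens if they do nothing
    verifier: Optional[str] = None               # who checks it happened


def missing_steps(route: OutputRoute) -> list[str]:
    """Return the names of any of the five steps that are still unanswered."""
    return [f.name for f in fields(route) if getattr(route, f.name) in (None, "")]


# Illustrative values only: a flag with an owner and an action,
# but no response time, fallback or verifier.
route = OutputRoute(owner="reporting radiologist", expected_action="request CT")
print(missing_steps(route))  # ['response_time_hours', 'fallback', 'verifier']
```

If the list comes back non-empty, the route to action has gaps, and by the test above the product isn't finished.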

4. Human in the loop is not enough

“Human in the loop” sounds reassuring until you look at how the loop works and consider how humans typically behave over time. A clinician reviewing AI output doesn’t automatically make a system safe.

People are influenced by what they see first. Show the AI’s answer before the clinician has formed their own view, and that view is already shaped. People also get used to systems that are usually right; after the model is correct a few hundred times, the rare mistake is easy to miss. And the workflow itself pushes behaviour. If approving the output is easy but questioning it is slow or awkward, the product quietly nudges the clinician towards agreement. That isn’t meaningful oversight.

A poor review workflow is a big approve button, a buried escalation route, and no record of why anyone agreed. A good one gives the reviewer time, context, visible uncertainty, and a clear (and easy) way to disagree.

👉 Practical step: Review the review (meta)

Ask yourself: when does the clinician see the output, and what sits alongside it? How is uncertainty shown? How easy is it to disagree, and what happens when they do? How much time do they have, and what’s the default action? How is review quality monitored over time?

A human in the loop only helps if the product gives that human the space, context and authority to think.
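One way to make “how is review quality monitored over time?” concrete is to log each review and summarise the log. The sketch below is a rough illustration in Python, assuming a hypothetical `ReviewEvent` record; the field names and sample numbers are invented. What counts as a worrying pattern (say, near-total agreement combined with very short review times) is a judgement for each team, and nothing here sets a threshold.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class ReviewEvent:
    """Hypothetical log entry for one clinician review of an AI output."""
    agreed: bool           # did the reviewer accept the AI output?
    seconds_spent: float   # time between opening the case and deciding
    reason_recorded: bool  # did they record why they agreed or disagreed?


def review_summary(events: list[ReviewEvent]) -> dict[str, float]:
    """Summarise agreement rate, median review time and reason coverage."""
    if not events:
        raise ValueError("no review events to summarise")
    n = len(events)
    return {
        "agreement_rate": sum(e.agreed for e in events) / n,
        "median_seconds": median(e.seconds_spent for e in events),
        "reason_rate": sum(e.reason_recorded for e in events) / n,
    }


# Invented sample data for illustration.
events = [
    ReviewEvent(agreed=True, seconds_spent=4.0, reason_recorded=False),
    ReviewEvent(agreed=True, seconds_spent=6.0, reason_recorded=False),
    ReviewEvent(agreed=False, seconds_spent=90.0, reason_recorded=True),
    ReviewEvent(agreed=True, seconds_spent=5.0, reason_recorded=False),
]
print(review_summary(events))
```

Tracked week by week, a summary like this gives the team something to discuss. It doesn't replace looking at individual cases.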

5. Sometimes friction is a safety feature

Product teams are trained to strip out friction: fewer clicks, faster journeys, more automation. In much of healthcare that’s also true: needless friction wastes time and frustrates everyone.

But some friction can be a feature. A pause before a high-risk action, a required review before a medication change, a prompt to consider uncertainty, a design that stops someone clicking through too fast. The fastest version of a clinical workflow isn’t always the safest. The real question is what the friction is doing. Is it waste, bureaucracy, or a patch over a badly designed system? Or is it buying time for judgement, escalation or reflection? Knowing which parts of a pathway to speed up and which to slow down is part of the craft.

👉 Practical step: Cut the bad friction, keep the good

Not all friction does the same job. Strip out what wastes people’s time, and keep what protects judgement.

Cut: re-entering information you already have, clicking through duplicate forms, escalation routes buried in a long workflow.

Keep: a pause before a high-risk action, a required review before a medication change, a prompt for a reason when overriding the AI.
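As an illustration of the last “keep” item, here is a small Python sketch of an override that can’t be recorded without a reason. `OverrideDecision` and `record_override` are hypothetical names, and a real product would need far more around this (identity, audit storage, UI). It shows the idea only.

```python
from dataclasses import dataclass


@dataclass
class OverrideDecision:
    """Hypothetical record of a clinician overriding an AI suggestion."""
    clinician_id: str
    suggestion_id: str
    reason: str


def record_override(clinician_id: str, suggestion_id: str, reason: str) -> OverrideDecision:
    """Refuse to record an override unless a non-blank reason is given."""
    cleaned = reason.strip()
    if not cleaned:
        # Deliberate friction: the reviewer has to say why they disagree.
        raise ValueError("an override needs a recorded reason")
    return OverrideDecision(clinician_id, suggestion_id, cleaned)


decision = record_override("clin-01", "sugg-42", " patient context not captured ")
print(decision.reason)  # patient context not captured
```

The design choice is to keep the prompt short and make it mandatory, so the friction buys a record of judgement without turning disagreement into a long form.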

6. Behavioural safety sits alongside clinical safety

Dr Paul is an expert in behavioural safety: how AI shapes the way people think, feel and act over repeated interactions. It is a crucial consideration for patient-facing AI.

A system doesn’t have to give obviously unsafe advice to cause harm. It can reassure a patient when escalation would be safer, encourage dependency, reinforce problematic thinking, advise without enough context or sound more certain than it has any right to. So patient-facing AI has to be evaluated for the behaviour it creates, not only its accuracy. Does the patient understand what to do next? Does the interaction support their own judgement and communicate uncertainty clearly? Does the system know when to stop and hand over, and make escalation obvious? Does it stay in its lane?

Be careful about product KPIs for utilisation. Long conversations don’t automatically result in good outcomes in healthcare. Sometimes they mean the AI supported the patient well. Sometimes they mean it failed to move them towards the right next step. Behavioural safety is an integral part of clinical safety. It’s time we started treating it as such.

👉 Practical step: One question for patient-facing AI

What behaviour is this interaction making more likely?

Is it making a patient more or less likely to seek further help, to trust their own judgement, to follow their care plan, to depend on the AI or to know the next safe action? Accuracy isn’t enough if the interaction nudges the patient towards an unsafe next step.

7. Safety can’t sit only with the clinician

My favourite point of the evening was about culture. Clinicians are essential to clinical AI teams, but safety can’t belong to them alone. If the clinician is the person who turns up at the end to say yes or no, you get slow teams, frustrated clinicians, and products that treat clinical risk as a final check rather than a design concern.

The best teams build shared safety instincts. Designers see how a layout might change clinical behaviour. Engineers understand why an edge case isn’t just an edge case but a future clinical incident waiting to happen. Product managers spot when a shortcut introduces risk. Clinicians translate clinical nuance into product decisions early enough to shape the work. Safety becomes part of how the team thinks rather than something bolted on at the end.

This matters more with AI because the risk is rarely in one place. It can sit in the model, the interface, the pathway, the escalation route, the monitoring, the review process, or the team’s assumptions about how patients behave. It has to be designed in from the start.

👉 Practical step: Get clear on how safety shows up across the team

Clinician → Defines clinical risk, escalation thresholds and edge cases

Product manager → Works with clinician to turn risk into requirements, workflows and trade-offs

Designer → Shapes how users notice, interpret and act on risk

Engineer → Builds safe defaults, fallbacks, audit trails, monitoring

Operations → Makes escalation routes and response times work in practice

What I’m taking from this

AI is becoming part of more care pathways. Some of that will be genuinely useful, some unnecessary, and some will create risk in ways that aren’t obvious at first. The clinical product skill is telling the difference. Before adding AI, ask what problem you’re solving and who has it, what risk it creates, what happens after the output, who’s accountable when something changes, and how you’ll know if care actually improved.

That’s where safer clinical AI starts.

Join the next clinical product panel 🎤

The next clinical product panel is on the clinical product gap and why healthtech needs a new kind of product leader 👉 Sign up here.

That’s all for this week. See you next time! 👋

🤝 Work with me | 📅 Attend an event | ✍️ Send a message

Written by Dr Louise Rix, Head of Clinical Product, doctor and ex-VC. Passionate about all things healthcare, healthtech and clinical product (…obviously). Based in London. You can find me on LinkedIn.

Made with 💜 for better, safer HealthTech.

For Paid Subscribers 🤩

The full 75-minute panel with Dani, Dr Lucinda and Dr Paul is below. 👇

We went deeper on all things clinical safety in AI, behavioural safety and system design.
