Three Layers of Bias in Clinical AI
Why bias does not stop at the model, and how it compounds through the clinician and the workflow.
This is Clinical Product Thinking 🧠, a weekly newsletter featuring practical tips, frameworks and strategies from the frontlines of clinical product.
Welcome, friends, this is issue No. 032 of Clinical Product Thinking. This week, we’re diving into ways biases are created and propagated in clinical AI systems.
When we talk about bias in clinical AI, people are usually referring to the model. Specifically: biased training data, datasets that poorly represent the intended population, and algorithms that perform worse for some groups than others.
This is a huge problem. But it is only part of the picture.
Bias in clinical AI exists across multiple layers:
Model and data bias
Cognitive bias
System and workflow effects
A recent review in PLOS Digital Health argues that bias in clinical AI should be thought of as cumulative rather than isolated. Bias can enter at every stage and each compounds the one before it.
By the time an AI system reaches the clinician, the final output may reflect multiple layers of distortion rather than a single flaw in the model.
A clinical AI model can be statistically fair, rigorously validated and still cause harm. Why?
Because bias in clinical AI does not live in one place. It accumulates.
First in the data. Then in the clinician’s mind. Then in the workflow around it.
Layer 1: Model and Data Bias
This is typically top of mind for most people. Clinical AI models inherit the biases embedded in the data they are trained on.
That includes:
Historical bias: when past inequalities in healthcare become encoded in the model
Representational bias: when some groups are under-represented in the training data
Clinical AI datasets are still remarkably unrepresentative. Studies show more than half of published clinical AI models are trained on data from either the US or China, and many overrepresent White patients relative to the populations they are ultimately used on. A model may therefore appear accurate while performing far worse for minority groups, different health systems or lower-resource settings.
Measurement bias: when the labels or proxies used do not accurately reflect reality
Labels themselves are often treated as “ground truth”, but they frequently reflect human judgment rather than objective reality. Diagnoses, triage decisions and even treatment recommendations can encode the cognitive biases and unequal care patterns of clinicians. If a model is trained on those labels, it may not just reproduce the bias, but scale it.
Aggregation bias: when one model is applied across groups that differ clinically
Deployment bias: when a model is used outside the setting it was validated for
The Epic Sepsis Model is one example. It was deployed as part of the Epic EHR to flag early signs of sepsis. However, when researchers independently evaluated it, performance varied significantly between hospitals and was poorest in patients with multimorbidity and cancer. Epic subsequently began recommending that hospitals train the model on their own patient data before clinical deployment.
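One practical way to surface these gaps is to report performance separately for each subgroup and deployment site rather than as a single headline number. Here is a minimal sketch, assuming a pandas dataframe with hypothetical risk_score, label, site and ethnicity columns (none of these names come from a specific product):

```python
# A minimal sketch of subgroup-stratified evaluation.
# The point: one headline AUC can hide large performance gaps between
# the groups and sites a model will actually be used on.
import pandas as pd
from sklearn.metrics import roc_auc_score, recall_score

def evaluate_by_group(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Report AUC and sensitivity separately for each subgroup."""
    rows = []
    for group, subset in df.groupby(group_col):
        rows.append({
            group_col: group,
            "n": len(subset),
            "auc": roc_auc_score(subset["label"], subset["risk_score"]),
            "sensitivity": recall_score(subset["label"], subset["risk_score"] >= 0.5),
        })
    return pd.DataFrame(rows)

# df holds one row per patient with the model's risk_score, the true label,
# and the (hypothetical) attributes to stratify by:
# print(evaluate_by_group(df, "site"))       # deployment bias: drift between hospitals
# print(evaluate_by_group(df, "ethnicity"))  # representational and aggregation gaps
```

A stratified table like this is the kind of evidence the independent Epic Sepsis Model evaluations produced: the same model, very different numbers depending on where and on whom it was run.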
These biases are well documented. But even if the model is technically fair and accurate, the story does not end there. The moment a human encounters an AI recommendation, a new layer of bias begins.
Layer 2: Cognitive Bias
Once a clinician sees an AI recommendation, the model begins to shape human thinking.
The most obvious is automation bias, the tendency for humans to over-rely on automated systems, trusting AI suggestions over their own judgment, even when the system is incorrect or contradicts available evidence.
A randomised controlled trial showed that when physicians were shown an incorrect LLM output, their diagnostic accuracy dropped from 85% to 73% compared with error-free advice. This occurred despite prior AI-competency training.
But that is only one of several cognitive traps.
Clinicians may also experience:
Authority bias: the AI feels authoritative, so concerns are ignored
Confirmation bias: we trust outputs that agree with what we already think
Base-rate neglect: we over-weight the AI’s prediction and under-weight how common the disease actually is
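To make base-rate neglect concrete, here is a back-of-the-envelope sketch using Bayes' rule. All numbers are hypothetical: the point is that the same positive AI flag means something very different when the condition is common versus rare.

```python
# Illustration of base-rate neglect with made-up figures: even a "90% accurate"
# flag is usually a false positive when the condition itself is rare.
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability the condition is present given a positive AI flag (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Same hypothetical model, different base rates:
print(positive_predictive_value(0.90, 0.90, 0.20))  # ~0.69 in a high-prevalence ward
print(positive_predictive_value(0.90, 0.90, 0.01))  # ~0.08 when the disease is rare
```

A clinician who reads every flag as "probably sepsis" in the second setting is over-weighting the AI and under-weighting the base rate.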
Many companies design systems with human-in-the-loop as a key risk mitigation without realising they have also designed a system that can systematically distort the human judgment they are relying on.
The imperative is therefore not just to improve model accuracy but to design interfaces that reduce cognitive bias.
That means asking:
Should the AI output appear before or after the clinician makes an initial assessment?
How should uncertainty be displayed?
Should the system show alternative possibilities?
When should the AI force the user to slow down or escalate?
Layer 3: System and Workflow Effects
And then there is the next layer: the wider system around the model, which can itself push people towards poor decisions.
That includes:
Alert fatigue: clinicians begin to ignore warnings because there are too many of them
Alert fatigue is one of the best-documented workflow failures in clinical AI. Studies have shown that clinicians override between 50% and 96% of clinical alerts. In primary care, clinicians receive more than 50 alerts a day. At that point, the problem is no longer whether the model is technically correct; it is that no human can realistically respond to that volume of interruption.
Deskilling: clinicians may become progressively less able to make decisions independently because they have become used to relying on AI
Over time, repeated reliance on AI can erode clinical judgement and reduce clinicians’ ability to work without it. One study found that after clinicians began using AI-assisted polyp detection, their unassisted detection rates declined.
Timing effects: the same AI output can lead to different decisions depending on when it appears in the workflow
An AI recommendation shown before a clinician has formed an initial judgement may anchor their thinking and make them less likely to challenge it. The same recommendation shown later, after an independent assessment, may be interpreted much more critically.
Studies suggest that the timing of AI input relative to the clinical encounter affects performance independently of the model itself.
Workflow friction: clinicians are less likely to act on AI recommendations when doing so creates interruption or complexity, or when the tool is poorly integrated into their workflow
Studies have shown that a model may be technically accurate but still fail if it creates extra clicks, interrupts at the wrong moment, forces the clinician out of their usual workflow or adds more work than it saves. Over time, people begin to ignore or bypass the system altogether.
This is why so many clinical AI pilots look impressive in a demo yet fail to perform in practice.
Why This Matters
The critical point is that these layers are cumulative. Model and data bias form the foundation, cognitive bias amplifies it, and workflow design either mitigates or magnifies the problem.
By the end, harm may have very little to do with the original model:
Imagine a model that slightly under-predicts sepsis risk in one population. (Model bias)
Next, imagine clinicians begin over-trusting the score and stop questioning it. (Cognitive bias)
Now, imagine the system surfaces those alerts constantly until clinicians either stop escalating or escalate everyone. (Workflow effect)
The result is a much larger failure than the original model error alone.
The Future of Clinical AI Will Be Won in the Workflow
This is why I increasingly think that the future of clinical AI is not about building smarter models. It is about building smarter systems around them.
The teams that succeed will not only ask:
Is the model accurate?
They will ask:
How could this output distort human judgment?
What biases might this interface create?
What happens when the AI is wrong?
How do we design the workflow to catch that?
And that is why clinical product management matters. Because the hardest problems emerge in the messy space that CPMs operate in, between the product, the clinician and the workflow.
Learn More 👩🏫
Looking to learn more about building safe clinical AI systems? Here are a few resources I recommend:
Read the Arise report on the State of Clinical AI 2026
Join this webinar on how clinicians are building with Claude Code (I’ll be there!)
Keep an eye out for the next Openclaw Clinical Hackathon
Join the Clinical AI Interest Group by Alan Turing Institute
Take this course on fairness in human-AI interactions in healthtech
Hiring Spotlight 🚀
Heidi are hiring a clinical associate to join the customer success team. While not a pure-play clinical product role, this would be a great way to gain startup experience before pivoting into a CPM role. 👉 Apply here.
HealthHero are hiring two clinical product specialists to join their team. HealthHero is one of Europe’s largest digital health providers and they’re working on some incredibly innovative products. 👉 Apply here.
Join Us at HLTH Europe 🇪🇺
Danielle Brightman and I are running a panel event on clinical product with two incredible guest speakers. If you don’t know about HLTH, it’s the health tech conference you absolutely cannot miss.
👉 Register your interest for the panel here.
🎟️ Get your HLTH ticket here. (Use code: HE26PP_CPT250 for €250 off your ticket!)
That’s all for this week. See you next time! 👋
🤝 Work with me | 📅 Attend an event | ✍️ Send a message
Written by Dr. Louise Rix, Head of Clinical Product, doctor and ex-VC. Passionate about all things healthcare, healthtech and clinical product (…obviously). Based in London. You can find me on LinkedIn.
Made with 💜 for better, safer HealthTech.



