There’s a Thin Line Between Copilot and Backseat Driver: What Informatics Can Tell Us About Healthcare AI
By Christopher J. Kelly, Associate CMIO for Data and Analytics, MultiCare Health System
A baby girl comes to the pediatric ophthalmologist. She bounces happily in her mother’s lap but has a pronounced crossed eye. Infantile esotropia is usually a surgical condition. However, there is always a small chance of an underlying neurologic cause. The doctor asks questions about onset, progression, and family history, then finds an otherwise unremarkable exam. The risk of a brain problem is low, but how low? Is the next step surgery or an MRI?
Similar questions play out in doctors’ offices thousands of times every day. Artificial intelligence (AI) is suddenly everywhere these days, and the hype keeps building. For healthcare, an industry under constant pressure to do more with less and do it better, AI seems like the right tool at the right time. Can we use this technology to help us do a better job caring for our patients?
In the broadest sense, AI is a computer-driven supplement to human decision-making. With alerts built into our electronic medical record (EMR), we have been using this type of AI for years, although most would agree that these alerts are not very intelligent artificial intelligence. Recently, Large Language Models (LLMs) have achieved the success that had seemed years, if not decades, away. LLMs work by predicting the next word in a sentence, and when given a hugely sophisticated algorithm and essentially all the data on the internet, they can produce output that feels very human-like. At MultiCare, a twelve-hospital healthcare system in Washington State, we have been focusing on how to use this seemingly magical technology to improve performance.
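For the curious, here is a toy sketch of that next-word idea. The context phrase and the probabilities below are invented for illustration; a real LLM learns these patterns from its training data with a neural network, not a lookup table.

```python
# Toy illustration of next-word prediction. The context string and the
# probabilities are made up for this example; a real LLM computes them
# with a learned model over billions of parameters.
NEXT_WORD_PROBS = {
    "the patient was admitted for": {
        "observation": 0.40,
        "surgery": 0.30,
        "pneumonia": 0.25,
        "tea": 0.05,
    },
}

def predict_next_word(context: str) -> str:
    """Return the most probable continuation for a known context."""
    candidates = NEXT_WORD_PROBS.get(context, {})
    return max(candidates, key=candidates.get) if candidates else "<unknown>"

print(predict_next_word("the patient was admitted for"))  # -> observation
```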
We have heard the promise of technology in healthcare before. A dozen years ago, healthcare systems across the country adopted EMRs in response to the HITECH Act and Meaningful Use. Things did not go as planned. While most clinicians would not go back to paper charts, the EMR came with unintended consequences and unfulfilled promises, including the promise to help doctors make better decisions. This is exactly where AI could help. But before we rush to incorporate AI into clinical workflows, we should apply the hard-earned lessons we learned from EMR clinical decision support (CDS) implementation.
For one thing, it is not enough for an AI to be “right.” While it is impressive to see LLMs pass standardized medical exams, this alone does not make them helpful. For AI to add value and help physicians (rather than replace them), the AI must be correct when the clinician would otherwise be wrong. While we certainly make mistakes, we usually get things right. Many EMR alerts are overridden more than 90% of the time and do little more than create a cognitive burden. If a busy doctor sees too many “I already know that” suggestions from an LLM, they will click right through them.
Further, making a medical diagnosis is more than finding a single correct answer. Early in the process, what matters is generating a reasonable list of possible diagnoses—a differential diagnosis—and then working through it appropriately. Improving this process could reduce medical errors, since doctors will not work up what they do not consider.
While premature closure is a concern, common problems are, well, common. Generating a lengthy differential is a medical student game. When the condition is straightforward, working through an exhaustive list, with the extra labs and imaging studies that entails, could unnecessarily increase cost and documentation burden. Again, the question is not whether the AI is right, but whether it adds value.
What could be helpful is an AI that helps us identify the uncommon diagnoses we would otherwise have missed. But the laws of statistics make it hard to predict rare events. When the probability of a diagnosis is very low, even an accurate test still produces many false positives. When an AI tries to predict rare diagnoses, we can expect a lot of useless alerts, which could even be harmful if they lead to unnecessary invasive tests.
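To put numbers on that intuition, here is a minimal sketch of the arithmetic using hypothetical figures (a one-in-1,000 diagnosis and a test or model that is 95% sensitive and 95% specific; these values are illustrative, not drawn from any study):

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Bayes' rule: the chance that a positive flag is a true positive."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)

# Hypothetical numbers for illustration only: a rare diagnosis (1 in 1,000)
# flagged by a test that is 95% sensitive and 95% specific.
ppv = positive_predictive_value(prevalence=0.001, sensitivity=0.95, specificity=0.95)
print(f"PPV: {ppv:.1%}")  # about 2%; roughly 50 false alarms for every true case
```

Under those assumptions, only about one flag in fifty is real, which is exactly the alert-fatigue pattern described above.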
There are some ways AI could help. One is as a consultant: “Hey AI, can you read this patient’s chart and see if I’m missing anything?” Rather than firing off unhelpful suggestions, an on-demand AI might add value in situations where a doctor is uncertain. The doctor would need to think to ask the AI, and once the novelty wears off, the AI would need to deliver valuable insight consistently, not just recommendations to order more low-yield tests.
AI, in its current form, may struggle to add value, but it will not be in its current form for long. One active area of development is retrieval-augmented generation, where the AI uses its understanding of language to query a separate data source. Rather than just offering a differential diagnosis, LLMs could find information on the appropriate workup of a condition and the cost of each test. Knowing the most cost-effective way to work up a problem, one that minimizes both cost and risk to the patient, would not only help us provide better care but also improve efficiency. Instructing the LLM to limit its responses to data in the database could even reduce the risk of hallucinations. We don’t need AI to tell us what we already know. What we need is an AI that gives us the information we need to do a better job.
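As a rough sketch of that retrieval-augmented pattern: the data source, field names, and dollar figures below are placeholders invented for illustration, not real clinical or cost data, and the final call to an LLM is left out.

```python
# Sketch of retrieval-augmented generation: retrieve structured facts,
# then constrain the model's answer to those facts. All values here are
# invented placeholders, not real cost or evidence data.
WORKUP_DATA = {
    "infantile esotropia": [
        {"test": "MRI brain", "cost_usd": 1600, "notes": "placeholder yield/risk notes"},
        {"test": "Cycloplegic refraction", "cost_usd": 150, "notes": "placeholder yield/risk notes"},
    ],
}

def retrieve(condition: str) -> list:
    """Retrieval step: pull structured workup records for a condition."""
    return WORKUP_DATA.get(condition.lower(), [])

def build_prompt(condition: str, records: list) -> str:
    """Generation step: ask the LLM to answer using only the retrieved data."""
    facts = "\n".join(f"- {r['test']}: ~${r['cost_usd']} ({r['notes']})" for r in records)
    return (
        f"Using only the data below, outline a cost-effective workup for {condition}. "
        f"If the data are insufficient, say so.\n{facts}"
    )

prompt = build_prompt("infantile esotropia", retrieve("infantile esotropia"))
print(prompt)  # in a real system, this prompt would then be sent to the LLM
```

The key design choice is that the model is asked to reason only over the retrieved records, which is what grounds its answer in a curated database rather than in whatever it happens to remember.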
Doctors do not routinely access risk and cost databases, but the data are there. The limiting factor has been integration into clinical workflows. LLMs, with their ability to make sense of clinical scenarios, may be the bridge that allows doctors to make truly informed clinical decisions. The real benefit of AI may come not by supporting current processes, but by helping us do things differently and better. What is the risk of a brain problem in a baby with strabismus? The doctor and family may be able to make a data-driven decision.