Our Take Newsletter

Darwin's Our Take: Do AI chatbots pose risks for consumers and providers?

April 20, 2026


As more consumers turn to generative AI chatbots for medical advice, a new study suggests that about half of the chatbots’ responses to evidence-based questions are “somewhat” or “highly” problematic.

The study, published in the April issue of BMJ Open, was conducted by researchers in the U.S., Canada, and the U.K.

The research team audited responses from five chatbots: ChatGPT (OpenAI), DeepSeek (High-Flyer), Gemini (Google), Grok (Elon Musk’s xAI), and Meta AI (Meta).

The team asked each of the chatbots 10 open- and closed-ended questions from five health and medical categories: cancer, vaccines, stem cells, nutrition, and athletic performance. These areas were selected in part because they are “domains in which misinformation is recurrent and widely disseminated.”

Two experts from each category rated the responses as “non-problematic,” “somewhat problematic,” or “highly problematic.” The information was scored for accuracy, completeness of references, and readability.

Almost half of the responses were deemed problematic (30% “somewhat problematic” and 19.6% “highly problematic”).

The chatbots performed better in the vaccine and cancer categories than in the other three.

Of concern, the researchers said, the chatbots consistently expressed their responses “with confidence and certainty.” They added that the quality of the references was poor and the responses were difficult to read.

A separate study led by researchers at Mass General Brigham and published April 13 in JAMA Network Open appears to support these findings, though this study focused on AI chatbot use from a clinical standpoint.

This research team used 29 standardized case scenarios taken from the MSD Manual to evaluate 21 AI models, including ChatGPT, Claude (Anthropic), DeepSeek, Gemini, and Grok. The analysis generated more than 16,000 responses.

The AI models were asked to provide differential diagnoses (i.e., to generate a list of possible diagnoses based on symptoms) and then order diagnostic tests, make a final diagnosis, and manage treatment.  

The researchers found that failure rates across all models frequently exceeded 80% on the differential diagnosis metric.

Failure rates across all models were less than 40% for final diagnosis. When the models were given more information, the best-performing models had failure rates as low as 9%.

The team cautioned that the models’ “most responsible role today is targeted, clinician-supervised use in low-uncertainty tasks. Without such constraints, premature deployment could lead to individual errors in patient care that, when compounded, erode the systems of clinical reasoning that protect patients from harm.”

Fortunately, the majority of consumers who seek health information through AI chatbots (and social media) are skeptical of how accurate the information is, a survey by the Pew Research Center indicates. Many respondents said they turn to these sources because they’re convenient.

Roughly one in five respondents (22%) said they get health information from AI chatbots “at least sometimes,” with 7% saying they do so “often” or “extremely often.”

Nearly half of those who said they use AI chatbots for health information (48%) said it was “extremely” or “very” convenient, and 41% said it was “extremely” or “very” easy to understand (apparently conflicting with the findings from the study published in BMJ Open).

When asked about accuracy, however, just 18% said they thought the information was “extremely” or “very” accurate. Still, 58% found the information they received to be “somewhat” accurate.

Meanwhile, a report from the Peterson Health Technology Institute suggests that health systems and health insurers’ use of AI tools for administrative tasks such as prior authorization and medical billing/coding “risks increasing levels of system activity without reducing costs.”

In fact, the report indicates that provider deployment of AI “is increasing billing intensity and inflating medical spending.”

A researcher at the University of Pennsylvania also says the use of AI could increase health care costs unless payment models are revamped.

Dr. Amol Navathe, a senior fellow at the Leonard Davis Institute of Health Economics at UPenn, was quoted in a blog post as saying:

“In … situations where AI increases productivity without necessarily improving accuracy or outcomes, or when AI can perform services autonomously, current payment paradigms that reimburse based on human labor inputs like time and skill just don’t fit well. This may lead to overspending and overuse, which must be balanced with the health benefits of access to the technologies.”

In other AI-related news, AWS launched an AI-powered application called Amazon Bio Discovery that was developed to “help scientists design and test novel drugs more quickly and confidently.”

The tool gives scientists access to specialized AI models known as biological foundation models that generate and evaluate drug candidates. The tool also includes an AI agent that can assist with a variety of tasks throughout the drug discovery and development process.

The launch announcement provides a comprehensive explanation of the app’s capabilities.

OUR TAKE: It’s important to note that the AI chatbot study published in BMJ Open was conducted in February 2025. Since then, chatbots have almost certainly become more sophisticated, though that doesn’t necessarily mean they’ve become more accurate.

Plus, two of the companies whose AI chatbots were included in the study — OpenAI and Meta — recently introduced health-specific AI chatbots.

The study’s researchers acknowledged that they deployed an “adversarial-like framework” with prompts designed to “strain” models toward misinformation or contraindicated advice. Their goal was to stress-test the chatbots and identify behavioral vulnerabilities.

“By default, chatbots do not access real-time data but instead generate outputs by inferring statistical patterns from their training data and predicting likely word sequences,” the study authors wrote. “They do not reason or weigh evidence, nor are they able to make ethical or value-based judgments.”

“This behavioral limitation means that chatbots can reproduce authoritative-sounding but potentially flawed responses,” the authors added.

Compounding the problem, the research team noted, chatbots draw data from online Q&A forums, such as Reddit, and from social media. Moreover, the scientific content the chatbots are able to access is generally limited to open-access or publicly available studies, leaving out as much as 70% of peer-reviewed, published research.

HCR #204: A Conversation with David Snow, CEO, Cedar Gate Technologies

Value-based care has been the goal for decades. So why does it still feel like we're moving in slow motion? The answer may come down to data, incentives, and the willingness to cross what one industry veteran calls "the rickety bridge over a chasm of death." David Snow, chairman and CEO, Cedar Gate Technologies, joins John to discuss why technology has finally caught up to the promise of value-based care and what it will take to move American health care past the tipping point. Watch here or listen to Health Care Rounds on your favorite podcast platform.

What else you need to know

CMS has accepted more than 150 health care organizations for participation in the Advancing Chronic Care with Effective Scalable Solutions (ACCESS) Model, a voluntary, 10-year, performance-based payment model announced in December. The goal of ACCESS is to expand access to technology-supported care for chronic conditions such as hypertension, diabetes, and depression. CMS extended the deadline to submit a participation application to May 15; the model is scheduled to launch on July 5. Applications received after the deadline will be considered for participation starting Jan. 1, 2026. A list of the accepted applicants is available here.

CMS also intends to expand the Comprehensive Care for Joint Replacement (CJR) Model, an episode-based payment model that ran from 2016 through the end of 2024. The extended model, referred to as CJR-X, would be a mandatory, nationwide program and would add ankle replacements to the hip and knee replacements included in the original model. If the proposed extension is finalized, CJR-X will start Oct. 1, 2027. Get more details here.

And CMS held an event to demonstrate progress on the Health Tech Ecosystem, an initiative introduced in July encouraging health care organizations and innovators to improve data sharing capabilities and interoperability. According to the agency’s press release, CMS highlighted digital tools from more than 50 companies that are either already accessible to the public or soon will be.

Eli Lilly released top-line trial results that appear to address an FDA request for more data on the potential for drug-induced liver injury (DILI) with the use of Foundayo (orforglipron), Lilly’s recently approved GLP-1 pill. In the April 1 approval letter, the FDA said Lilly would have to provide an assessment of potential DILI after ACHIEVE-4, a Phase III study of Foundayo in people with type 2 diabetes, is completed this month. The FDA also requested a clinical study report on adjudicated major adverse cardiovascular events (MACE).

The top-line results from ACHIEVE-4 showed that Foundayo met the trial’s primary endpoint by demonstrating a non-inferior risk of MACE-4 compared with insulin glargine, according to an April 16 press release. Lilly noted that the trial included “a thorough analysis of potential DILI” and the “analyses confirmed there was no hepatic safety signal.” It’s not clear whether Lilly has submitted the report the FDA requested, but the company said it would submit an approval request for the diabetes indication by the end of this quarter.  

Edison, N.J.-based Hackensack Meridian Health is opening a health and wellness center at Metropark, New Jersey’s second-busiest train station, making it the nation’s first health center at a major mass transit hub, New Jersey Business Magazine reported. By offering extended hours at the center, Hackensack Meridian hopes to attract commuters on their way to and from work. The center will offer services such as advanced imaging, urgent care, primary care, surgical and medical specialties, retail pharmacy, physical and occupational therapy, rehabilitation, and phlebotomy.

Hackensack Meridian also plans to move its network headquarters into the upper floors of the building that houses the center, consolidating corporate employees who have been working in several buildings into one location.

Baylor Scott & White Health Plan will leave the Medicaid managed care market by the end of August, Becker’s Hospital Review reported, and discontinue individual marketplace plans at the end of the year, according to a statement published on its website. The changes will affect approximately 125,000 Medicaid enrollees in Texas and about 100,000 people who have coverage through the marketplace, Dallas Morning News reported. The insurer, a business unit of Baylor Scott & White Health, will continue to offer employer group and Medicare Advantage plans.

Dr. Janice Nevin, CEO of ChristianaCare, will retire on Sept. 1, the Wilmington, Del.-based health system announced. Dr. Nevin has been with ChristianaCare for more than 23 years and has served as president and CEO since 2014. Jenn Schwartz, the health system’s executive vice president and chief strategy officer, has been selected as Dr. Nevin’s successor.  

Dr. Erica Schwartz has been nominated to lead the Centers for Disease Control and Prevention. The agency has been without a permanent director since Dr. Susan Monarez was terminated in August after a very brief tenure. Dr. Schwartz, who holds medical and law degrees, previously served as chief medical officer of the U.S. Coast Guard and as deputy surgeon general in the first Trump administration.

What we’re reading

The First AI Drug Prescriber. JAMA, 4.13.26 (subscription or registration required)

Powering Value-Based Care with Clinician-Led, Patient-Centered Design: The MyHealth Bundles Program. NEJM Catalyst (abstract available; subscription required for full article access)

Drug Prices, Behavioral Health, Hospitals, And More. Health Affairs, April 2026 (free access to issue overview)


Stay Aware. Stay Informed.

Subscribe to Our Take

Sign up for Our Take, a highly curated weekly newsletter of expert strategic insights for health care executives.
