Chatbots Do Not Perform Well for Misinformation-Prone Health Topics

By News Room | April 25, 2026 | 6 Mins Read

In an age where information is just a click away, the rise of AI-powered chatbots has revolutionized how we seek knowledge. These conversational agents, designed to mimic human interaction, are increasingly used to answer a wide array of questions, including those related to health. However, a recent study published in BMJ Open casts doubt on the reliability of chatbots when it comes to health topics notoriously prone to misinformation. The findings reveal a concerning trend: nearly half of the responses generated by popular chatbots to health-related queries were found to be problematic, raising serious questions about their suitability as trusted sources of medical information. This has significant implications for individuals who rely on these tools for health advice, as well as for the broader landscape of digital health and misinformation.

Led by Dr. Nicholas B. Tiller from Harbor-UCLA Medical Center, the research team embarked on a comprehensive audit of chatbot responses to a diverse set of health questions. Their objective was to assess the accuracy, completeness, and reliability of information provided by these AI models, particularly in areas where misinformation tends to flourish. To achieve this, they crafted ten questions in each of five critical health categories: cancer, vaccines, stem cells, nutrition, and athletic performance. These categories were chosen for their inherent complexity, the often-conflicting information surrounding them, and their potential to be exploited by purveyors of false or misleading health claims. The questions were then posed to five prominent chatbots: Google’s Gemini, High-Flyer’s DeepSeek, Meta AI, OpenAI’s ChatGPT, and xAI’s Grok. This selection encompassed a broad spectrum of AI models from different developers, built on different underlying architectures, and thus provided a comprehensive snapshot of the current state of chatbot capabilities in this domain.
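
To make the design concrete, here is a minimal sketch of what such an audit loop could look like. The `query_model` stub, the structure of the question lists, and the assertion of ten questions per category are hypothetical illustrations based on the description above, not the authors’ actual code.

```python
# Hypothetical sketch of the audit described above: ten questions per
# category, posed verbatim to each chatbot, with every response stored
# for later grading. `query_model` is a placeholder, not a real API.

CATEGORIES = ["cancer", "vaccines", "stem cells", "nutrition", "athletic performance"]
MODELS = ["Gemini", "DeepSeek", "Meta AI", "ChatGPT", "Grok"]

def query_model(model: str, question: str) -> str:
    """Placeholder: each vendor exposes its own API; swap in real calls here."""
    raise NotImplementedError

def run_audit(questions: dict[str, list[str]]) -> list[dict]:
    """Collect one response per (model, question) pair for later grading."""
    responses = []
    for category in CATEGORIES:
        assert len(questions[category]) == 10, "ten questions per category"
        for question in questions[category]:
            for model in MODELS:
                responses.append({
                    "model": model,
                    "category": category,
                    "question": question,
                    "answer": query_model(model, question),
                })
    return responses  # 5 categories x 10 questions x 5 models = 250 responses
```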

The results painted a disquieting picture. A staggering 49.6% of the chatbot responses were categorized as problematic, with 30% rated “somewhat problematic” and a startling 19.6% deemed “highly problematic.” In other words, roughly one in every two health-related questions drew a response that was incomplete, inaccurate, or potentially harmful. What’s more, the quality of responses was disconcertingly uniform across the different chatbots, with no statistically significant difference between them (P = 0.566). This suggests a systemic challenge in how these AI models process and present health information, rather than an isolated issue with a particular chatbot. One outlier did emerge, however: Grok, from xAI, generated significantly more highly problematic responses than would be expected by chance. This finding raises specific concerns about Grok’s performance on health inquiries, particularly those susceptible to misinformation, and warrants further investigation into its training data and algorithmic biases. The implications of such widespread inaccuracy are profound: it can lead individuals to make ill-informed decisions about their health, potentially exacerbating existing conditions or delaying necessary medical interventions.
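
For readers wondering what “no statistically significant difference” means in practice, a chi-square test of independence on the graded counts is one standard way to check it. The sketch below uses invented counts purely for illustration (they are not the study’s data), and the authors’ exact statistical procedure may differ.

```python
# Sketch of a test for "no difference between chatbots": a chi-square
# test of independence on graded-response counts. The counts below are
# invented for illustration only; they are NOT the study's data.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: chatbots; columns: [fine, somewhat problematic, highly problematic].
# Each chatbot answered 50 questions in the design described above.
counts = np.array([
    [26, 15, 9],   # "Gemini"   (illustrative)
    [25, 16, 9],   # "DeepSeek" (illustrative)
    [27, 14, 9],   # "Meta AI"  (illustrative)
    [26, 14, 10],  # "ChatGPT"  (illustrative)
    [22, 16, 12],  # "Grok"     (illustrative)
])

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
# A p-value well above 0.05 (the study reports P = 0.566) means the
# grade distribution does not differ detectably between chatbots.
```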

Delving deeper into the categorical breakdown, the study unearthed interesting patterns in chatbot performance across health topics. Chatbots performed best on questions about vaccines (mean z-score, –2.57) and cancer (–2.12); in the study’s scoring, a lower (more negative) z-score indicates fewer problematic responses. This strength could be attributed to the vast amount of scientific research, readily available expert consensus, and continuous efforts by global health organizations to disseminate accurate information in these areas. The picture shifted dramatically in other categories, however. Performance was weakest in stem cells (+1.25), athletic performance (+3.74), and nutrition (+4.35). These areas are often characterized by evolving scientific understanding, conflicting dietary trends, and a proliferation of anecdotal evidence, making them fertile ground for misinformation. The chatbots’ struggles in these domains highlight their difficulty in discerning credible sources from unsubstantiated claims and underscore the need for robust mechanisms that filter out misinformation and prioritize evidence-based information. This differential performance shows that while chatbots may excel in areas with clear scientific consensus, their reliability wanes in more nuanced and rapidly evolving fields.
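
As a rough illustration of how per-category mean z-scores like these can be computed, the sketch below assumes each response receives a numeric problem score (0 fine, 1 somewhat, 2 highly problematic), standardizes all scores together, and then averages within each category. The scoring scheme is an assumption for illustration, not necessarily the paper’s method.

```python
# Sketch: mean z-score per category from per-response problem scores.
# The 0/1/2 scoring scheme is an assumption; the paper's grading may differ.
import numpy as np

def mean_z_by_category(scores: dict[str, list[float]]) -> dict[str, float]:
    """Standardize all scores together, then average within each category."""
    all_scores = np.concatenate([np.asarray(v, float) for v in scores.values()])
    mu, sigma = all_scores.mean(), all_scores.std()
    return {
        cat: float((np.asarray(v, float) - mu).mean() / sigma)
        for cat, v in scores.items()
    }
```

Under this convention, categories with negative means (vaccines, cancer) had fewer problems than the overall average, while categories with positive means (nutrition, athletic performance) had more.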

Beyond the content itself, the quality of references provided by the chatbots was a significant cause for concern. The study found that the median completeness score for references was a dismal 40%, indicating a severe lack of comprehensive, verifiable sourcing. Even more alarming, no chatbot produced a fully accurate reference list. This deficiency was primarily attributed to “hallucinations”: fabricated citations in which non-existent studies, journals, or authors were presented as real. This practice of manufacturing sources undermines the very foundation of scientific discourse and can send users down rabbit holes of non-existent research. Furthermore, the readability of the responses was consistently graded as “difficult,” equivalent to a college sophomore-to-senior reading level. This raises concerns about accessibility, particularly for individuals with limited health literacy or without a strong academic background. The combination of inaccurate references and complex language erects further barriers to understanding and critically evaluating the information presented, increasing the likelihood of misinterpretation and misguided health decisions.
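
For context on the readability grading: audits like this typically rely on formulas such as the Flesch-Kincaid grade level, which maps sentence length and syllable density onto a U.S. school grade. The study does not say which formula it used, so the sketch below, with its crude syllable heuristic, is only an illustration; a real audit would use a validated tool.

```python
# Sketch: Flesch-Kincaid grade level of a chatbot response.
# FK grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
import re

def count_syllables(word: str) -> int:
    """Crude heuristic: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

# Usage: fk_grade(response_text). Values around 13-16 correspond to the
# college sophomore-to-senior range the study reports.
```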

The authors summarize the underlying limitation of current chatbot technology, stating that “By default, chatbots do not access real-time data but instead generate outputs by inferring statistical patterns from their training data and predicting likely word sequences.” This fundamental characteristic underscores a critical distinction: chatbots do not “reason” or “weigh evidence” in the human sense. They lack the capacity for ethical or value-based judgments, which are integral to providing holistic and responsible health advice. As a result, chatbots are inherently prone to reproducing “authoritative-sounding but potentially flawed responses.” Their impressive linguistic fluency can mask deep-seated inaccuracies, making it challenging for users to distinguish credible from unreliable information. The study serves as a stark reminder that while chatbots offer immense potential as information retrieval tools, their current iteration falls short on complex and sensitive topics like health. Relying solely on chatbots for medical advice, particularly in areas susceptible to misinformation, carries significant risks. The human elements of critical thinking, nuanced understanding, and ethical consideration remain indispensable in the realm of health information, underscoring the ongoing need for human oversight and expert judgment.
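
The mechanism the authors describe can be seen in miniature with a toy bigram model: vastly simpler than a real chatbot, but built on the same principle of choosing each next word because it frequently followed the last one in training text, with no check against reality.

```python
# Toy bigram "language model": the next word is sampled purely from how
# often it followed the previous word in the training text. Fluency
# comes from frequency, not from any notion of truth.
import random
from collections import defaultdict

def train(corpus: str) -> dict[str, list[str]]:
    """Record, for every word, the words that followed it in the corpus."""
    words = corpus.split()
    follows = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        follows[prev].append(nxt)
    return follows

def generate(follows: dict[str, list[str]], start: str, n: int = 10) -> str:
    """Extend `start` by repeatedly picking a statistically likely next word."""
    out = [start]
    for _ in range(n):
        choices = follows.get(out[-1])
        if not choices:
            break
        out.append(random.choice(choices))  # likely, not verified
    return " ".join(out)
```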
