- In a recent study, ChatGPT was found to be about 72 percent accurate when making clinical decisions.
- Medical professionals, however, are estimated to be roughly 95 percent accurate, according to studies.
A new study suggests ChatGPT can diagnose patients and make clinical judgments about as well as a resident doctor.
Researchers at Mass General Brigham in Boston, Massachusetts, tested how well the AI chatbot could correctly identify patients' conditions and manage their care in primary care and emergency settings.
ChatGPT made the correct call on diagnoses, medication choices, and other treatment decisions around 72 percent of the time. In contrast, medical professionals are estimated to be accurate roughly 95 percent of the time.
The researchers indicated that they "approximate this level of performance to be comparable to an individual who has recently completed medical school, like an intern or resident."
While qualified doctors misdiagnose patients far less often, the researchers suggest ChatGPT could improve access to care and reduce appointment wait times.
Dr. Marc Succi, the lead author of the study, commented, "Our research provides a comprehensive evaluation of decision support through ChatGPT, spanning from initial patient interactions to complete care scenarios involving differential diagnosis, testing, final diagnosis, and treatment management."
"Although there are no direct benchmarks, our estimation places this performance on par with someone who has recently completed medical school, like an intern or resident."
"This underscores the potential of Large Language Models (LLMs) to serve as valuable tools in medical practice, significantly enhancing clinical decision-making with remarkable precision."
In the study, ChatGPT was tasked with suggesting potential diagnoses for 36 cases, taking into account patient characteristics, symptoms, age, gender, and whether the situation was an emergency.
Subsequently, the chatbot was provided with additional information and tasked with formulating decisions regarding patient care, in addition to offering a final diagnosis.
Overall, the platform achieved a 72 percent accuracy rate. It excelled particularly in delivering a conclusive diagnosis, demonstrating a 77 percent accuracy rate.
Its weakest aspect was generating differential diagnoses, where symptoms align with multiple conditions and must be narrowed down to a single conclusion. In this category, ChatGPT achieved a 60 percent accuracy rate.
The platform also displayed a 68 percent accuracy rate in making decisions about care management, including determining suitable medication prescriptions.
Dr. Succi remarked, "ChatGPT struggled with differential diagnosis, which is the meat and potatoes of medicine when a physician has to figure out what to do."
"This is significant as it highlights the specific areas where physicians excel and provide the greatest value—particularly in the initial stages of patient care when limited information is available and a list of potential diagnoses is required."
An earlier study published this year revealed that ChatGPT achieved scores ranging from 52.4 to 75 percent on the three-part United States Medical Licensing Exam (USMLE), the country's gold-standard medical assessment. The passing threshold for the exam is around 60 percent.
Nonetheless, ChatGPT hasn't quite surpassed the proficiency of qualified doctors just yet.
Research indicates that real doctors misdiagnose patients at a rate of five percent, or one in 20 patients. ChatGPT, by contrast, misdiagnosed one in four.
The most accurate case in the ChatGPT study involved a 28-year-old man with a mass on his right testicle, where ChatGPT achieved an 83.8 percent accuracy rate. The accurate diagnosis was testicular cancer.
On the other hand, the least accurate case achieved a 55.9 percent accuracy rate for a 31-year-old woman with recurring headaches. The correct diagnosis was pheochromocytoma, a rare and usually benign tumor that forms in the adrenal gland.
Across all age groups and genders, the average accuracy remained consistent.
Published in the Journal of Medical Internet Research, this study builds upon prior healthcare-related research involving ChatGPT. A study from the University of California San Diego found that ChatGPT provided answers of higher quality and greater empathy compared to actual doctors.
The AI demonstrated empathy in 45 percent of instances, whereas only five percent of doctors exhibited empathy. Furthermore, ChatGPT delivered more detailed responses 79 percent of the time, compared to the 21 percent achieved by doctors.
Evaluators also preferred ChatGPT's answers 79 percent of the time, compared with 21 percent for the doctors'.
The researchers at Mass General Brigham said further studies are needed, but the results are promising.
Dr. Adam Landman, a co-author of the study, stated, "We are presently assessing LLM solutions aimed at aiding clinical documentation and generating patient message responses, focusing on their accuracy, dependability, safety, and fairness."
"Thorough studies like this one are essential before integrating LLM tools into clinical practice."

