SA1 - Comparison of AI Models (ChatGPT and Claude AI) With Human Accuracy in Evaluating the Relationship Between Mandibular Third Molar and Inferior Alveolar Nerve Using Panoramic Radiography
Friday, September 19, 2025
9:00 AM - 9:10 AM EDT
Location: Room 204C, Walter E. Washington Convention Center
Abstract: Statement of the Problem: The relationship between the mandibular third molar (M3) and the inferior alveolar nerve (IAN) is crucial in determining surgical risk during third molar extraction. Panoramic radiography is commonly used for preoperative assessment, but its interpretation is subjective and varies among oral and maxillofacial surgeons. With advancements in artificial intelligence (AI), automated image analysis has the potential to improve diagnostic accuracy and reduce observer variability. This study compares the performance of two AI models, ChatGPT and Claude AI, with human expert evaluations in classifying M3-IAN relationships.
Materials and Methods: A retrospective analysis was conducted on 200 anonymized panoramic radiographs of patients aged 18–40 years with fully or partially erupted mandibular third molars. Images were classified into four categories: true contact, intimate, buccal position, and lingual position. Two oral and maxillofacial surgery (OMS) specialists independently assessed the radiographs, with discrepancies resolved by a third expert. The AI models, ChatGPT and Claude AI, were trained and tested to classify the same images. The study evaluated performance differences between AI models and human experts using standardized classification methods.
Methods of Data Analysis: The performance of AI models and human evaluators was assessed using accuracy, precision, recall, and F1-score. Cohen’s kappa coefficient (κ) measured inter-rater agreement. A chi-square test compared categorical classification outcomes between AI and human raters. A paired t-test was used to compare AI and human performance metrics. Statistical analysis was performed using SPSS 30.0, with the significance level set at P < .05.
Results and Outcomes Data: A total of 200 panoramic images of M3 were analyzed. For the contact relationship, 112 were classified as true contact and 88 as intimate; for IAN position, 137 were classified as buccal and 63 as lingual. AI models demonstrated an accuracy of 71%, precision of 82%, recall of 60%, and an F1-score of 71% in determining M3-IAN relationships. Human evaluators showed an accuracy range of 60% to 71%. Cohen’s kappa coefficient indicated substantial agreement both for the AI models (κ = .61) and among human experts (κ = .72). The mean accuracy was 72.5% (SD = 4.2%) for AI models and 70.8% (SD = 5.1%) for human evaluators; the difference was not statistically significant (t = 1.32, P = .21). Similarly, the mean F1-score was 0.75 (SD = .06) for AI models and 0.72 (SD = .07) for human evaluators, also a non-significant difference (t = 1.10, P = .27).
The chi-square test likewise showed no significant difference in categorical classification outcomes between AI and human raters (P = .28). Taken together, these results suggest that AI models perform comparably to human experts in assessing the M3-IAN relationship on panoramic radiographs.
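As an illustration of the metrics named under Methods of Data Analysis, the following is a minimal Python sketch of how accuracy, precision, recall, F1-score, and Cohen's kappa can be computed from paired labels. It is not the study's analysis, which was run in SPSS. The function names (`binary_metrics`, `cohen_kappa`) and the ten example labels are hypothetical and chosen only for demonstration; they are not study data.

```python
from collections import Counter


def binary_metrics(truth, pred, positive):
    """Accuracy, precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(truth, pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = count_a.keys() | count_b.keys()
    # Chance agreement: product of each rater's marginal proportions per label.
    expected = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for ten images (illustration only, not study data).
reference = ["contact"] * 5 + ["intimate"] * 5
model = ["contact", "contact", "contact", "intimate", "intimate",
         "intimate", "intimate", "intimate", "intimate", "contact"]

accuracy, precision, recall, f1 = binary_metrics(reference, model, "contact")
kappa = cohen_kappa(reference, model)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} F1={f1:.2f} kappa={kappa:.2f}")
```

Here precision and recall are computed for a single class treated as positive. With more than two categories, per-class values are typically averaged, and the averaging scheme affects the reported figures.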
Conclusion: The findings highlight the complementary potential of AI technologies in radiographic diagnostics. This study emphasizes that AI serves as an augmentation tool for human expertise rather than a replacement. The comparable performance between AI models and OMS specialists suggests that integrating AI into clinical workflows can enhance diagnostic efficiency and reduce observer variability. A combined approach, leveraging AI’s computational accuracy alongside the clinical judgment of experienced professionals, has the potential to improve patient care outcomes in third molar evaluations and surgical planning.
References:
1. Choi E, Lee S, Jeong E, et al. Artificial intelligence in positioning between mandibular third molar and inferior alveolar nerve on panoramic radiography. Sci Rep. 2022;12(1):2456. Published 2022 Feb 14. doi:10.1038/s41598-022-06483-2
2. Stephan D, Bertsch A, Burwinkel M, et al. AI in Dental Radiology-Improving the Efficiency of Reporting With ChatGPT: Comparative Study. J Med Internet Res. 2024;26:e60684. Published 2024 Dec 23. doi:10.2196/60684