AI May Enhance Clinical Skills, Competence

On January 23, 2026

Artificial intelligence (AI) is getting better at assessing clinical skills and maybe even augmenting competence, but when it comes to bedside encounters, it may not be the answer, according to plenary speakers for the Lois Margaret Nora Endowed Lecture at ABMS Conference 2025.

Measuring competence

Dr. Daglius Dias

Roger Daglius Dias, MD, PhD, MBA, an Associate Professor of Emergency Medicine at Harvard Medical School and Director of Research and Innovation at the STRATUS Center for Medical Simulation at Brigham and Women’s Hospital in Boston, where he leads the Medical AI & Cognitive Engineering Lab, is convinced that AI and machine learning can help assess physician competence.

Dr. Dias noted that some current assessment methods – knowledge tests, self-assessments, peer reviews, and observation, among others – can be challenging to validate, especially when assessing higher skill levels. Researchers grapple with proving whether these assessments measure what was intended and how they impact patient-centered outcomes and care, he said. In contrast, other fields such as sports are already using data to drive performance improvement. But trying to measure, for example, “teamwork in the operating room” is not as easy as measuring whether an elite athlete is ready to train tomorrow, Dr. Dias explained.

Early-stage studies applying machine learning to competence assessment, including a 2017 systematic review he co-authored, were designed to prove the feasibility of the technology. New advances in technology, especially AI, are only now allowing researchers to measure complex skills and competencies in a more objective and scalable way, he said.

Dr. Dias cited numerous studies he is currently working on that focus on the validity and reliability of AI and machine learning for assessing physician competence.

His team is collaborating with the American Tennis Association on a five-year National Institutes of Health (NIH) project to learn how elite athletes measure and coach performance, and to apply those methods to surgeons. Using cameras and wearable sensors in the operating room (OR), AI is capturing surgeons’ motions, team dynamics, and cognitive load.
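
To illustrate one way wearable-sensor streams can feed this kind of measurement, the sketch below computes a simple heart-rate-variability proxy for cognitive load from interbeat-interval data, grouped by annotated surgical phase. This is a minimal illustration, not the team’s actual pipeline: the data, the phase labels, and the choice of RMSSD as a load proxy are all assumptions made for the example.

```python
import numpy as np

def rmssd(ibi_ms: np.ndarray) -> float:
    """Root mean square of successive differences of interbeat intervals (ms).

    Lower RMSSD during a task is often read as a rough proxy for higher
    cognitive load or physiological stress.
    """
    diffs = np.diff(ibi_ms)
    return float(np.sqrt(np.mean(diffs ** 2)))

def load_proxy_by_phase(ibi_ms: np.ndarray, phase_labels: np.ndarray) -> dict:
    """Compute an RMSSD-based load proxy for each annotated surgical phase."""
    return {phase: rmssd(ibi_ms[phase_labels == phase])
            for phase in np.unique(phase_labels)}

# Hypothetical interbeat intervals (ms) from a wearable, tagged by phase.
ibi = np.array([820, 810, 790, 640, 630, 650, 800, 815], dtype=float)
phases = np.array(["setup", "setup", "setup", "critical", "critical",
                   "critical", "closure", "closure"])
print(load_proxy_by_phase(ibi, phases))
```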

Another NIH study is using AI to build a coaching system to improve surgical performance in urologic endoscopy, collecting data from the simulation environment and the OR. “We need to translate findings from a simulation environment to a real-world environment where many of the variables are completely different,” Dr. Dias stated.

Other projects are using AI to measure levels of expertise, competence, and teamwork and, in some cases, trying to tie these to patient outcomes. These projects are moving toward a more scalable way of measuring competencies and skills in the OR, he said.

A cross-sectional study of 30 cardiac surgical procedures found that certain patterns of team members’ motion in the OR correlated positively with higher non-technical skills performance. This study demonstrated the feasibility of automatically assessing an OR team’s non-technical skills through deep learning–based analysis of surgical videos, Dr. Dias stated. These findings could be used for surgical education and improvement efforts.
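
As a rough illustration of the kind of analysis such a study implies, the sketch below correlates a hypothetical motion feature extracted from OR video with expert non-technical skills ratings across procedures. The feature values, the ratings, and the use of a rank correlation are assumptions for illustration only, not the study’s actual data or methods.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-procedure data: a motion feature extracted from OR video
# (e.g., mean displacement of tracked team members, arbitrary units) paired
# with an expert non-technical skills (NTS) rating for the same case.
motion_feature = np.array([3.1, 4.5, 2.2, 5.0, 3.8, 4.1, 2.9, 4.7])
nts_rating     = np.array([2.5, 3.5, 2.0, 4.0, 3.0, 3.5, 2.5, 4.0])

# Rank correlation is a reasonable first look with small samples and
# ordinal ratings; it does not assume a linear relationship.
rho, p_value = spearmanr(motion_feature, nts_rating)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```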

Dr. Dias maintains that it is possible to have a data-driven approach to assessing competence in medicine. “We are at a pivotal point in time when advancements of AI machine learning will allow us to create more objective tools that can help us assess competence more consistently and at scale,” he concluded.

Augmenting competence

Dr. Meireles

Ozanan R. Meireles, MD, FACS, Associate Professor of Surgery at Duke University School of Medicine, Vice Chair for Innovation in the Department of Surgery, Surgical Director at Duke AI Health, and Director of the Surgical Artificial Intelligence and Innovation Laboratory, is working to establish a framework that will enable AI, including agentic, generative, and other emerging models, to assess competence and even augment it in the future. “We have to design a sustainable, scalable framework for AI development and implementation, and at the same time embrace cultural transformation,” he said.

The first step to that end is establishing a community of users and developers. “They already exist and they are thirsty to move forward,” Dr. Meireles said. The Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) has helped create and sustain that community, most recently by conducting “biomedical data challenges” aimed at advancing clinically meaningful use of AI solutions. The SAGES Critical View of Safety Challenge was established as a global benchmark to evaluate how AI can assess a clinically validated safety standard during laparoscopic cholecystectomy. The 2024 challenge was built on 1,000 surgical videos donated by 54 institutions across 24 countries, capturing substantial variation in anatomy, technique, equipment, and clinical context. Thirteen international teams participated, developing and testing AI models on this large, heterogeneous, expert-annotated dataset. The challenge was presented at MICCAI 2024, the international conference on Medical Image Computing and Computer Assisted Intervention, and was subsequently invited to continue as a standalone challenge at MICCAI 2025, underscoring its scientific rigor, clinical relevance, and value to the broader surgical AI community. Together, this scale, diversity, and recognition move surgical AI evaluation beyond single-center efforts and toward robust, generalizable, and clinically meaningful benchmarks.

The second step is building an architectural framework capable of connecting and supporting the broader surgical community. As interest in applying machine learning to surgical video analysis accelerated – and in the absence of standardized practices for annotating video data – SAGES convened clinical and technical stakeholders from academia and industry in 2020 to establish consensus recommendations on an annotation framework for surgical video. These recommendations laid the foundation for standardization, noted Dr. Meireles, who served as the Inaugural Chair of the SAGES Artificial Intelligence Committee. In 2022, SAGES extended this effort by developing consensus recommendations on surgical video data use, structure, and exploration, defining how video data should be organized, queried, and responsibly leveraged for AI research, clinical quality improvement, and surgical education. While these efforts established critical building blocks, additional work remains to fully realize a scalable, interoperable architectural framework for the field, he said.

The third step is establishing a legal and regulatory framework to ensure trustworthy, responsible, ethical, and safe development of AI. Data governance involves managing the data during its life cycle, from acquisition to use to disposal. Many stakeholders – health care providers, patients, hospital administrations, certifying boards, medical societies, insurance companies, researchers, and industry – have a role in governing surgical video data. Moving forward will require a multi-functional platform and ecosystem designed to develop technologies that seamlessly integrate data generators (clinicians), developers (scientists), regulators (societies, government, etc.), and end users (clinicians, patients).

These AI models, which will be trained on large datasets, will serve as a foundational base, Dr. Meireles said. They will be adaptable to a variety of surgical tasks such as video analysis, complication prediction, real-time guidance, and automation. As an example, imagine AI predicting the next steps of a surgery based on analysis of past videos. If it detects a deviation from the norm, it can offer decision support to the surgeon to help prevent an error.
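
One simple way to picture step prediction and deviation detection is a transition model learned from annotated phase sequences, as in the sketch below. The phase names, example cases, and probability threshold are hypothetical, and a real system would rely on far richer video-based models; this is only meant to make the decision-support idea concrete.

```python
from collections import Counter, defaultdict

# Hypothetical phase sequences annotated from past surgical videos.
past_cases = [
    ["port placement", "dissection", "critical view", "clipping", "removal"],
    ["port placement", "dissection", "critical view", "clipping", "removal"],
    ["port placement", "dissection", "critical view", "clipping", "removal"],
    ["port placement", "dissection", "critical view", "clipping", "removal"],
    ["port placement", "dissection", "clipping", "removal"],  # atypical case
]

# Learn first-order transition frequencies between annotated phases.
counts = defaultdict(Counter)
for case in past_cases:
    for current, nxt in zip(case, case[1:]):
        counts[current][nxt] += 1

def next_step_probs(phase):
    """Empirical distribution over the phases that historically follow `phase`."""
    total = sum(counts[phase].values())
    return {nxt: n / total for nxt, n in counts[phase].items()}

def check_transition(current, observed_next, threshold=0.3):
    """Flag an observed transition whose historical probability is low."""
    probs = next_step_probs(current)
    p = probs.get(observed_next, 0.0)
    if p < threshold:
        print(f"Deviation alert: '{current}' -> '{observed_next}' (p = {p:.2f}); "
              f"typical next steps: {probs}")
    else:
        print(f"'{observed_next}' is a typical next step after '{current}' (p = {p:.2f}).")

check_transition("dissection", "clipping")       # skips the critical view
check_transition("dissection", "critical view")  # the expected next phase
```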

These capabilities have the potential to impact board certification. As surgical cognition evolves, standards for assessing competence must also evolve, he said, adding, “In the future, expertise won’t just mean what surgeons can recall or do, but how they interact with intelligent systems.” Can they interpret AI suggestions, explain their reasoning when AI disagrees, and learn from an AI review post-op? Dr. Meireles referred to this as “augmented competence.”

“AI will not replace human judgment,” he concluded. “But it will transform what we notice, what we miss, what we reflect on, and how we grow. Board certified surgeons of the future won’t just master anatomy and steps. They will master interfacing with intelligence itself.”

Strengthening bedside medicine skills

Dr. Garibaldi

Brian T. Garibaldi, MD, MEHP, FACP, FRCP(E), Director of the Center for Bedside Medicine, the Charles Horace Mayo Professor of Medicine at Northwestern University Feinberg School of Medicine, and Co-Founder of the Society of Bedside Medicine, was not entirely surprised by the 2024 study showing that a Google AI system demonstrated a better bedside manner and made more accurate diagnoses than human doctors, but he is concerned.

While Dr. Garibaldi acknowledged that AI models can generate an unlimited number of empathetic statements, he was surprised that earlier models were already reasoning at the level of an average physician when given the correct inputs. And they continue to get better at making diagnoses, he said. They can recognize static images almost as accurately as humans can. AI models can even predict frailty and future cardiovascular events in heart failure patients.

While AI tools can help clinicians improve, that doesn’t always happen. In a recent study, experienced endoscopists who used a new AI technology improved their identification of lesions to biopsy during colonoscopy. When the trial ended and the AI assistance was removed, their accuracy dropped below pre-trial levels. “In just three months of engaging with this AI tool, they lost the skills that they had acquired across decades,” he said. Becoming increasingly reliant on technology for diagnosis can create doubt in a physician’s mind about what they see, hear, or feel, Dr. Garibaldi said. In this way, AI can contribute to automation bias.

AI models can also introduce racial bias. In one study, AI improved physicians’ photo-based diagnoses of skin diseases, but less so for darker skin, which tends to be underrepresented in textbooks and dermatology residency programs. Most likely, this was due to the data used to train the AI algorithm, he said.

“We are risking training a generation of physicians who don’t know how to make diagnoses on their own,” Dr. Garibaldi stated. “But we know the data we acquire from talking to and examining patients is fundamental to the decisions that we make.” He has found that trainees spend more than 50 percent of their time in the care of the digital representation of a patient, and only about 13 percent of their time in direct contact with patients. “It’s not surprising that if trainees spend that little time with patients, fundamental skills that can only be practiced and improved in the presence of patients are in decline,” he said.

Studies have shown physicians’ physical exam skills have declined significantly during the last 50 years. Moreover, it’s estimated that more than half of diagnostic errors are related to a mistake in the physical exam, Dr. Garibaldi said. The most common error is simply that the exam was never performed.

Dr. Garibaldi believes that board certification can change the trajectory by better assessing the skills required to perform a physical exam. Per the Internal Medicine patient care milestone, to be ready for independent practice, trainees must be able to use advanced maneuvers to elicit subtle findings and use those findings to guide diagnosis and management. In place of direct observation of trainees in Internal Medicine, he said that program directors now use surrogate markers to determine who’s likely to be good at performing a physical exam. “But we know that there is no substitute for direct observation,” Dr. Garibaldi added.

His team has created a formative assessment of the physical exam in which trainees encounter real patients and are observed by real faculty who provide real-time feedback. “If you have good technique and identify the signs, you are much more likely to include the correct diagnosis on your differential,” he said.

An AI tool that can help improve physical exam skills was developed at the Center for Bedside Medicine. Dr. Garibaldi’s team created an app that pulls data from the electronic health record to suggest a differential diagnosis based on the patient’s information and available data. The app then suggests diagnostic skills the trainee can use at the bedside to narrow the diagnosis. It provides links to videos on how to perform the suggested maneuvers and how to interpret the results.
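
A minimal sketch of the app’s core idea, as described, might map candidate diagnoses on a working differential to bedside maneuvers and instructional videos. The diagnoses, maneuvers, and example.org links below are placeholders for illustration, not the actual app’s content or interface.

```python
# Hypothetical mapping from candidate diagnoses (as an app might draw from a
# working differential built on EHR data) to bedside maneuvers that can help
# confirm or exclude each one, plus placeholder links to technique videos.
MANEUVERS = {
    "heart failure": [
        ("assess jugular venous pressure", "https://example.org/jvp-technique"),
        ("auscultate for an S3 gallop", "https://example.org/s3-gallop"),
    ],
    "pneumonia": [
        ("check for egophony", "https://example.org/egophony"),
        ("percuss for dullness", "https://example.org/percussion"),
    ],
    "cirrhosis": [
        ("examine for shifting dullness (ascites)", "https://example.org/ascites"),
    ],
}

def suggest_maneuvers(differential):
    """Print bedside maneuvers relevant to each diagnosis on the differential."""
    for dx in differential:
        for maneuver, video in MANEUVERS.get(dx, []):
            print(f"{dx}: {maneuver}  (how-to: {video})")

# Example: a differential the app might suggest from the chart.
suggest_maneuvers(["heart failure", "pneumonia"])
```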

“There’s more to what we do at the bedside than make a diagnosis,” he said. “There is power in the touch and the time we spend with patients. And that’s something we can drive to change with assessment.”

© 2026 American Board of Medical Specialties
