WP3 AI Benchmarking and Explainability
The aims of WP3 are:
– To define prediction tasks of high clinical relevance and implement quality dimensions, performance metrics and boundary conditions for AI tool benchmarking, in close consultation with screening equipment manufacturers and clinical stakeholders;
– To collect state-of-the-art AI tools to solve the selected prediction tasks and adapt them to the procured data using transfer learning and domain adaptation techniques;
– To train and benchmark the AI tools with respect to prediction performance, robustness, fairness and uncertainty quantification;
– To formulate formal requirements for explainability, considering inputs from (1) model developers and (2) clinicians;
– To implement post-hoc “explanations” of AI tools trained on real and synthetic data, and to benchmark these explanations with respect to their “explanation performance” on reference data.

