The medical data science group carries out research at the intersection of machine learning and medicine, with the ultimate goal of improving diagnosis and treatment outcomes for the care and wellbeing of patients. Because medical and health data are heterogeneous and multimodal, our research advances machine learning models and methodologies that address the specific challenges of the medical domain. Specifically, we work in the areas of multimodal data integration, structure detection, and trustworthy (or transparent) models. The challenge lies not only in developing fast, robust, and reliable systems but also in developing systems that are easy to interpret and usable in clinical practice.


Congratulations to Samuel Ruiperez-Campillo on receiving the Best Oral Presentation Award from the European Society of Cardiology at the Digital…

Congratulations to Ricards Marcinkevics on receiving the 2025 ABB Research Prize, which was presented at the 2025 ETH Day, for his doctoral thesis.

The Department of Computer Science (D-INFK) at ETH Zurich has published a new historical timeline documenting the development of its women’s promotion…

Abstract

Foundation models achieve state-of-the-art performance across different tasks, but their size and computational demands raise concerns about accessibility and sustainability. Existing efficiency methods often require additional retraining or fine-tuning, limiting their practicality. Recent findings suggest that deep neural networks exhibit internal representation similarities. While such similarities across different models have been exploited to enable techniques such as model stitching and merging, intra-network redundancy remains underexplored as a source of efficiency gains. In this paper, we introduce Transformer Optimization using Adaptive and Simple Transformations (TOAST), a framework that exploits these redundancies to approximate entire transformer blocks with lightweight closed-form mappings, such as linear transformations or even the identity function, without any additional training. Across state-of-the-art pretrained vision models (e.g., ViT, DINOv2, DeiT) and datasets ranging from MNIST to ImageNet-1k, TOAST reduces parameters and computation while preserving, and in some cases improving, downstream performance. These results show that large portions of transformer depth can be replaced by trivial functions, opening a new perspective on efficient foundation models.
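The abstract states that TOAST approximates whole transformer blocks with closed-form mappings such as a linear map or the identity, but it does not spell out the fitting procedure. As an illustration only, the sketch below assumes one natural closed-form choice: an ordinary least-squares fit (with bias) from the activations entering a block to the activations leaving it, compared against the identity by relative reconstruction error. The function names and the synthetic data are hypothetical and are not taken from the paper or its code.

```python
import numpy as np

def fit_linear_block_replacement(x_in, x_out):
    """Closed-form least-squares map (W, b) such that x_in @ W + b approximates x_out.

    x_in, x_out: arrays of shape (num_tokens, hidden_dim) holding the activations
    entering and leaving one transformer block (hypothetical calibration data).
    """
    # Append a column of ones so the bias is fitted jointly with W.
    ones = np.ones((x_in.shape[0], 1))
    x_aug = np.hstack([x_in, ones])
    # Ordinary least squares: no gradient-based training is involved.
    coef, *_ = np.linalg.lstsq(x_aug, x_out, rcond=None)
    return coef[:-1], coef[-1]

def relative_error(pred, target):
    """Frobenius-norm reconstruction error relative to the target's norm."""
    return np.linalg.norm(pred - target) / np.linalg.norm(target)

# Synthetic stand-in for a block whose residual update happens to be linear.
rng = np.random.default_rng(0)
x_in = rng.normal(size=(512, 16))
A = 0.1 * rng.normal(size=(16, 16))
x_out = x_in + x_in @ A + 0.05

W, b = fit_linear_block_replacement(x_in, x_out)
identity_err = relative_error(x_in, x_out)         # replace the block by the identity
linear_err = relative_error(x_in @ W + b, x_out)   # replace the block by (W, b)
```

On a real model, x_in and x_out would come from a calibration batch passed through the pretrained network, and the decision to replace a block would also depend on downstream accuracy, which this sketch does not model.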

Authors

Irene Cannistraci, Simone Antonelli, Emanuele Palumbo, Thomas M. Sutter, Emanuele Rodolà, Bastian Rieck, Julia E. Vogt
denotes shared last authorship

Submitted

Journal Transactions on Machine Learning Research (TMLR)

Date

13.05.2026

Link, Code

Abstract

Multi-domain fine-tuning of large language models requires improving performance on target domains while preserving performance on constrained domains, such as general knowledge, instruction following, or safety evaluations. Existing data mixing strategies rely on fixed heuristics or adaptive rules that cannot explicitly enforce preservation of such capabilities. We propose DynaMiCS, a dynamic mixture optimizer that casts multi-domain fine-tuning as a constrained optimization problem. At each update, DynaMiCS performs short domain-specific probing runs to estimate a slope matrix of local cross-domain effects, capturing how training on each fine-tuning dataset affects each evaluation domain. These estimates are then used to compute mixture weights through optimization over the probability simplex, with the objective of improving target-domain performance while keeping constrained-domain losses below reference levels. Across multi-domain fine-tuning scenarios with varying numbers of target and constrained domains, DynaMiCS achieves stronger target-domain improvements and higher constraint satisfaction than fixed-mixture baselines, at lower computational cost and without reference models, per-example scoring, or manually tuned mixture weights.
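To make the constrained mixture step concrete, the following sketch assumes a purely linear local model: slopes[e][t] is a hypothetical estimate of how much the loss on evaluation domain e changes per unit of mixture weight on fine-tuning dataset t, and "below reference levels" is taken to mean that no constrained domain's loss is predicted to increase. The abstract does not name the solver used over the probability simplex; here a brute-force grid search is swapped in, which is only practical for a handful of datasets. All names and numbers are illustrative.

```python
def simplex_grid(k, steps):
    """Yield every weight vector on the k-dimensional probability simplex with resolution 1/steps."""
    def compositions(remaining, parts):
        # All ways to split the integer `remaining` into `parts` non-negative integers.
        if parts == 1:
            yield (remaining,)
            return
        for first in range(remaining + 1):
            for rest in compositions(remaining - first, parts - 1):
                yield (first,) + rest
    for counts in compositions(steps, k):
        yield [c / steps for c in counts]

def choose_mixture(slopes, targets, constrained, steps=20, tol=1e-12):
    """Pick mixture weights minimizing predicted target-domain loss change.

    slopes: dict eval_domain -> list of per-dataset slopes (hypothetical estimates).
    Constraint: predicted loss change on every constrained domain must be <= 0.
    """
    k = len(next(iter(slopes.values())))
    best_w, best_obj = None, float("inf")
    for w in simplex_grid(k, steps):
        # Linear prediction of each evaluation domain's loss change under mixture w.
        delta = {e: sum(s * wt for s, wt in zip(row, w)) for e, row in slopes.items()}
        if any(delta[c] > tol for c in constrained):
            continue  # would push a constrained domain above its reference loss
        obj = sum(delta[e] for e in targets)
        if obj < best_obj:
            best_w, best_obj = w, obj
    return best_w, best_obj

# Toy setting: datasets [math_data, general_data]; eval domains "math" (target), "general" (constrained).
slopes = {"math": [-1.0, 0.1], "general": [0.5, -0.5]}
weights, predicted_change = choose_mixture(slopes, targets=["math"], constrained=["general"])
```

If no grid point satisfies the constraints, choose_mixture returns (None, inf). A practical implementation would more likely use a linear-programming or projected-gradient solver than exhaustive enumeration.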

Authors

Eleonora Gualdoni, Sonia Laguna, Louis Bethune, Joao Monteiro, Pierre Ablin, Marco Cuturi

Submitted

arXiv

Date

11.05.2026

Link

Abstract

Concept Bottleneck Models (CBMs) are interpretable models that predict the target variable through high-level human-understandable concepts, allowing users to intervene on mispredicted concepts to adjust the final output. While recent work has shown that modeling dependencies between concepts can improve CBM performance, especially under interventions, such approaches typically require retraining the entire model, which may be infeasible when access to the original data or compute is limited. In this paper, we introduce Post-hoc Stochastic Concept Bottleneck Models (PSCBMs), a lightweight method that augments any pre-trained CBM with a multivariate normal distribution over concepts by adding only a small covariance-prediction module, without retraining the backbone model. We propose two training strategies and show on real-world data that PSCBMs consistently match or improve both concept and target accuracy over standard CBMs at test time. Furthermore, we show that due to the modeling of concept dependencies, PSCBMs perform much better than CBMs under interventions, while remaining far more efficient than retraining a similar stochastic model from scratch.
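One way to see why a multivariate normal over concepts helps under interventions: if concept logits follow N(mu, Sigma), fixing some concepts to user-provided values updates the remaining ones through standard Gaussian conditioning, mu_r + Sigma_rb Sigma_bb^{-1} (v - mu_b). The sketch below implements only that conditioning step. Treating intervened values as logits and using conditioning as the intervention rule are assumptions; the paper's covariance-prediction module and its two training strategies are not modeled, and the function name intervene is hypothetical.

```python
import numpy as np

def intervene(mu, sigma, idx, values):
    """Condition N(mu, sigma) over concept logits on concepts `idx` taking `values`.

    Returns the updated mean for all concepts (intervened ones fixed to their
    values) and the conditional covariance of the non-intervened concepts.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    idx = np.asarray(idx)
    rest = np.setdiff1d(np.arange(mu.size), idx)
    s_rb = sigma[np.ix_(rest, idx)]   # cross-covariance: remaining vs. intervened
    s_bb = sigma[np.ix_(idx, idx)]    # covariance among intervened concepts
    s_rr = sigma[np.ix_(rest, rest)]  # covariance among remaining concepts
    delta = np.asarray(values, dtype=float) - mu[idx]
    # Standard Gaussian conditioning formulas (solve instead of an explicit inverse).
    cond_mean_rest = mu[rest] + s_rb @ np.linalg.solve(s_bb, delta)
    cond_cov_rest = s_rr - s_rb @ np.linalg.solve(s_bb, s_rb.T)
    new_mu = mu.copy()
    new_mu[idx] = values
    new_mu[rest] = cond_mean_rest
    return new_mu, cond_cov_rest

# Concepts 0 and 1 are strongly correlated; concept 2 is independent.
mu = [0.0, 0.0, 0.0]
sigma = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]
new_mu, cond_cov = intervene(mu, sigma, idx=[1], values=[2.0])
```

Intervening on concept 1 shifts the correlated concept 0 and shrinks its variance, while the independent concept 2 is unchanged; a standard CBM without a concept covariance would leave concept 0 untouched.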

Authors

Wiktor Jan Hoffmann, Sonia Laguna, Moritz Vandenhirtz, Emanuele Palumbo, Julia E. Vogt

Submitted

International Conference on Learning Representations (ICLR) Workshop on Trustworthy AI

Date

27.04.2026

Link

Abstract

Machine unlearning aims to remove the influence of specific data from trained models while preserving general utility. Existing approximate unlearning methods often rely on performance-degradation heuristics, such as loss maximization or random labeling. However, these signals can be poorly conditioned, leading to unstable optimization and harming the model's generalization. We argue that unlearning should instead prioritize distributional indistinguishability, aligning the model's behavior on forget data with its behavior on truly unseen data. Motivated by this, we propose Reference-Guided Unlearning (ReGUn), a framework that leverages a disjoint held-out dataset to provide a principled, class-conditioned reference for distillation. We demonstrate across various model architectures, natural image datasets, and varying forget fractions that ReGUn consistently outperforms standard approximate baselines, achieving a superior forgetting-utility trade-off.
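The class-conditioned reference can be pictured as follows, under assumptions the abstract does not state: the model's softmax outputs on held-out samples of class c are averaged into a reference distribution, and its outputs on forget samples of class c are pulled toward that reference with a KL-divergence distillation loss. Which model produces the reference outputs, how gradients flow, and how this loss is combined with a utility-preserving term are left open; the helper names below are hypothetical.

```python
import numpy as np

def class_conditioned_references(heldout_probs, heldout_labels, num_classes):
    """Average predicted class distribution on held-out samples, per true class."""
    refs = np.zeros((num_classes, heldout_probs.shape[1]))
    for c in range(num_classes):
        # Each class needs at least one held-out sample, otherwise the mean is NaN.
        refs[c] = heldout_probs[heldout_labels == c].mean(axis=0)
    return refs

def reference_distillation_loss(forget_probs, forget_labels, refs, eps=1e-12):
    """Mean KL(reference || model) over forget samples, using each sample's class reference."""
    targets = refs[forget_labels]
    kl = np.sum(targets * (np.log(targets + eps) - np.log(forget_probs + eps)), axis=1)
    return float(kl.mean())

# Toy example with two classes (hypothetical probabilities).
heldout_probs = np.array([[0.8, 0.2], [0.6, 0.4], [0.3, 0.7]])
heldout_labels = np.array([0, 0, 1])
refs = class_conditioned_references(heldout_probs, heldout_labels, num_classes=2)

# A class-0 forget sample on which the model is overconfident (memorization-like).
loss_overconfident = reference_distillation_loss(np.array([[0.99, 0.01]]), np.array([0]), refs)
# A class-0 forget sample whose prediction already matches unseen-data behavior.
loss_matched = reference_distillation_loss(refs[[0]], np.array([0]), refs)
```

The loss is zero once the model treats a forget sample like an unseen sample of the same class, which is the indistinguishability target the abstract describes, rather than pushing the loss upward without bound.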

Authors

Jonas Mirlach, Sonia Laguna, Julia E. Vogt

Submitted

International Conference on Learning Representations (ICLR) Workshop on Agents in the Wild

Date

27.04.2026

Link

Abstract

Background: Gait impairment is a hallmark motor deficit of Parkinson’s disease (PD) and represents an important, yet insufficiently understood, target of subthalamic deep brain stimulation (DBS). Although DBS can improve several motor symptoms, identifying robust and physiologically meaningful gait biomarkers that capture both disease-related deficits and stimulation-induced improvements remains a major challenge. In particular, conventional mean-based gait metrics often fail to fully characterize pathological gait or treatment responsiveness.

Methods: We analyzed 35 spatiotemporal gait parameters obtained during continuous walking from individuals with PD assessed before and after subthalamic DBS, alongside age-matched healthy controls. Multiple machine learning classifiers were evaluated to discriminate between groups, with extreme gradient boosting (XGBoost) achieving the best performance. To enhance interpretability and reduce redundancy among correlated parameters, grouped SHapley Additive exPlanations (SHAP) were applied to rank feature importance and guide feature selection.

Results: Feature selection consistently highlighted step width variability, step width asymmetry, bilateral interlimb coordination, and the anteroposterior margin of stability as the most discriminative parameters. A compact set of five overlapping features after selection not only reliably distinguished PD gait from healthy controls but also demonstrated a shift toward healthy ranges following DBS. Importantly, these selected features outperformed conventional mean-based metrics in capturing both pathological gait characteristics and treatment-related changes.

Discussion: Our findings demonstrate that explainable artificial intelligence approaches can identify physiologically grounded gait features that may serve as candidate markers of both PD severity and DBS responsiveness. By emphasizing variability,
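Grouped SHAP explanations rank sets of correlated parameters instead of single features. The abstract does not define its grouping, so this sketch assumes one common convention: because SHAP attributions are additive, a group's per-sample attribution is the sum of its members' SHAP values, and groups are ranked by the mean absolute value of that sum. The SHAP matrix is assumed to be precomputed (for instance with a tree explainer for an XGBoost model); the feature and group names are hypothetical.

```python
import numpy as np

def grouped_shap_importance(shap_values, feature_names, groups):
    """Rank feature groups by mean |summed SHAP value| across samples.

    shap_values: array (num_samples, num_features) of per-sample attributions.
    groups: dict mapping a group name to the list of feature names it contains.
    """
    col = {name: j for j, name in enumerate(feature_names)}
    scores = {}
    for group, members in groups.items():
        idx = [col[m] for m in members]
        # Additivity of SHAP: a group's attribution is the sum of its members' attributions.
        group_attr = shap_values[:, idx].sum(axis=1)
        scores[group] = float(np.mean(np.abs(group_attr)))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical SHAP values for two participants and three gait features.
feature_names = ["step_width_sd", "step_width_cv", "gait_speed"]
shap_values = np.array([[0.2, -0.1, 0.05],
                        [0.3, -0.2, -0.05]])
groups = {
    "step_width_variability": ["step_width_sd", "step_width_cv"],
    "gait_speed": ["gait_speed"],
}
ranking = grouped_shap_importance(shap_values, feature_names, groups)
```

Summing before taking absolute values lets opposing attributions within a group of redundant parameters cancel; summing absolute values instead would favor large groups and can produce a different ranking.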

Authors

Zhongke Mei, Alain Ryser, Gianluca Amprimo, Jinhao Wang, Julia Vogt, Deepak K Ravi

Submitted

Journal of NeuroEngineering and Rehabilitation

Date

27.04.2026

Link, DOI