Comparative analysis of supervised integrative methods for multi-omics data
Recent advances in sequencing, mass spectrometry and cytometry technologies have enabled researchers to collect multiple omics datasets from a single sample. These large datasets have fostered a growing consensus that a holistic approach is needed to identify new candidate biomarkers and to unveil the mechanisms underlying disease aetiology, both of which are key to precision medicine. In collaboration with a US partner, this project aimed to survey and benchmark supervised integrative approaches.
Due to the relative novelty of the field, numerous challenges remain in multi-omics analysis, including:
- High dimensionality (typically far more features than samples), which significantly impacts inference;
- Data heterogeneity, which can weaken the biological signal because biases and systematic errors differ across platforms;
- Interpretation, as the sheer amount of information makes meaningful conclusions difficult to draw.
BIOASTER reviewed and selected cutting-edge machine learning methods representative of the main families of integrative approaches (matrix factorization, multiple kernel methods, ensemble learning and graph-based approaches). The methods were then evaluated on both simulated and real-world datasets, the latter carefully selected to cover various medical applications (infectious diseases, oncology and vaccines).
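To make the multiple kernel family more concrete, here is a minimal sketch, assuming one Gaussian (RBF) kernel per omics layer combined by a fixed weighted sum. It is not one of the benchmarked methods: proper multiple kernel learning learns the kernel weights, whereas this sketch defaults to uniform weights, and the function names, `gamma` values and toy data are hypothetical.

```python
import math


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for samples X (one list of features per sample)."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Squared Euclidean distance between samples i and j
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-gamma * d2)
    return K


def combine_kernels(kernels, weights=None):
    """Weighted sum of per-omics kernel matrices (uniform weights by default).

    Assumption: weights are fixed here; a real MKL method would learn them.
    """
    m = len(kernels)
    if weights is None:
        weights = [1.0 / m] * m
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: the same 3 samples measured on two omics layers
# with different numbers of features.
transcriptome = [[0.1, 2.0, 3.5, 0.0], [0.2, 1.8, 3.9, 0.1], [5.0, 0.3, 0.2, 4.4]]
metabolome = [[1.0, 0.5], [1.1, 0.4], [0.2, 2.0]]

K = combine_kernels([
    rbf_kernel(transcriptome, gamma=0.1),  # hypothetical per-layer bandwidth
    rbf_kernel(metabolome, gamma=1.0),
])
```

Because each layer gets its own sample-by-sample kernel, layers with different numbers of features and scales are mapped into a common n × n similarity matrix, which a kernel classifier such as a support vector machine could then use.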
Integrative approaches performed comparably to or better than non-integrative methods on simulated data and outperformed them on real-world data. More specifically, multiple kernel and matrix factorization methods demonstrated a strong ability to uncover modest effects in high-dimensional settings.
The expertise acquired in this project will help BIOASTER and its partners both refine biomarker discovery and decipher the molecular functions underlying new modes of action.