TECHNOLOGY UNIT : DATA MANAGEMENT AND DATA SCIENCE

High-performance infrastructure, original cross-cutting approaches to data management and innovative high-dimensional data analytics for the benefit of translational bioinformatics.

Data analysis

CHALLENGES

The huge volumes of data generated by high-throughput experiments, NGS, multi-omics analyses and imaging have moved the life sciences into the era of big data. In turn, they have contributed to the emergence of a new discipline, “Data Science”, which must deal with the organizational complexity of biological systems.

The data produced are highly heterogeneous (DNA sequences, metabolic profiles, immunological biomarkers, metadata associated with clinical trials…), which makes it necessary to redefine the processes and methods governing their storage, analysis and visualization. The final challenge is to integrate all these data and analyze them transversally, translate them into actionable information and then return biological results to interdisciplinary teams.

The management and analysis of data arising from this new systemic and integrative approach require scalable, flexible and high-performance computing resources. These requirements resemble the storage, computing and transfer needs of radically different industrial sectors (e.g. the ‘Tech giants’), which exploit the flexibility of virtualization and Cloud-based service architectures.

OBJECTIVES

The “Data Management & Analysis” Technology Unit collects, analyzes, cross-references, secures and shares data originating from BIOASTER’s other Technology Units and from collaborative projects carried out with partners. The Unit thus allows scientists to convert raw data into information and knowledge while taking full advantage of the computing power, mass storage capabilities and high-bandwidth networks made available at CC-IN2P3 (http://cc.in2p3.fr), a strategic partner of BIOASTER.

To fulfil its tasks and offer an original approach to scaling up the value chain of data management and analysis pipelines, the Technology Unit is organized into three thematic clusters supported by components of the main scientific information backbone (the “Core Services”):

Topic 1: Data Management

Business needs (data- & user-oriented), Ergonomics & Support

Provision of a portfolio of Bio-IT solutions, collaborative platforms and scientific data repositories to support BIOASTER and partner projects, made available as Cloud-based services.

This topic covers the establishment of:

  • Platforms for the collection of omics, phenotypic or clinical data (LIMS, eCRF…) as well as metadata originating from the biological resource inventory (BIOSPECIMENS, https://biospecimens.bioaster.org)
  • Automated analysis and visualization solutions (workflow management, e.g. Galaxy) deployed on our computing infrastructure
  • Web services to browse and visualize clinical studies enriched with integrated multi-omics data (tranSMART / eTRIKS)

Topic 2: Cloud-based Computing Management

Performance & High-availability

Deployment of a scalable, high-performance Cloud infrastructure dedicated to the management and multi-dimensional analysis of massive, heterogeneous and potentially sensitive data.

For systems administration and operational maintenance, we rely on the resources and expertise of both CC-IN2P3 and BIOASTER’s IS/IT Department.

Topic 3: Knowledge Management

Innovation & Transversal approach

Establishment of a Knowledge Management System dedicated to Translational Analytics and Scientific Support for projects.

The final ambition of the Technology Unit is to design a proprietary and innovative strategy that highlights what makes BIOASTER unique, i.e. the combination of clinical, omics and phenotypic data to elucidate the recurrently investigated interactions between patient, animal model, pathogenic microorganism, microbiota, disease and/or drug. This strategy should take the form of a central Integrated Knowledge Management System able to address translational science challenges that depend on the line of inquiry: discovering new biomarkers, deciphering mechanisms of action, or making in vitro / in vivo models more complete, efficient or predictive.

The Unit also aims to support and sustain the work of the community of bioinformaticians, bio-mathematicians/modelers and (bio-)statisticians who are distributed across the other Technology Units and act as a pool of experts serving the earliest and most exploratory project activities.

Core Services

Services/Resources Pooling & Interoperability – Sustainability & Security

These activities consist in developing central or shared services (e.g. identification/authentication, a Data Life Cycle Management System, reliable transfer of massive files…) as well as reusable components that facilitate data exchange (e.g. ETL, Web services…), provide interfaces or make our solutions and platforms interoperate transparently and securely for users.

ADDED VALUE

Expertise, Equipment, Technologies

A strategic partner: CC-IN2P3

To meet its performance targets and the expected level of satisfaction with storage and analysis capabilities, the Technology Unit relies on the Computing Center of the National Institute of Nuclear Physics and Particle Physics of the CNRS (CC-IN2P3), which provides BIOASTER’s R&D programs with high-throughput computing, mass storage capacity and high-bandwidth networks usually reserved for subatomic physics and astrophysics.

Designed from the outset to be scalable and flexible, this architecture can quickly adapt and adjust depending on current and future BIOASTER project needs.

The Technology Unit brings together an unusual range of expertise and multidisciplinary skills spanning the whole value chain, from raw data to refined data and from biological knowledge to actionable information.

The present and future skills of the women and men of the Technology Unit thus span the Bio-IT and Data Science areas:

  • Bioinformatics-Statistics: modeling, biological reference databases, open-source tools, principles of curation and annotation…
  • Applications & platforms – Design: services architecture; reusable software components
  • Software Engineering – Delivery: software development, assembling/packaging, modeling, Web-app (frameworks)…
  • Administration & Running: deployment and continuous integration tools (DevOps reference framework), monitoring, KPIs…
  • Background: mixed OS, virtualization, Cloud, HPC…
  • Methodologies and standards – Regulatory or legal constraints – Quality norms and guidelines

Network & Partnership

  • Bcom
  • CC-IN2P3
  • CNRS
  • eTRIKS
  • Institut Pasteur
  • tranSMART
  • VMware

Highlights & News

  • 3 open positions for software engineers (Join us)
