CINECA Virtual Platform

Data Harmonisation

To support the discovery and analysis of human cohort genomic and other omic data across jurisdictions, basic data such as cohort participants’ demographics, diseases, and medications need to be harmonised. Individual cohorts are constrained by size, ancestral origins, and geographic boundaries, which limit the subgroups, exposures, outcomes, and interactions that can be examined. Combining data across large cohorts to address questions that none of them can answer alone enhances the value of each cohort and leverages the enormous investments already made in them to address pressing questions in global health.

CINECA has addressed the metadata representation needs for cohort aggregate and individual data across studies and over time. It has worked on developing a metadata model and on creating a workflow for semantic harmonisation and a system to generate metadata from unstructured dataset and data item descriptions. It has also collaborated in the creation of a machine-readable consent ontology and in the development of the CINECA Synthetic Datasets.
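
As a rough illustration of what harmonising a basic variable can involve, the sketch below maps two differently coded "sex" fields onto one shared vocabulary. The cohort names, source codes, and target terms are all hypothetical and are not taken from the CINECA metadata model or from any real cohort; an actual workflow would typically map to ontology terms rather than plain strings.

```python
# Minimal sketch of value-level harmonisation, under stated assumptions.
# All cohort names and codings below are hypothetical examples.
SEX_MAPPINGS = {
    "cohort_a": {"1": "male", "2": "female"},  # numeric coding (assumed)
    "cohort_b": {"M": "male", "F": "female"},  # letter coding (assumed)
}


def harmonise_sex(cohort: str, raw_value: str) -> str:
    """Map a cohort-specific code to the shared vocabulary.

    Codes with no known mapping become "unknown" instead of raising,
    so unmapped values stay visible for later review.
    """
    mapping = SEX_MAPPINGS[cohort]
    return mapping.get(raw_value.strip(), "unknown")


records = [("cohort_a", "1"), ("cohort_b", "F"), ("cohort_b", "x")]
harmonised = [harmonise_sex(cohort, value) for cohort, value in records]
print(harmonised)
```

Keeping one mapping table per cohort leaves the source data untouched and makes each harmonisation decision easy to inspect.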