Posts by Leslie Glass
Semantic and harmonisation best practice - D3.2

Authors - Melanie Courtot (EMBL-EBI), Isuru Liyanage (EMBL-EBI)

To support human cohort genomic and other omic data discovery and analysis across jurisdictions, basic data such as cohort participants’ demographic data, diseases and medication (termed “minimal metadata”) need to be harmonised. Individual cohorts are constrained by size, ancestral origins, and geographic boundaries, which limit the subgroups, exposures, outcomes, and interactions that can be examined. Combining data across large cohorts to address questions that none of them can answer alone enhances the value of each and leverages the enormous investments already made in them to address pressing questions in global health. By capturing genomic, epidemiological, clinical and environmental data from genetically and environmentally diverse populations, including populations that are traditionally under-represented, we will be able to capture novel factors associated with health and disease that are applicable to both individuals and communities globally.

We provide best practices for cohort metadata harmonisation, using the semantic platform we deployed in the cloud to enable cohort owners to map their data and harmonise it against GECKO (the GEnomics Cohorts Knowledge Ontology), which we developed. GECKO is derived from the CINECA minimal metadata model, the basic set of attributes that should be recorded with all cohorts, and is critical to aid initial querying across jurisdictions for suitable dataset discovery. We describe how this minimal metadata model was formalised using modern semantic standards, making it machine readable and interoperable with external efforts. Furthermore, we present how those practices were successfully used at scale, both within CINECA (for data discovery in WP1 and in the synthetic datasets constructed by WP3) and outside of CINECA, such as in the International HundredK+ Cohorts Consortium (IHCC) and the Davos Alzheimer’s Collaborative (DAC). Finally, we highlight ongoing work on alignment with other efforts in the community and future opportunities.
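
As a rough illustration of what mapping a cohort's data onto a shared set of terms can involve, the Python sketch below renames one cohort's local column names to common labels and reports the fields that have no mapping. The mapping table, the column names and the harmonise_record helper are hypothetical; they are not GECKO identifiers and not the API of the CINECA semantic platform.

```python
# Hypothetical mapping from one cohort's local column names to shared
# labels. The labels are illustrative placeholders, not real GECKO term IDs.
LOCAL_TO_SHARED = {
    "sex_at_birth": "biological sex",
    "dob_year": "year of birth",
    "dx_codes": "diseases",
    "meds": "medication",
}


def harmonise_record(record, mapping):
    """Return (harmonised, unmapped): renamed fields plus leftover keys."""
    harmonised = {}
    unmapped = []
    for key, value in record.items():
        if key in mapping:
            harmonised[mapping[key]] = value
        else:
            # Keep track of fields that fall outside the shared model.
            unmapped.append(key)
    return harmonised, sorted(unmapped)


example = {"sex_at_birth": "F", "dob_year": 1970, "shoe_size": 38}
harmonised, unmapped = harmonise_record(example, LOCAL_TO_SHARED)
print(harmonised)
print(unmapped)
```

Reporting the unmapped fields separately, rather than silently dropping them, makes it easier for a cohort owner to see which variables still fall outside the minimal model.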

https://doi.org/10.5281/zenodo.5055308

Framework and APIs for executing federated genomics analyses D4.2

Authors - Álvaro González (CSC), Shubham Kapoor (CSC), Kirill Tsukanov (EMBL-EBI)

The federated analysis platform defined by this task aims to provide technological solutions for three exemplar use cases: federated joint cohort genotyping; a Polygenic Risk Score (PRS) workflow across two sample sets of similar ethnic background; and federated QTL analysis for molecular phenotypes. In this deliverable, we gathered the technical requirements from these use case descriptions and wrote a short design document that explains the requirements and lists the options for a solution.

Three distinct frameworks were considered to address the requirements of the use cases. The chosen framework supports different computing environments, which is a requirement for true federated analysis. It can also be extended for compatibility with GA4GH standards such as WES, htsget, and AAI / Passports. Extension of the proposed solution beyond the initial sites will follow the initial validation phase.

https://doi.org/10.5281/zenodo.4609356

Query expansion service - D1.2

Authors - Romain Tanzer (HES-SO), Nona Naderi (HES-SO), Douglas Teodoro (HES-SO), Anais Mottaz (HES-SO), Patrick Ruch (HES-SO), Jonathan Dursi (SickKids), Jordi Rambla de Argila (CRG)

CINECA aims to support federated queries and analyses of distributed cohorts across continents. But human health datasets are extremely diverse: many different types of data are collected for many different kinds of health studies by many different health research communities. As a result, different cohort datasets often use different ontologies to describe similar kinds of entities, or represent concepts such as genomic variation differently.

CINECA must span this diversity of data representations to achieve its goal of connecting health research cohort data. The work of WP3 partially addresses dataset discoverability by defining a standard minimal cohort-level data representation common to all cohorts, but it does not address cohort-level data that falls outside the minimal common data model, nor the representation of patient-level data. WP1’s role is to design and deploy API access to both cohort- and patient-level data. A fundamental function of this infrastructure is to let users find the appropriate dataset regardless of the ontology used locally to map each cohort, and regardless of the format and syntax used to describe variants.

This report describes the work done on query expansion: the implementation and demonstration of a query expansion service API that improves the findability and searchability of distributed cohort data. Several kinds of query expansion are available to enable further data integration and interoperability, including horizontal expansion (across ontological systems) and vertical expansion (within sublevels of the same ontological resource).
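
To make the two kinds of expansion concrete, here is a minimal Python sketch over a toy term hierarchy. Everything in it is an assumption made for illustration: the term identifiers are invented placeholders, and the expand_vertical and expand_horizontal helpers are hypothetical names, not the interface of the D1.2 service.

```python
# Toy data, invented for illustration only: the term identifiers below are
# placeholders, not codes from any real ontology.
CHILDREN = {
    "A:diabetes": ["A:type1_diabetes", "A:type2_diabetes"],
    "A:type2_diabetes": ["A:type2_diabetes_with_complication"],
}
CROSS_MAPPINGS = {
    "A:diabetes": ["B:0001"],
    "A:type2_diabetes": ["B:0002"],
}


def expand_vertical(term, children):
    """Return the term plus all of its descendants in the same ontology."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(children.get(current, []))
    return sorted(seen)


def expand_horizontal(terms, mappings):
    """Add mapped terms from other ontologies for each input term."""
    expanded = set(terms)
    for term in terms:
        expanded.update(mappings.get(term, []))
    return sorted(expanded)


vertical = expand_vertical("A:diabetes", CHILDREN)
both = expand_horizontal(vertical, CROSS_MAPPINGS)
print(both)
```

In this sketch a query for the broad term also matches datasets annotated with narrower terms, or with mapped terms from a second ontology, which is the sense in which expansion improves findability.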

https://doi.org/10.5281/zenodo.4609335

Cohort minimal metadata model - D3.1

Authors - Vivian Jin, Fiona Brinkman (SFU)

To support human cohort genomic and other “omic” data discovery and analysis across jurisdictions, basic data such as cohort participant age and sex need to be harmonised. Developing a key “minimal metadata model” of the basic attributes that should be recorded with all cohorts is critical to aid initial querying across jurisdictions for suitable dataset discovery. We describe here the creation of this minimal metadata model, the specific methods used to create it, and its utility and impact.

A first version of the metadata model was built from a review of Maelstrom research data standards and a manual survey of cohort data dictionaries, which identified and incorporated overlapping core variables across CINECA cohorts. The model was then converted to Genomics Cohorts Knowledge Ontology (GECKO) format and further expanded with additional terms. The minimal metadata model is being made broadly available to aid any project, including those outside of CINECA, interested in facilitating cross-jurisdictional data discovery and analysis.


https://doi.org/10.5281/zenodo.4575460

Catalogue of Canadian, European and African ethical and legal gaps - D7.2

Authors - Éloïse Gennet, Melanie Goisauf, Delphine Pichereau, Emmanuelle Rial-Sebbag

The remaining liberties that GDPR provides to EU Member States, as well as remaining ambiguities in GDPR interpretation, continue to feed debates in the ethical and legal literature. Projects like CINECA, which seeks to facilitate health data exchanges between cohorts in Europe, Canada and Africa, offer valuable experience and input on essential ethical and legal gaps between countries and cohorts, on questions such as the ethical and lawful basis for international health data sharing and secondary processing for research purposes.

The focus of this deliverable will be on answering, from both a legal and an ethical point of view, two priority questions: How should a legal basis for CINECA’s data processing be chosen? And how should CINECA approach broad consent to further data processing? The goal will be to study how the CINECA project, and data sharing in particular, could be conducted efficiently while complying with relevant laws and regulations across all member states and, most of all, with established ethical guidelines and practices across three continents.

https://doi.org/10.5281/zenodo.4298450

CanDIG and ELIXIR AAI interoperability demonstration - D2.1

This deliverable demonstrates authentication and authorisation interoperability between the ELIXIR and CanDIG infrastructures. Users from one infrastructure can access services from the other. The interoperability covers user identification and authentication as well as the transfer of the authorisation claims following the GA4GH Passport and AAI OpenID Connect protocol (OIDC) profile specifications.
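
For readers unfamiliar with the format, the GA4GH Passport specification carries authorisation claims as "visas": JWTs listed under a ga4gh_passport_v1 claim, each containing a ga4gh_visa_v1 object. The Python sketch below only illustrates that nesting, using toy tokens with a dummy signature. The toy_jwt and unverified_payload helpers and all example values are hypothetical; this is not the CanDIG or ELIXIR implementation, and a real service must verify token signatures before trusting any claim.

```python
import base64
import json


def b64url_encode(data):
    """Base64url-encode bytes without padding, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text):
    """Decode a base64url string, restoring the stripped padding first."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


def unverified_payload(jwt):
    """Decode the payload segment of a JWT WITHOUT checking the signature."""
    _header, payload, _signature = jwt.split(".")
    return json.loads(b64url_decode(payload))


def toy_jwt(payload):
    """Build a fake token with a dummy signature, for demonstration only."""
    header = {"alg": "none", "typ": "JWT"}
    parts = [b64url_encode(json.dumps(p).encode()) for p in (header, payload)]
    return ".".join(parts + ["sig"])


# Example values are invented; only the claim names follow the specification.
visa = toy_jwt({
    "sub": "user-123",
    "ga4gh_visa_v1": {
        "type": "ControlledAccessGrants",
        "value": "https://example.org/datasets/1",
        "source": "https://example.org/dac",
        "asserted": 1600000000,
    },
})
passport = toy_jwt({"sub": "user-123", "ga4gh_passport_v1": [visa]})

visas = [
    unverified_payload(v)["ga4gh_visa_v1"]
    for v in unverified_payload(passport)["ga4gh_passport_v1"]
]
print([(v["type"], v["value"]) for v in visas])
```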

https://doi.org/10.5281/zenodo.3938916

Discovery Service Catalog - D1.1

CINECA aims to support federated queries and analyses of distributed cohorts across continents. A vital component of this work is a machine-readable catalogue of the cohorts and sites that support the Work Package 1 discovery and analysis APIs. The catalogue can be queried programmatically, so that API calls can be made to the relevant sites and the results gathered and presented to the researcher.

Deliverable D1.1, Discovery Service Catalogue, supports the work of dependent work packages by implementing and demonstrating an open-source extended implementation of the Global Alliance for Genomics and Health (GA4GH) Service Registry standard for WP1’s discovery queries, the GA4GH Beacon queries. The Service Registry standard is now supported by the ELIXIR Beacon Network, which CINECA WP1 uses to federate discovery queries across cohorts, and this deliverable demonstrates the use of the service registry and its open-source implementation.
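
As a small sketch of how a client might use such a catalogue, the Python below filters a list of service records by type to find the Beacon endpoints to query. The records are invented samples shaped like GA4GH service-info entries (a type object with group, artifact and version fields); the services_of_type helper and the example.org URLs are hypothetical, and fetching the list from a live registry is deliberately left out.

```python
# Invented sample records, shaped like entries a registry might list.
SERVICES = [
    {
        "id": "org.example.cohort-a.beacon",
        "name": "Cohort A Beacon",
        "type": {"group": "org.ga4gh", "artifact": "beacon", "version": "1.0.0"},
        "url": "https://cohort-a.example.org/beacon",
    },
    {
        "id": "org.example.cohort-b.beacon",
        "name": "Cohort B Beacon",
        "type": {"group": "org.ga4gh", "artifact": "beacon", "version": "1.0.0"},
        "url": "https://cohort-b.example.org/beacon",
    },
    {
        "id": "org.example.cohort-b.htsget",
        "name": "Cohort B htsget",
        "type": {"group": "org.ga4gh", "artifact": "htsget", "version": "1.2.0"},
        "url": "https://cohort-b.example.org/htsget",
    },
]


def services_of_type(services, artifact, group="org.ga4gh"):
    """Return the service records whose type matches group and artifact."""
    matches = []
    for service in services:
        service_type = service.get("type", {})
        if (service_type.get("group") == group
                and service_type.get("artifact") == artifact):
            matches.append(service)
    return matches


# The URLs a federated discovery query would be sent to.
beacon_urls = [s["url"] for s in services_of_type(SERVICES, "beacon")]
print(beacon_urls)
```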

https://doi.org/10.5281/zenodo.3908397
