The following deliverables and reports summarise the work performed by Solve-RD partners.
Results
D1.3 Training modules, guidance document and online help module for collection of phenotypes
To engage users with Solve-RD and facilitate their contribution of phenotypic data to the project, we have developed training materials and carried out training activities, which will continue over time. We have set up a practical guide to PhenoTips data entry, provided an in-depth YouTube video with instructions, organised our first training webinar for ERNs (available here) and created an option for uploading phenotype data using an Excel template. Further support is provided to researchers through one-to-one teleconferences on PhenoTips usage and by answering their queries at help@rd-connect.eu.
D1.4 Deployment of PhenoTips custom forms according to the ERNs specifications
This deliverable report describes the adaptations and customisations CNAG-CRG have carried out in PhenoTips to allow for the collation of phenotype data tailored to each ERN and disease group. Users can enter data individually for each proband and family member through one of the customized templates, import information using JSON schemas or provide a filled-in Excel template for bulk upload.
D1.5 Guidelines for collection of experimental data
This deliverable report provides information on the guidelines that have been developed for experimental data collection for Solve-RD. These guidelines aim to aid and facilitate all data upload from all Solve-RD collaborators and ensure high quality standards are met.
D1.10 Adaptation of BOQA algorithm to its use in the ontology of unsolved RD
WP1 not only aims to collect standardized phenotypic information from unsolved rare disease (RD) cases, but also aims to transform their phenotypic descriptions into diagnostic hypotheses. One of the proposed means to produce these diagnostic hypotheses is calculating a numerical similarity value that reflects how well the phenotype information of an unsolved case aligns with other solved or unsolved cases as well as how well it overlaps with known rare diseases.
In this report we present a software tool that we developed, which adapts several algorithms to calculate the degree of similarity between unsolved cases and known diseases using solved cases. The results of this tool will be imported directly into the ontology of rare unsolved cases (RDCO).
We have tested the tool and implemented algorithms on 107 PhenoPackets obtained from solved cases. We identified two algorithms that show superior performance on this relatively small test set and were able to run this tool on Orphanet computers.
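As an illustration of what a numerical phenotype similarity value can look like, the sketch below ranks known diseases by the Jaccard overlap between a case's HPO term set and each disease's annotation set. It is a deliberately simple stand-in, not the BOQA adaptation or any of the algorithms evaluated in the report; the function names, disease labels and term sets are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Return |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_diseases(case_terms: Set[str],
                  disease_terms: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Rank diseases by similarity to the case, highest score first."""
    scores = [(name, jaccard(case_terms, terms))
              for name, terms in disease_terms.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical inputs: HPO identifiers are used purely as set members.
case = {"HP:0001250", "HP:0001263", "HP:0000252"}
diseases = {
    "disease_A": {"HP:0001250", "HP:0001263", "HP:0001252"},
    "disease_B": {"HP:0000252"},
    "disease_C": {"HP:0000478"},
}
ranking = rank_diseases(case, diseases)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Plain set overlap ignores the ontology hierarchy, so two closely related terms count as a complete mismatch; ontology-aware measures would treat them as a partial match.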
D1.12 4 workshops, videoconferences and jamborees for hands-on discussion on diagnostic hypothesis
A series of 4 jamborees was organised to discuss the results of a proposed methodology based on phenotypic similarity calculations and reanalysis of genomic information provided by ERNs’ clinicians on GPAP. For each jamboree, a solved case was selected, presented by the data submitter and discussed with regard to its consistency with a known rare disease (RD), with the ultimate goal of giving a clinical diagnosis or, if inconsistent, considering the emergence of a new RD by clustering with similar cases. In addition, a cascade of similarity calculations allowed unsolved cases to be reanalysed for candidate genes and discussed with the clinicians. Although an estimate of the overall performance of this approach cannot be reported, it correctly identified variants in solved cases and raised new hypotheses deserving further investigation in at least 7 unsolved cases; other cases are still to be discussed with the clinicians, as there was not enough time to present them all. On the basis of these promising results, the methodology was validated and the next steps for generalising and standardising the method were decided. Selected unsolved cases, and cases that are solved but not yet clinically diagnosed, will be the subject of the next series of jamborees.
D2.2 Guidelines for exome/genome re-analysis
Over 5,000 genomic datasets have been collected by Solve-RD, processed using a standardized analysis pipeline (Laurie et al., 2016) and made available to all Solve-RD partners through the RD-Connect GPAP and the Solve-RD Sandbox.
To organise the re-analysis and interpretation of the data we have pulled together the consortium’s bioinformatics and analysis expertise in a Data Analysis Task Force (DATF) and the clinical and biological expertise in 4 Data Interpretation Task Forces (DITFs), one per core ERN.
The Solve-RD Data Analysis Task Force (DATF) has set up a semi-automated workflow for genomic data (re)analysis and interpretation involving partners and expertise from the different WPs and ERNs. Different working groups and use cases have been set up and extensively documented to provide guidance to Solve-RD partners on how the (re)analysis of their data will be performed and which steps are required for validation and feedback.
D2.3 Guidelines for Quality Control metrics
Solve-RD has collated over 8,400 standardised phenotypic and genomic datasets from partners across different ERNs and countries. To ensure best practices and standardisation of the process, the HPO, OMIM and Orphanet (ORDO) ontologies are used to collect phenotypic data, and GATK best practices and GA4GH standards are followed in the collection and processing of genomic data through a standardised pipeline (Laurie et al., 2016).
As data submitted to Solve-RD for reanalysis has been sequenced at a variety of centres, under different protocols and using different technologies, it is essential to ensure a minimum quality of geno-pheno datasets to guarantee proper downstream analyses and results. Therefore, several quality control metrics have been established and are now computed automatically for each sample entering the Solve-RD project.
Here we report on the established framework for quality assessment (processing checkpoints, genome coverage metrics, phenotypic data and sample relatedness) for RD-Connect and Solve-RD data and provide guidelines to enable Solve-RD data submitters and Data Analysis Task Force (DATF) members to easily assess the quality of the provided data and compare genomic datasets before undertaking further downstream analyses.
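The kind of automated per-sample check described above can be sketched as a function that compares reported metrics against minimum thresholds. The metric names and threshold values below are hypothetical placeholders chosen for illustration; they are not the values defined in the deliverable.

```python
from typing import Dict, List

# Hypothetical minimum thresholds; the actual values are defined in the report.
THRESHOLDS: Dict[str, float] = {
    "mean_coverage": 30.0,        # mean read depth over the target region
    "fraction_bases_20x": 0.90,   # share of target bases with depth >= 20
}


def failed_checks(metrics: Dict[str, float],
                  thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Return the names of metrics that are missing or below their minimum."""
    return [name for name, minimum in thresholds.items()
            if metrics.get(name, float("-inf")) < minimum]


# Hypothetical samples; a missing metric is treated as a failed check.
samples = {
    "sample_1": {"mean_coverage": 42.1, "fraction_bases_20x": 0.97},
    "sample_2": {"mean_coverage": 18.4, "fraction_bases_20x": 0.71},
    "sample_3": {"mean_coverage": 35.0},
}
report = {sample_id: failed_checks(m) for sample_id, m in samples.items()}
for sample_id, failures in report.items():
    status = "PASS" if not failures else "FAIL: " + ", ".join(failures)
    print(sample_id, status)
```

Treating a missing metric as a failure is a design choice in this sketch: it keeps incomplete submissions from silently passing into downstream analyses.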
D2.5 Report on new matchmaking strategies
Matchmaking technologies cover a very specific data findability need by enabling the discovery of individuals with similar phenotypes and/or potential causative genetic variants, among others. Solve-RD has contributed to the development and implementation of several matchmaking strategies which are now available to the consortium but can also be deployed in other settings. The RD-Connect Genome Phenome Analysis Platform (GPAP) is connected to the MatchMaker Exchange network (MME), which enables users, including all Solve-RD partners, to query for individuals with similar symptoms and/or candidate disease genes within the GPAP (internal matchmaking), and against four external databases: PhenomeCentral, DECIPHER, GeneMatcher, and MyGene2. The PhenoStore module allows the users to find individuals with specific categories within the RD-Connect GPAP. The CohortApp module enables the users to create in-silico cohorts according to certain search criteria; such cohorts can then be used to launch genetic queries within the RD-Connect GPAP analysis module. Finally, powerful search, discovery and matchmaking capabilities are provided through RD3/Sandbox and Discovery Nexus.
D2.23 Brokerage service for 50 newly identified genes
In biomedical research, and in particular for rare diseases, there is a critical need for model organisms to unravel the mechanistic basis of diseases, to perform biomarker studies, and to develop potential therapeutic interventions. In Solve-RD we have established the European Rare Disease Models & Mechanisms Network (RDMM-Europe) to promote fruitful collaborations between clinicians and model organism experts. The main principle of this brokerage service is to fill the gap between RD gene discovery and functional validation of potentially new disease genes and/or novel disease mechanisms. For this purpose, RDMM-Europe catalyses the connection of Solve-RD clinicians and scientists, who have discovered new disease-causing genes, with Model Organism Investigators (MOI), who are experts for the given genes, the proposed model organism and/or cell culture systems. Solve-RD provided Seeding Grants for selected validation projects to be conducted by researchers outside the consortium across the world.
Parts of this report are based on the publication:
Ellwanger, K., Brill, J.A., de Boer, E. et al. Model matchmaking via the Solve-RD Rare Disease Models & Mechanisms Network (RDMM-Europe). Lab Anim (2024). https://doi.org/10.1038/s41684-024-01395-2
D3.1 Publication: EBCD findings in at least 2 different HCPs including in one ERN
The communication of genomic results to patients and families with rare diseases raises distinctive challenges. However, there is little evidence about optimal methods to communicate results to this group of service users. To address this gap, we worked with rare disease families and health professionals from two genetic/genomic services, one in the United Kingdom and one in the Czech Republic, to co-design approaches to communicating results that best meet their needs. Using the participatory methodology of Experience-Based Co-Design (EBCD), we conducted observations of clinical appointments (n=49) and interviews with family participants (n=23) and health professionals (n=22) to gather their experience of sharing/receiving results. The findings informed a facilitated co-design process, comprising 3 feedback events at each site and a series of meetings and remote consultations. Participants identified a total of four areas of current service models in need of improvement, and co-designed six prototypes of quality improvement interventions. The main finding was the identification of post-test care as the shared priority for improvement for both health professionals and families at the two sites. Our findings indicate the need to strengthen the link between diagnostics (whether or not a pathogenic variant is found) and post-test care, including psychosocial and community support. This has implications for the reconfiguration of genomic service models, the redefinition of professional roles and responsibilities, and the involvement of rare disease patients and families in health care research.
D3.2 Publication: Synthesis of existing studies assessing cost effectiveness and clinical utility of WES/WGS
This report is the deliverable of Solve-RD WP3, task 3, objective A “Perform a systematic review to gain insight into currently ongoing clinical utility studies for genomics strategies and their conclusions”. It has been led by Christine Peyron and Aurore Pélissier from the Health Economics Team of the Laboratoire d’Economie de Dijon (University of Burgundy).
The report sets out: (i) the work done to organise the conference proposed in Objective A; (ii) the summary of the research presented at the conference.
In addition to reporting on the work by the Dijon Health Economics Team, it provides a fairly complete overview of the issues and research currently being developed in the social sciences with respect to genomic medicine.
D3.3 Diagnostic yield per RD per ERN for all omics used in WP2
In the Solve-RD project, seven different -omics technologies have been used to increase diagnostic yield for individuals with a genetically undiagnosed rare disease. These are deep exomes, short-read genomes, long-read genomes, optical genome mapping, transcriptomics, epigenomics and metabolomics, together tailored not only to detect the underlying primary genetic variant but also to assess its functional impact. For each -omics, the ERNs have contributed multiple RD cohorts to unveil the diagnostic potential. In this task, we aimed to determine these yields per RD and per ERN across the omics used.
In total, >3,500 new datasets were generated across the ERNs and -omics, ranging from 89 for optical genome mapping to 1,859 for short-read genomes. So far, diagnostic yield can mostly be obtained only per technology across ERNs, as interpretation is still ongoing. The highest diagnostic yields have been obtained for short-read genome sequencing, long-read genome sequencing and optical genome mapping, at 8%, 11% and 12%, respectively. Yet, when ERN-specific subcohorts are also taken into account, transcriptomics for individuals with clinical characteristics reminiscent of a metabolic disease (ERN-ITHACA, RND and Euro-NMD) and amplicon-based long-read sequencing for Lynch(-like) syndrome and transcriptomics (ERN-Genturis) provide the most new diagnoses (at 16% and 19%, respectively). Interestingly, for a few cases, a combination of multiple omics was needed to find a diagnosis, underscoring the complexity of RD diagnosis.
In conclusion, each novel omics technology for which data have been interpreted contributed to new diagnoses for individuals with a genetically unexplained rare disease. With analysis and interpretation efforts still ongoing, it remains to be established which diagnostic test best serves which patient cohort or ERN.
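Diagnostic yield is understood here as the share of analysed individuals (or datasets) for whom a new diagnosis was found. A minimal sketch of the per-technology calculation follows; the technology labels and counts are invented to make the arithmetic visible and are not Solve-RD results.

```python
from typing import Dict, Tuple


def diagnostic_yield(solved: int, analysed: int) -> float:
    """Return the yield as a percentage of analysed cases."""
    if analysed <= 0:
        raise ValueError("analysed must be positive")
    if not 0 <= solved <= analysed:
        raise ValueError("solved must be between 0 and analysed")
    return 100.0 * solved / analysed


# Hypothetical (solved, analysed) counts per technology; not project data.
counts: Dict[str, Tuple[int, int]] = {
    "technology_A": (8, 100),
    "technology_B": (3, 25),
}
yields = {tech: diagnostic_yield(s, n) for tech, (s, n) in counts.items()}
for tech, pct in yields.items():
    print(f"{tech}: {pct:.0f}%")
```

Because the denominators differ widely between technologies and subcohorts, percentages computed this way are best read alongside the underlying counts.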
D3.4 Publication on optimal pathway to obtain genetic diagnosis for new RD patients
The diagnostic trajectory for individuals with a genetic rare disease often still involves consecutive testing, in which, for instance, exome sequencing is supplemented by complementary targeted assays to overcome the technical limitations of the NGS-based assay. Despite this strategy, 40-60% of individuals still remain genetically undiagnosed, which raises the question of whether this is the best strategy to diagnose all. With ongoing technical advances making genomes feasible as a diagnostic test, and increasing knowledge of non-coding variant interpretation providing a basis for the use of genomes in the clinic, we are at a crossroads where we must evaluate which genome strategy would be best.
In this task, we first evaluated the potential for short-read genome sequencing to serve as a first-tier diagnostic test for all (germline-based) genetic rare diseases (Phase I). Assessment of a series of 1000 samples with known clinically relevant variants showed that >95% of all variants were identifiable from 30x short-read GS (Illumina platform). The remaining 5% of variants were not identifiable from short-read genomes. The variant types concerned are relevant for 29% of referrals to this diagnostic laboratory, suggesting that, for this centre, short reads are a useful first-tier test for a majority of rare disease referrals, but not all. In Phase II we subsequently targeted the 5% of failed variants by long-read genome sequencing. In a pilot study of 100 samples, PacBio HiFi long-read genomes were able to detect >98% of these variants, and thus offer higher potential than short reads as a first-tier test for rare disease.
Based on the results obtained in Phases I and II, we would recommend implementing long-read sequencing as a first-tier test for individuals with rare disease. Although a health economic evaluation still remains to be performed to determine socio-economic feasibility, this assay is able to replace all routine germline-based workflows, thus delivering the maximum diagnostic yield in a single test. Moreover, with increasing knowledge on the interpretation of non-protein-coding variants, long-read sequencing also provides an opportunity to enhance diagnostic yield beyond today's diagnoses.
D3.5 Treatabolome database
The Treatabolome: flagging treatable genes and variants. The database will be connected to the RD-Connect GPAP platform and made accessible as part of the real-time analysis of patients undergoing sequencing or exome analysis within Solve-RD as a proof of concept for the utility of the approach.
D3.6 Publication on strategy for cohort development in omics studies
Solve-RD aims to find a diagnosis for rare disease patients who have not yet received a molecular diagnosis. In addition to research re-analysis of existing exome and genome data, we used the latest -omics technologies, alone or in combination, in bespoke cohorts to solve the unsolved diseases and discover the underlying disease mechanisms.
To this end, bespoke rare disease cohorts provided by the European Reference Networks (ERNs) were analysed by novel (multi) omics tools that go beyond the exome/genome.
This report describes the applied Solve-RD strategy for cohort development in omics studies.
D3.7 1-day workshop for industry
This deliverable report describes the organization of two events involving participants from industry to transfer the strategies and achievements from the Solve-RD project.
D3.8 First summer school for ePAGs delivered
This deliverable report covers the first edition of the EURORDIS Winter School, the capacity building training programme for rare disease patient representatives on scientific innovation and translational research, which was held at the Imagine Institute for Genetic Diseases in Paris from 19-23 March 2018.
D3.9 Second training for ePAGs delivered
This deliverable report covers the second edition of the EURORDIS Winter School, the capacity building training programme for rare disease patient representatives on scientific innovation and translational research, which was held at the Imagine Institute for Genetic Diseases in Paris, France from 11-15 March 2019.
D4.3 Central RD-Connect database serving Solve-RD, including user authentication and authorization
Solve-RD will employ the RD-Connect Genome-phenome Analysis Platform (GPAP) to pool and enable controlled access to a large number of harmonised and integrated datasets from unsolved rare disease cases. This deliverable report describes the actions taken to ensure GPAP is serving Solve-RD, including steps for user authentication and authorization and how researchers can share their data.
D4.5 Metadata catalog operational, with initial content
For the resources that are contributed to, and will be developed during, the Solve-RD project to be discoverable, a discovery system was required. Such a system is being developed based on proven technologies and a suitable set of standards. This deliverable focuses on the building of an initial version of the system for asset discovery, based on the Café Variome platform and called RD-NEXUS (Rare Disease Network for EXploring the UNseen). To build the system, an agreed set of parameters (termed ‘findable facets’) was defined and integrated into a working data model. APIs allowing interoperability with other systems were also developed in collaboration with the GA4GH. Exemplar data have been processed and entered into the current RD-NEXUS system to illustrate its functionality and highlight any potential issues, before being demonstrated to potential users within the ERNs. A complete first version of RD-NEXUS was thereby created, and is now available for demonstration and testing.
D4.7 All foundational standards selected and implemented across the project
All activities at Solve-RD, from data submission, quality control and data dissemination to appropriate resources through to discovery and finally distribution, depend on a set of defined standards to ensure smooth and efficient data flow and interoperability between resources within Solve-RD and external resources. This deliverable focuses on defining and implementing the set of standards required to facilitate these processes, and here we describe how these standards have been established and implemented across Solve-RD.
D4.8 Complete Solve-RD bioinformatics platform operational
As a cornerstone for the success of Solve-RD in terms of solving unsolved rare disease cases, a critical objective is to reuse, enhance and deploy existing solutions for core analytics support, databasing, data discovery, and data sharing. Here we describe the overall Solve-RD data workflow, and how the different components of the Solve-RD bioinformatics platform connect to facilitate the process of solving unsolved cases. The workflow includes data submission, management, analysis and interpretation processes in the RD-Connect Genome-Phenome Analysis Platform (GPAP); raw and processed data archiving at the European Genome-phenome Archive (EGA, Hinxton, UK); data analysis and directory structures; and the functionalities of the Rare Disease Data about Data database (RD3) and Discovery Nexus.
D5.1 Bespoke PhenoTips frontends for associated ERNs and undiagnosed disease programmes
This report provides information on the new clinical submission forms designed and implemented in GPAP-PhenoStore, the phenotypic module of the RD-Connect Genome-Phenome Analysis Platform (GPAP). These templates have been designed in collaboration with clinical experts from Solve-RD WP1 and in alignment with Genomics England data models. This implementation facilitates the collation and future portability of structured clinical information of unsolved patients from associated ERNs and undiagnosed disease programs.
D5.2 3,500 collected data sets from associated ERNs and undiagnosed disease programmes
Solve-RD has four core European Reference Networks (ERNs): ITHACA, EURO-NMD, RND and GENTURIS. The core ERNs have provided the bulk of data for re-analysis within Solve-RD. Solve-RD has also worked since its conception with the Undiagnosed Disease Programmes/Networks (UDPs/UDNs) from Spain and Italy. During the project, two ERNs have become associated with Solve-RD: EpiCare and RITA. 2,932 datasets in total have been provided by UDN-Spain, ERN-EpiCare, ERN-RITA and other ERNs as part of data freezes 1 to 3. This data has already been processed and is available to the consortium members. Further data is still being submitted as part of data freeze 4.
D5.3 Unsolved cases from associated ERNs and undiagnosed disease programmes analysed through RD-Connect
At the clinical core of the Solve-RD project, there are four European Reference Networks (ERNs) that contributed thousands of unsolved rare disease cohorts. During the implementation period of the project, two further ERNs have joined the Solve-RD network as associated ERNs: ERN EpiCare, which focuses on Rare and Complex Epilepsies, and ERN RITA, which focuses on Rare Immunodeficiency, Autoinflammatory and Autoimmune diseases. Together they have submitted 2,273 new datasets for re-analysis, which have been processed through the standard analysis workflow in the RD-Connect GPAP, with secondary analyses at CNAG, EKUT, and RadboudUMC, within the DATF Working Groups.
D5.4 Guidelines for molecular genetics of rare disorders
New diagnostic whole genome sequencing (WGS) guidelines for rare genetic disorders have been prepared and published by Erika Souche et al. (2022) in the EJHG.
D6.4 Solve-RD communication and dissemination tools
This deliverable report includes the Solve-RD communication and dissemination plans and describes the tools Solve-RD uses to implement both.