Advanced Genome Annotation Pipeline

To provide up-to-date information and the best annotation experience, DNArails has developed a genome annotation pipeline with two parts: comprehensive data integration and accelerated annotation.

In the first part, we have collected over 50 databases, covering basic gene information, cancer-related data, disease-related data, and allele frequency, to help our users interpret their data from many different perspectives. All databases are provided for two human reference genome versions (hg19 and hg38) to fit all kinds of experimental designs. We update our integrated databases regularly to keep the data accurate. We also retain the flexibility to add private databases to the annotation flow.

The second part is a self-developed data mining algorithm that delivers a high-efficiency annotation experience. Combined with Edico's solution for alignment and variant calling, the analysis of WES data can be completed within half an hour.

AI-Driven Data Power

Since the advent of genetic research, more than 4,000 genes have been found to be associated with hereditary diseases worldwide. However, many diseases still cannot be diagnosed because of a lack of relevant research or clinical evidence. Therefore, DNArails not only provides databases of existing knowledge but has also developed an AI-based protein loss-of-function prediction model called Dr. Score. Drawing on a comprehensive evaluation of evolutionary conservation, protein structure, and sequence homology, the model assigns each nonsynonymous variant a score reflecting its degree of pathogenicity. We evaluated the model's accuracy on two independent test datasets (PMC4375422), and the results show that Dr. Score (ACC = 90.8%) outperforms existing methods in predicting pathogenicity.

Building on this experience, DNArails will not only provide customers with an excellent analytical system but also collaborate with them on in-depth research and development. In the future, the key to NGS development will be the accumulation of genome databases and the ability to mine them for clinically significant findings. We will help our clients move away from a labor-intensive service model by using their accumulated data to build in-house analysis models and strengthen their competitiveness.

Genomic Data Visualization

Transforming complex data into visual information helps users quickly learn how to operate the system and uncover the deeper meaning in their data.

All of our self-developed systems are designed with a graphical interface for sample management, analysis kick-off, and result interpretation. The presented analysis results fall into two types according to the user's analysis behavior. The first kind of user prefers to carry out deeper interpretation themselves. For this kind of user, the presented results are the primary annotation results, and users can set their own in-house filtering parameters for a specific testing purpose. The second kind of user is looking for a clinical solution. For these users, the system first summarizes and categorizes all the results to present the characteristics of the mutations in the sample. In a second stage, it focuses on the associations among the collected evidence and shows the clinically significant information.

For users looking for a clinical solution, we have also designed analysis reports to fit clinical needs. The system automatically generates a formatted report containing the interpretation based on the analysis results, together with health education information. This reduces the cost for users of explaining the report to their customers.
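For users who interpret the primary annotation results themselves, an in-house filter of the kind described above might look like the following minimal sketch. It is an illustration only, not the DNArails implementation: the record fields (gene, consequence, allele_frequency) and the filter_variants helper are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedVariant:
    """One annotated variant; every field name here is hypothetical."""
    gene: str
    consequence: str          # e.g. "missense", "synonymous"
    allele_frequency: float   # population allele frequency, 0.0 to 1.0


def filter_variants(variants, max_allele_frequency, allowed_consequences):
    """Keep variants that are rare enough and have a consequence of interest."""
    return [
        v for v in variants
        if v.allele_frequency <= max_allele_frequency
        and v.consequence in allowed_consequences
    ]


# Made-up example records, for illustration only.
variants = [
    AnnotatedVariant("BRCA1", "missense", 0.0001),
    AnnotatedVariant("TTN", "synonymous", 0.0002),
    AnnotatedVariant("CFTR", "missense", 0.05),
]
kept = filter_variants(
    variants,
    max_allele_frequency=0.01,
    allowed_consequences={"missense", "nonsense"},
)
print([v.gene for v in kept])  # prints ['BRCA1']
```

Keeping the thresholds as function arguments rather than constants reflects the idea that each laboratory sets its own parameters for a given testing purpose.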

Guideline Implemented Analysis Rule

DNArails is well aware of how generated data influence clinical interpretation. Therefore, to ensure the correctness of our analyses, we have developed our products according to guidelines issued by government agencies and genomics-related associations in the U.S. and Europe. The most representative solutions are the NGS QC system and the ACMG-AMP guideline-based system.

The NGS QC system, which offers both a flexible criteria-setting function and a mandatory item checking function, is designed for monitoring the condition of genomic data. It helps users monitor the stability of data quality and record all the information required by the guidelines.
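The two functions mentioned above, configurable criteria and mandatory item checking, can be sketched roughly as follows. This is a hedged illustration rather than the actual QC system: the metric names (mean_coverage, pct_q30, on_target_rate), the threshold values, and the check_qc helper are all assumptions made for the example.

```python
def check_qc(metrics, thresholds, mandatory_items):
    """Return a list of QC problems; an empty list means the sample passes."""
    problems = []
    # Mandatory item check: every required metric must be recorded.
    for name in mandatory_items:
        if name not in metrics:
            problems.append(f"missing mandatory item: {name}")
    # Criteria check: each recorded metric must meet its minimum threshold.
    for name, minimum in thresholds.items():
        if name in metrics and metrics[name] < minimum:
            problems.append(f"{name}={metrics[name]} is below minimum {minimum}")
    return problems


# Hypothetical thresholds and sample metrics, for illustration only.
thresholds = {"mean_coverage": 100.0, "pct_q30": 80.0}
mandatory_items = ["mean_coverage", "pct_q30", "on_target_rate"]
sample_metrics = {"mean_coverage": 85.0, "pct_q30": 91.2}

for problem in check_qc(sample_metrics, thresholds, mandatory_items):
    print(problem)
```

Because the thresholds and the mandatory list are passed in as data, a laboratory could adjust its criteria without changing the checking logic.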

The ACMG-AMP guideline-based system is designed for clinical data interpretation. The system automatically classifies variants into five categories using the 28 criteria provided by the ACMG-AMP guideline. All data used in interpretation are collected from open databases, and users can also add their in-house knowledge to raise sensitivity. According to results on our internal dataset, the accuracy of the automated classification algorithm is more than 95%.
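As a rough illustration of how the five categories follow from the criteria, the sketch below applies the evidence-combining rules published in the 2015 ACMG-AMP guideline to a set of criterion codes that have already been met (for example PVS1, PM2, BP4). It is not the DNArails algorithm: the classify function is a hypothetical helper, and deciding which of the 28 criteria a variant meets, which is the hard part, is assumed to happen upstream.

```python
from collections import Counter

# Evidence strength of each ACMG-AMP 2015 criterion, keyed by code prefix.
PATHOGENIC_STRENGTH = {"PVS": "very_strong", "PS": "strong",
                       "PM": "moderate", "PP": "supporting"}
BENIGN_STRENGTH = {"BA": "stand_alone", "BS": "strong", "BP": "supporting"}


def classify(criteria):
    """Combine met criteria (e.g. ["PVS1", "PM2"]) into one of five classes."""
    path, ben = Counter(), Counter()
    for code in set(criteria):  # each criterion counts at most once
        prefix = code.rstrip("0123456789")
        if prefix in PATHOGENIC_STRENGTH:
            path[PATHOGENIC_STRENGTH[prefix]] += 1
        elif prefix in BENIGN_STRENGTH:
            ben[BENIGN_STRENGTH[prefix]] += 1
        else:
            raise ValueError(f"unknown criterion: {code}")
    pvs, ps = path["very_strong"], path["strong"]
    pm, pp = path["moderate"], path["supporting"]
    ba, bs, bp = ben["stand_alone"], ben["strong"], ben["supporting"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm == 1 and pp == 1) or pp >= 2))
        or ps >= 2
        or (ps == 1 and (pm >= 3 or (pm == 2 and pp >= 2)
                         or (pm == 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm == 1)
        or (ps == 1 and 1 <= pm <= 2)
        or (ps == 1 and pp >= 2)
        or pm >= 3
        or (pm == 2 and pp >= 2)
        or (pm == 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs == 1 and bp == 1) or bp >= 2

    # Contradictory pathogenic and benign evidence falls back to uncertain.
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "Uncertain significance"
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    return "Uncertain significance"


print(classify(["PVS1", "PS1"]))                # prints Pathogenic
print(classify(["PM1", "PM2", "PP3", "PP5"]))   # prints Likely pathogenic
print(classify(["BP4", "BP7"]))                 # prints Likely benign
```

In-house knowledge would enter such a scheme by changing which criteria are marked as met for a variant, while the combining step itself stays fixed.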