When Whole Genome Sequencing Meets AI

Posted December 24, 2018 by Bonnibelle


Researchers at Human Longevity published a paper in PNAS in which the company used whole-genome sequencing data and machine learning methods to predict individual traits. The results showed that the researchers could predict some simple traits, especially eye color, skin color and sex, with relatively high accuracy. The paper's first author stated that machine learning plays a vital role in scientific discovery because it allows data interpretation to be fully automated.

Scientists can effectively predict the physical characteristics of our body based on our DNA.

Human Longevity Inc was founded by American genomics scientist Craig Venter, stem cell pioneer Robert Hariri and XPRIZE Foundation founder Peter Diamandis. The company aims to combine genomics and stem cell therapy to find suitable treatments, with the ultimate goal of delaying aging and maintaining health and bodily function.

The purpose of the study was to show how forensic science can put new technologies to work. The researchers obtained whole genome sequencing data from 1,061 subjects aged 18 to 82 years and of diverse ethnicities, and also collected 3D facial images, voice samples, height, weight and other data.

Using machine learning on genome-wide data, the researchers predicted some simple traits with relatively high accuracy. For more complex traits, however, prediction accuracy still needs to improve.

The researchers developed a maximum entropy algorithm that matches whole genome sequencing data to phenotypic and demographic data, and said that with more data the model could produce better predictions.

In the experiment, the algorithm combined the predictions of all the individual trait models. In an ethnically mixed cohort it re-identified, on average, more than eight out of ten held-out participants. Within groups made up only of African-American or only of European participants, however, the success rate was "only" about 50%, lower than the researchers had hoped.
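To make the re-identification idea concrete, here is a minimal sketch, not the authors' implementation. It assumes each genome has already been turned into a vector of predicted traits (for example standardized height, eye color and skin pigmentation scores) and that each candidate person has a vector of observed traits. The functions `match_probabilities` and `reidentification_rate`, the squared-distance score, the `temperature` parameter and the toy values are all hypothetical illustrations. The softmax (log-linear) weighting is the functional form maximum entropy classifiers take, but the paper's actual features and training are not reproduced here.

```python
import math
from typing import List, Sequence


def match_probabilities(predicted: Sequence[float],
                        candidates: List[Sequence[float]],
                        temperature: float = 1.0) -> List[float]:
    """Softmax over negative squared distances between one genome's
    predicted traits and each candidate's observed traits (hypothetical score)."""
    scores = [-sum((p - o) ** 2 for p, o in zip(predicted, obs)) / temperature
              for obs in candidates]
    top = max(scores)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def reidentification_rate(predicted_all: List[Sequence[float]],
                          observed_all: List[Sequence[float]],
                          temperature: float = 1.0) -> float:
    """Fraction of genomes whose most probable match is the correct person,
    assuming predicted_all[i] and observed_all[i] belong to the same person."""
    correct = 0
    for i, pred in enumerate(predicted_all):
        probs = match_probabilities(pred, observed_all, temperature)
        best = max(range(len(probs)), key=probs.__getitem__)
        if best == i:
            correct += 1
    return correct / len(predicted_all)


# Hypothetical toy values: three people, three standardized traits each.
predicted = [[0.9, -1.0, 0.2], [-0.5, 0.8, 1.1], [0.1, 0.1, -1.2]]
observed = [[1.0, -0.8, 0.0], [-0.4, 1.0, 0.9], [0.3, -0.1, -1.0]]

print(reidentification_rate(predicted, observed))  # 1.0
```

With these toy vectors every genome lies closest to its own person, so the rate is 1.0. Real trait predictions are noisy, and when candidates share ancestry their predicted traits overlap more, which is one plausible reason matching within single-ancestry groups is harder.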

The authors argued that although this study offers forensic science a new approach, it also has serious implications for data privacy, de-identification and informed consent. As more and more genomes are generated and placed in public databases, the researchers say, these issues require greater public scrutiny. (The study itself was approved by an institutional review board, or IRB.)

Craig Venter, co-founder of Human Longevity, pointed out: "We started this research to prove that your genetic code makes you who you are. This is obviously a proof of concept with limited data. But we believe that as we increase the number of people in this study and the number of people in the HLI database to hundreds of thousands, we will be able to accurately predict everything about an individual that the genome can predict."

He added: "We also worry that the public and the entire research community do not fully appreciate the need for better privacy protections and policies in the genomics era, and we are urging more analysis, better technical solutions and ongoing discussion."

The combination of imaging technology and machine learning can indeed produce some unexpected results, and it remains to be seen what further developments the coming years will bring.

About Whole Genome Sequencing

Whole genome sequencing is the determination of the complete DNA sequence of an organism’s genome at a single time, covering chromosomal DNA as well as DNA contained in mitochondria and, in plants, chloroplasts. Whole genome sequencing provides a powerful tool for both de novo sequencing and re-sequencing. De novo sequencing refers to sequencing a novel genome for which no reference sequence is available, whereas re-sequencing maps reads against an existing reference genome.

CD Genomics provides full whole genome sequencing service package including sample standardization, library construction, deep sequencing, raw data quality control, genome assembly, and bioinformatics analysis. We can tailor this pipeline to your research interest. If you have additional requirements or questions, please feel free to contact us.
-- END ---
Issued By https://www.cd-genomics.com/
Business Address 45-1 Ramsey Road, Shirley, NY 11967, USA
Country United States
Categories Biotech, Business
Last Updated December 24, 2018