Using artificial intelligence (AI) and one of the world's fastest supercomputers, Chinese scientists are engineering previously unknown chemicals that could be used clinically in the future.
The Tianhe-2 supercomputer in south China's Guangdong Province, ranked among the world's 10 fastest computers in the TOP500 list published this month, has been used as a platform for drug discovery. Now, AI-based algorithms make the machine even smarter.
Scientists from Sun Yat-sen University and Beijing-based AI startup Galixir, along with researchers from the Georgia Institute of Technology and the Massachusetts Institute of Technology, reported a practical deep-learning toolkit, run on Tianhe-2, that predicts the biosynthetic pathways for natural products (NPs) or NP-like compounds.
Natural products are a primary source for clinical drug discovery. More than 60 percent of the small-molecule drugs approved by the U.S. Food and Drug Administration (FDA) are NPs or their derivatives.
Over 300,000 NPs have been recorded to date, but because producing them requires complex know-how, only one-tenth have been developed as a substrate or product, which makes computer-aided screening an urgent need.
In a recent study published in Nature Communications, the researchers presented a tool called BioNavi-NP that proposes optimal NP biosynthetic pathways from simple building blocks without requiring any previously known biochemical rules.
First, a single-step bio-retrosynthesis prediction model is trained to generate candidate precursors for a target NP. The fully data-driven model is 1.7 times more accurate in its predictions than the previous rule-based model, according to the study.
Then, an automatic retro-biosynthesis route planning system efficiently samples plausible biosynthetic pathways.
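The two stages described above can be illustrated with a minimal sketch. It is not the BioNavi-NP implementation: the trained single-step model is replaced by a hypothetical lookup table (`TOY_MODEL`), the molecule names and probabilities are made up for illustration, and the route planner is a generic best-first search that expands the most probable partial pathway first.

```python
import heapq
import math
from itertools import count

# Hypothetical stand-in for a trained single-step model: each target maps to
# candidate precursor sets with an assumed model probability (invented values).
TOY_MODEL = {
    "target": [(("intermediate_a",), 0.7), (("intermediate_b", "block_1"), 0.2)],
    "intermediate_a": [(("block_1", "block_2"), 0.9)],
    "intermediate_b": [(("block_3",), 0.5)],
}
# Hypothetical set of simple building blocks where a pathway may stop.
BUILDING_BLOCKS = {"block_1", "block_2", "block_3"}


def predict_precursors(molecule):
    """Single-step prediction: return (precursors, probability) candidates."""
    return TOY_MODEL.get(molecule, [])


def plan_route(target, max_steps=5):
    """Best-first search for the most probable route back to building blocks.

    Returns (steps, probability), or None if no route is found. Each step is
    a (product, precursors) pair; cost is the summed negative log-probability.
    """
    tie = count()  # unique tiebreaker so the heap never compares tuples of steps
    start = () if target in BUILDING_BLOCKS else (target,)
    frontier = [(0.0, next(tie), start, ())]
    while frontier:
        cost, _, open_mols, steps = heapq.heappop(frontier)
        if not open_mols:  # every molecule has been traced to building blocks
            return list(steps), math.exp(-cost)
        if len(steps) >= max_steps:
            continue
        mol, rest = open_mols[0], open_mols[1:]
        for precursors, prob in predict_precursors(mol):
            unsolved = tuple(p for p in precursors if p not in BUILDING_BLOCKS)
            heapq.heappush(
                frontier,
                (cost - math.log(prob), next(tie), rest + unsolved,
                 steps + ((mol, precursors),)),
            )
    return None


route, probability = plan_route("target")
for product, precursors in route:
    print(product, "<-", " + ".join(precursors))
```

In this toy setting the search prefers the route through `intermediate_a` because its combined probability is higher than that of the alternative; a real system would replace the lookup table with a learned model and a far larger search space.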
The study reveals that the toolkit can successfully identify biosynthetic pathways for 90.2 percent of 368 test compounds.
The researchers also integrated an existing enzyme prediction tool to provide a user-friendly, publicly accessible web server that predicts biosynthetic pathways. It can also score the biological feasibility of those pathways based on the estimated preference of species and enzymes.
By entering a relevant NP molecule into the online toolkit, a user can obtain multiple predicted ways to synthesize it within a few minutes.
These quick results are made possible by Tianhe-2's strong parallel computing capability and its customized GPU resources, which help shorten the training and testing time from more than two weeks to one day.
China's supercomputer Tianhe-2 has been widely used to promote research in health and medicine.
A previous study has reported a cost-efficient tool to discern types of gastric cancer, using Tianhe-2 and an AI-based model called EBVNet.
A gene-screening model on Tianhe-2 can effectively discover signs of nasopharyngeal cancer among high-risk populations.
Both studies were published in Nature Communications in May and April, respectively.
In March, another study published in the journal Cell Metabolism showed that scientists used Tianhe-2 to find three chemicals that can bring a conceptually new strategy to treat complications of COVID-19.
Chinese scientists also ran the world's first deep-learning model on Tianhe-2 to non-invasively screen for and identify liver and biliary diseases using ocular images.
The findings were published in Lancet Digital Health last year, and the model is already in use on the cloud platform of the Zhongshan Ophthalmic Center under Sun Yat-sen University.