In late 2018, Google's DeepMind introduced AlphaFold, an artificial intelligence system designed to accelerate scientific discovery by predicting the 3D structure of a protein from its genetic sequence.
Now, at the 14th Critical Assessment of protein Structure Prediction (CASP14), the biennial competition often called the "Protein Olympics", AlphaFold outperformed all other participants, accurately predicting the 3D structures of proteins from their amino acid sequences.
Its accuracy is comparable to that of structures determined experimentally with techniques such as cryo-electron microscopy (cryo-EM), nuclear magnetic resonance (NMR) or X-ray crystallography.
A major breakthrough on a 50-year-old problem: AI solves protein structure prediction
A protein's shape is closely tied to its function, so being able to predict its structure helps us understand what a protein does and how it works. Many of the world's major challenges, such as developing treatments for diseases or finding enzymes that break down industrial waste, are fundamentally linked to proteins and the roles they play.
Traditionally, determining the shape of a single protein could take years.
Recent advances in cryo-electron microscopy have made it possible to obtain density maps at near-atomic resolution.
These experimental methods, however, depend on extensive trial and error: solving each structure can take years of work and requires specialized equipment costing millions of dollars.
How AlphaFold tackles the protein folding problem
DeepMind first entered CASP13 in 2018 with the initial version of AlphaFold and achieved the highest accuracy of all participants. It then published a paper describing its CASP13 method, together with the associated code, in the journal Nature; that work has since inspired further research and open-source implementations from the community.
For CASP14, DeepMind developed a new deep learning architecture that achieved unprecedented accuracy. The approach draws on ideas from biology, physics and machine learning, as well as on the work of many scientists in the protein folding field over the past half century.
A folded protein can be thought of as a "spatial graph", in which residues are the nodes and edges connect residues that are close to one another in space.
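To make the "spatial graph" idea concrete, here is a minimal Python sketch. It assumes each residue is represented by a single C-alpha coordinate and that two residues are connected when they lie within a hypothetical 8 Å cutoff. It only illustrates the graph representation; it is not AlphaFold's architecture, which learns to predict such spatial relationships rather than computing them from known coordinates.

```python
import math
from itertools import combinations

# Hypothetical cutoff (an assumption, not a value taken from AlphaFold):
# residues whose C-alpha atoms are within 8 angstroms count as "close".
CONTACT_CUTOFF_ANGSTROM = 8.0


def build_spatial_graph(ca_coords, cutoff=CONTACT_CUTOFF_ANGSTROM):
    """Return an adjacency map: residue index -> set of neighbouring residues.

    ca_coords is a list of (x, y, z) tuples, one C-alpha position per residue.
    Two residues share an edge if their Euclidean distance is <= cutoff.
    """
    graph = {i: set() for i in range(len(ca_coords))}
    for i, j in combinations(range(len(ca_coords)), 2):
        if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Toy "hairpin": the chain runs along x, turns, and runs back 5 angstroms away,
# so residues far apart in sequence (0 and 7) end up close in space.
hairpin = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (11.4, 5.0, 0.0), (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0),
]
graph = build_spatial_graph(hairpin)
print(sorted(graph[0]))  # [1, 2, 6, 7]
```

The edge between residues 0 and 7 is the point of the representation: sequence order alone would never suggest they interact, but the graph captures that they sit side by side once the chain folds.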
AlphaFold also uses an internal confidence measure to predict which parts of each predicted protein structure are likely to be reliable.
The system was trained on approximately 170,000 protein structures from the Protein Data Bank, together with large databases of protein sequences whose structures are unknown. Training used about 128 TPU v3 cores (roughly equivalent to 100-200 GPUs) over a few weeks, a relatively modest amount of compute compared with many of today's state-of-the-art (SOTA) machine learning models.
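As a rough illustration of how a per-residue confidence measure can be put to use, the Python sketch below picks out the stretches of a chain whose scores clear a threshold. It assumes scores on a 0-100 scale (as with the pLDDT score reported by later AlphaFold releases) and a hypothetical threshold of 70; both the score values and the helper name `reliable_regions` are illustrative, not part of any AlphaFold API.

```python
def reliable_regions(scores, threshold=70.0):
    """Return (start, end) index ranges, end exclusive, where every
    per-residue confidence score is >= threshold.

    scores: list of per-residue confidence values (assumed 0-100 scale).
    threshold: hypothetical cutoff separating trusted from untrusted residues.
    """
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i                      # a reliable stretch begins
        elif score < threshold and start is not None:
            regions.append((start, i))     # the stretch ends before residue i
            start = None
    if start is not None:                  # stretch runs to the end of the chain
        regions.append((start, len(scores)))
    return regions


# Made-up scores for an 8-residue chain with a low-confidence loop in the middle.
example_scores = [95, 92, 88, 40, 35, 75, 80, 60]
print(reliable_regions(example_scores))  # [(0, 3), (5, 7)]
```

A researcher might, for instance, trust the geometry of residues 0-2 and 5-6 while treating the middle loop as uncertain.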
Historic breakthrough! AlphaFold will "change everything"
Predicting protein structure matters because almost all diseases, including cancer and dementia, are linked to changes in the structure of proteins in cells. Understanding these changes could have a major impact on disease prevention and treatment.
Scientists have typically needed years to determine the structure of a single protein; AlphaFold can deliver predictions accurate to within the width of an atom in a few days.
This will greatly accelerate our understanding of the components of cells and support research into all kinds of diseases, including COVID-19.
Building on AlphaFold's breakthrough, researchers may also be able to discover more effective new drugs faster in the future.
Coronavirus structure diagram