Structure - Methods summary

Summary

The Structure section contains information about the three-dimensional structure of human proteins. The predicted 3D structure from the AlphaFold Protein Structure Database, developed by DeepMind and EMBL-EBI, is shown together with experimentally determined structures from the Protein Data Bank (PDB). Known antigen sequences, as well as the amino acid positions of population variants and of variants with known clinical relevance from the Ensembl variation database, can also be displayed.

Key publications

Jumper J et al. (2021) "Highly accurate protein structure prediction with AlphaFold" Nature 596(7873):583-589.

Varadi M et al. (2022) "AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models" Nucleic Acids Research 50(D1):D439-D444.

Berman HM et al. (2000) "The Protein Data Bank" Nucleic Acids Research 28(1):235-242.

What can you learn from the Structure Section?

Learn about:

  • the experimental and predicted 3D structure of proteins
  • the known missense variants with clinical significance
  • the known missense variants in the population
  • the antigen structure for the majority of the antibodies

How has the data been generated?

The predicted 3D protein structures are retrieved from the AlphaFold Protein Structure Database developed by DeepMind and EMBL-EBI. The AI system AlphaFold is a machine learning approach in which the primary amino acid sequence and aligned sequences of homologues, together with physical and biological knowledge about related protein structures, are incorporated into the design of a deep learning algorithm that directly predicts the 3D structure of a protein.

The experimental 3D protein structures are retrieved from the Protein Data Bank (PDB) and mainly include structures determined by X-ray crystallography, NMR spectroscopy, and 3D electron microscopy. The structures included in our data are those that cover at least 80% of the length of an Ensembl isoform with 100% sequence identity.
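The inclusion criterion above can be illustrated with a minimal sketch. It assumes that "100% identity" means the structure's sequence matches a contiguous stretch of the isoform exactly and that coverage is the ratio of the two lengths; the function name and this interpretation are illustrative assumptions, not the pipeline actually used to build the section.

```python
def passes_coverage_filter(isoform_seq: str,
                           structure_seq: str,
                           min_coverage: float = 0.80) -> bool:
    """Hypothetical check: exact match to the isoform and >= 80% coverage.

    Assumption: the structure sequence must occur as an exact, contiguous
    substring of the isoform sequence (100% identity, no gaps).
    """
    if not isoform_seq or not structure_seq:
        return False
    # 100% identity: every residue of the structure matches the isoform.
    if structure_seq not in isoform_seq:
        return False
    # Coverage: fraction of the isoform length spanned by the structure.
    return len(structure_seq) / len(isoform_seq) >= min_coverage


# Toy 10-residue isoform; an 8-residue exact match gives 80% coverage.
isoform = "MKTAYIAKQR"
print(passes_coverage_filter(isoform, "KTAYIAKQ"))
print(passes_coverage_filter(isoform, "KTAYIAK"))
```

A real comparison would typically start from an alignment between the PDB chain and the isoform rather than a substring test, which is used here only to keep the example short.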

The population and clinical variant data are incorporated from the Ensembl variation database. For variants with clinical relevance, only those annotated with the clinical significance terms "pathogenic" or "likely pathogenic" are included.
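As a rough illustration of the clinical filter described above, the sketch below keeps only variants that carry one of the two accepted significance terms. The record layout (a dict with a "clinical_significance" list) and the function name are assumptions made for this example and are not taken from the Ensembl schema or from the section's own code.

```python
# The two terms named in the text; matching is case-insensitive here
# (an assumption, since the source does not specify case handling).
CLINICAL_TERMS = {"pathogenic", "likely pathogenic"}


def select_clinical_variants(variants):
    """Return the variants annotated as pathogenic or likely pathogenic.

    Each variant is a hypothetical dict with an optional
    "clinical_significance" list of strings.
    """
    selected = []
    for variant in variants:
        terms = {t.strip().lower()
                 for t in variant.get("clinical_significance", [])}
        if terms & CLINICAL_TERMS:
            selected.append(variant)
    return selected


# Made-up example records, for illustration only.
example = [
    {"id": "var1", "position": 12, "clinical_significance": ["benign"]},
    {"id": "var2", "position": 45, "clinical_significance": ["pathogenic"]},
    {"id": "var3", "position": 78, "clinical_significance": ["Likely pathogenic"]},
    {"id": "var4", "position": 90},
]
print([v["id"] for v in select_clinical_variants(example)])
```

Using a set intersection means a variant with several annotations is kept as soon as any one of them is an accepted term.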

All structures are displayed using the NGL Viewer.

What is presented in the section?

On the gene summary page of the Structure section, the predicted and any available experimental 3D protein structures can be displayed and explored. In the drop-down panel, the available experimental structures for each protein can be selected and displayed. Check boxes allow antigen sequences and the positions of population and/or clinical variants to be displayed, and the structures can be colored according to B-factor, residue index, or chain name.
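The three coloring properties mentioned above correspond to fixed-width fields of an atom record in the PDB file format. The sketch below reads them from a single ATOM line, assuming PDB-format input; the function name is hypothetical, and this is not a description of how the NGL Viewer computes colors internally.

```python
def coloring_properties(atom_line: str) -> dict:
    """Extract chain name, residue index and B-factor from one PDB atom record.

    Column positions follow the fixed-width PDB format:
    chain ID in column 22, residue number in columns 23-26,
    temperature factor (B-factor) in columns 61-66.
    """
    if not atom_line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "chain_name": atom_line[21],
        "residue_index": int(atom_line[22:26]),
        "b_factor": float(atom_line[60:66]),
    }


# Build a correctly aligned example record with a format string
# (the coordinate and B-factor values are made up).
line = "ATOM  %5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f" % (
    1, " CA ", "", "MET", "A", 12, "", 1.0, 2.0, 3.0, 1.0, 87.5)
print(coloring_properties(line))
```

Structures distributed in mmCIF format would need a different parser, since that format is not column-based.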