Human Disease Blood Atlas - Method Summary

Summary

A comprehensive characterization of the blood proteome profiles in patients with various diseases can contribute to a better understanding of the disease etiology, resulting in earlier diagnosis, risk stratification and better monitoring of the disease progression. Precision Medicine thus aims to allow for an individualized diagnosis, treatment and monitoring of patients, including the use of molecular tools such as genomics, proteomics and metabolomics. In the first version of the Disease Blood Atlas, a pan-cancer study covering 12 major cancer types is reported.

What can you learn from the Disease Blood Atlas?

Learn about

comprehensive and precise protein levels in blood covering all major diseases
proteins associated with each of the analyzed cancers

How has the Proximity Extension Assay data been generated?

Next Generation Blood Profiling, which combines the antibody-based proximity extension assay with next generation sequencing (Wik L et al. (2021)), has been used to explore protein concentrations in blood from patients with different cancers. Plasma profiles of 1463 proteins from more than 1400 cancer patients, representing 12 common cancer types (Figure 1), were measured in minute amounts of blood plasma collected at the time of diagnosis and before treatment. To investigate the cancer-specific proteome profiles, differential expression analyses were performed by comparing each cancer to all other cancers in the study. For the sex-specific (male and female) cancers, only samples from patients of the same sex were compared. The up- and down-regulated proteins in each cancer are summarized in volcano plots, displayed in the sections for the different cancers, which highlight the most significantly differentially expressed proteins. The results for all cancer patients for each protein target are presented on the individual gene pages.

Figure 1. Age distribution and number of patients for each cancer type included in the study.
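The one-vs-rest comparison described above can be illustrated with a minimal Python sketch. It is not the pipeline used for the Atlas: the sample records, the `SEX_SPECIFIC` mapping, and the function names are hypothetical, and Benjamini-Hochberg is used only as one common way to obtain adjusted p-values, since the adjustment method is not stated here. NPX values are on a log2 scale, so a difference in mean NPX corresponds to a log2 fold change.

```python
from statistics import mean

# Hypothetical records for ONE protein: (cancer, sex, NPX value).
# The NPX values are made up for illustration.
SAMPLES = [
    ("Prostate cancer", "M", 2.0),
    ("Prostate cancer", "M", 2.4),
    ("Lung cancer", "M", 3.0),
    ("Lung cancer", "F", 3.6),
    ("Glioma", "M", 3.2),
    ("Breast cancer", "F", 1.0),
]

# Assumption: which cancers are treated as sex-specific, and for which sex.
SEX_SPECIFIC = {
    "Prostate cancer": "M",
    "Breast cancer": "F",
    "Cervical cancer": "F",
    "Endometrial cancer": "F",
    "Ovarian cancer": "F",
}


def one_vs_rest(samples, cancer):
    """Split NPX values into the target cancer and its comparison group."""
    required_sex = SEX_SPECIFIC.get(cancer)
    cases, controls = [], []
    for sample_cancer, sex, npx in samples:
        if required_sex is not None and sex != required_sex:
            continue  # sex-specific cancer: compare same-sex samples only
        if sample_cancer == cancer:
            cases.append(npx)
        else:
            controls.append(npx)
    return cases, controls


def mean_difference(cases, controls):
    """Difference in mean NPX (a log2 fold change, as NPX is log2-scaled)."""
    return mean(cases) - mean(controls)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Calling `one_vs_rest(SAMPLES, "Prostate cancer")` keeps only the male samples in both groups, whereas `one_vs_rest(SAMPLES, "Lung cancer")` uses every other sample as the comparison group. The per-protein p-values from whichever statistical test is chosen would then be passed to `benjamini_hochberg` across all proteins.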

AI-based disease prediction models were used to identify sets of proteins associated with each of the analyzed cancers. The aim of the protein panel is to distinguish the plasma protein profiles of different cancers; by combining the results from all cancer types, a panel of proteins (see Table 1 below) suitable for identifying the 12 different cancer types was selected. To identify proteins relevant for each cancer type, a separate disease prediction model was built for each cancer type, using all measured proteins (n = 1463) and 70% of the cancer patients as the training set. The control group in each model was composed of all the other cancer samples and was subsampled to include a similar number of patients to the modelled cancer. Here, we show the results obtained from the regularized generalized linear model algorithm (glmnet), which gives an estimate of the overall importance of each protein to the model (range 0-100%). The lollipop plots found in the sections for the different cancer types show the 10 most important proteins from the model for the classification of that specific cancer type.

Figure 2. Overview of the workflow used to identify a pan-cancer biomarker panel for cancer classification.
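The training-set construction described above (a 70% training split, with controls subsampled to roughly match the modelled cancer) can be sketched as follows. This is an illustration under stated assumptions rather than the actual modelling code: the function names are hypothetical, the split is a simple unstratified shuffle, the subsampling is assumed to happen after the split, and the 0-100 importance is assumed to be the absolute model coefficient scaled to the largest one.

```python
import random


def build_training_set(patients, target_cancer, train_fraction=0.7, seed=0):
    """Return (cases, controls, held_out) for one cancer-specific model.

    patients: list of (patient_id, cancer) tuples.
    Assumption: split first, then subsample controls within the training set.
    """
    rng = random.Random(seed)
    shuffled = patients[:]
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    train, held_out = shuffled[:n_train], shuffled[n_train:]

    cases = [p for p in train if p[1] == target_cancer]
    others = [p for p in train if p[1] != target_cancer]
    # Subsample the other cancers so both classes have a similar size.
    controls = rng.sample(others, min(len(cases), len(others)))
    return cases, controls, held_out


def scale_importance(coefficients):
    """Scale absolute coefficients so the most important protein scores 100.

    Assumption: importance is derived from coefficient magnitude.
    """
    largest = max(abs(c) for c in coefficients.values())
    if largest == 0:
        return {name: 0.0 for name in coefficients}
    return {name: 100.0 * abs(c) / largest for name, c in coefficients.items()}
```

For example, `scale_importance({"A": -2.0, "B": 1.0, "C": 0.0})` gives protein A an importance of 100, B 50, and C 0, which mirrors how a single protein per cancer can reach 100.0 in Table 1.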

Table 1. The 83 proteins used for identification of the 12 different cancer types.

Cancer	Protein	Importance	p.adjusted	NPX fold change
Acute myeloid leukemia	CD244	100.0	8.8e-14	1.5
Acute myeloid leukemia	FLT3	98.7	9.8e-23	3.3
Acute myeloid leukemia	TNFSF13B	60.4	4.6e-10	1.8
Breast cancer	PRTG	64.9	1.1e-10	0.2
Breast cancer	LAMP3	58.1	8.5e-3	0.2
Breast cancer	SDC4	56.1	1.5e-10	0.6
Breast cancer	OXT	56.1	9.6e-4	0.6
Breast cancer	HSD11B1	53.2	4.6e-3	0.1
Breast cancer	BTC	52.3	4.0e-4	0.4
Breast cancer	LPL	51.7	6.6e-5	0.2
Breast cancer	MSMB	51.6	2.9e-2	0.2
Cervical cancer	GLO1	78.1	7.5e-5	0.4
Cervical cancer	CHRDL2	75.6	7.7e-6	0.4
Cervical cancer	FCGR3B	69.1	4.8e-4	0.2
Cervical cancer	CRNN	67.9	3.3e-5	0.5
Cervical cancer	AGER	60.5	3.1e-4	0.2
Cervical cancer	MFAP5	58.0	3.3e-2	0.2
Cervical cancer	LYPD3	56.8	7.0e-4	0.3
Chronic lymphocytic leukemia	TCL1A	100.0	2.3e-37	5.4
Chronic lymphocytic leukemia	STC1	69.4	2.2e-29	2.0
Chronic lymphocytic leukemia	FCRL2	69.0	1.6e-20	3.5
Chronic lymphocytic leukemia	CD22	69.0	1.2e-17	2.4
Chronic lymphocytic leukemia	FCER2	66.7	1.1e-19	4.8
Chronic lymphocytic leukemia	CD6	65.0	8.1e-16	2.5
Colorectal cancer	PRDX5	81.3	5.9e-38	0.9
Colorectal cancer	TFRC	79.4	4.4e-10	0.3
Colorectal cancer	PADI4	76.3	1.9e-4	0.3
Colorectal cancer	FKBP1B	71.5	1.9e-33	1.5
Colorectal cancer	PRDX6	70.8	1.2e-31	0.5
Colorectal cancer	LGALS4	63.6	2.5e-11	0.4
Colorectal cancer	CCL20	58.6	2.4e-5	0.4
Colorectal cancer	SELE	53.5	1.9e-2	0.1
Colorectal cancer	LAP3	52.0	3.8e-5	0.2
Colorectal cancer	AREG	50.1	7.3e-7	0.4
Diffuse large B-cell lymphoma	CXCL9	68.9	6.3e-7	2.2
Diffuse large B-cell lymphoma	CXCL13	66.6	1.9e-8	2.5
Diffuse large B-cell lymphoma	DCXR	59.3	1.2e-2	0.4
Diffuse large B-cell lymphoma	SERPINA9	52.1	5.7e-5	2.0
Endometrial cancer	PLAT	90.4	1.7e-7	0.5
Endometrial cancer	TNFSF10	74.8	1.7e-7	0.2
Endometrial cancer	DPT	69.2	2.5e-13	0.3
Endometrial cancer	CLEC7A	55.8	1.7e-2	0.2
Endometrial cancer	CLMP	53.7	1.2e-5	0.2
Endometrial cancer	AFP	51.9	2.3e-3	0.3
Glioma	GFAP	100.0	1.6e-27	3.7
Glioma	BCAN	45.8	1.8e-15	0.5
Glioma	ADAMTS13	30.4	1.4e-5	0.1
Lung cancer	CEACAM5	89.6	6.0e-22	1.4
Lung cancer	MMP12	79.7	5.4e-26	1.1
Lung cancer	BCL2L11	78.1	4.2e-4	0.1
Lung cancer	PRDX5	76.8	1.1e-26	0.7
Lung cancer	LSM1	67.8	1.6e-3	0.2
Lung cancer	BPIFB1	67.0	3.9e-5	0.3
Lung cancer	ABHD14B	66.9	7.2e-20	0.6
Lung cancer	FKBP1B	66.2	2.7e-26	1.2
Lung cancer	CXCL17	66.1	2.8e-36	0.8
Lung cancer	MLN	64.9	2.9e-12	0.6
Lung cancer	ANXA11	63.2	8.4e-12	0.4
Lung cancer	SFTPD	63.2	3.2e-10	0.4
Lung cancer	MTPN	62.6	4.4e-4	0.1
Lung cancer	SCGB3A2	59.6	1.4e-3	0.3
Lung cancer	LBP	56.9	7.5e-4	0.3
Lung cancer	ACP5	53.2	2.8e-2	0.1
Lung cancer	TFPI2	50.5	6.4e-6	0.3
Lung cancer	COL9A1	50.2	2.9e-3	0.2
Myeloma	CNTN5	100.0	1.7e-9	2.9
Myeloma	SLAMF7	53.9	5.3e-16	3.7
Myeloma	MZB1	35.3	4.8e-13	2.8
Ovarian cancer	PAEP	100.0	1.9e-34	3.2
Ovarian cancer	CDH3	26.7	2.1e-14	0.7
Ovarian cancer	SSC5D	23.4	1.9e-9	0.4
Prostate cancer	DNER	94.0	1.8e-17	0.4
Prostate cancer	IL20	87.5	4.7e-2	0.1
Prostate cancer	FAP	85.1	7.8e-40	0.5
Prostate cancer	CXCL6	84.4	8.7e-5	0.4
Prostate cancer	CD34	83.2	2.4e-15	0.3
Prostate cancer	CDH17	82.3	1.3e-7	0.5
Prostate cancer	CRTAC1	73.1	1.6e-29	0.5
Prostate cancer	IL18RAP	69.0	5.0e-2	0.4
Prostate cancer	TRAF2	67.0	6.7e-12	0.6
Prostate cancer	ADAMTS8	61.2	1.6e-26	0.7
Prostate cancer	GZMB	59.6	3.6e-7	0.5
Prostate cancer	SPINK5	57.8	8.4e-11	0.3
Prostate cancer	F3	57.2	2.4e-4	0.1
Prostate cancer	ICAM4	52.4	8.0e-4	0.2

How has the Targeted Proteomics data been generated?

Targeted proteomics is a bottom-up proteomics approach that uses proteases, most commonly trypsin, to digest proteins into peptides that can be measured by liquid chromatography-tandem mass spectrometry (LC-MS/MS). This quantitative strategy is an excellent tool for performing measurements with high reproducibility and precision, making it appropriate for quantifying proteins in cells, tissues and blood.

Targeted proteomics, as opposed to the widely used data-dependent acquisition (DDA), also known as shotgun proteomics, works with a defined collection of peptides and builds on prior knowledge about the analytes. Peptide quantification can be either relative or absolute. Relative quantification describes the amount of an analyte in proportion to another measurement of the same analyte, across several biological samples or across two groups, as in case-control studies. Absolute concentrations can be measured by adding heavy-labeled standards in known amounts during the sample preparation workflow. Using heavy-labeled standards can also considerably increase consistency and precision, and it can be done at a large scale by adding either isotope-labeled peptides or protein standards.

A quantitative strategy based on heavy isotope-labeled PrESTs was originally developed as a collaborative effort between Professor Matthias Mann and Professor Mathias Uhlén (Zeiler M et al. (2012)). They introduced the multiplex PrEST-SILAC quantitative approach. This quantitative workflow was based on shotgun proteomics and had the benefit of being relatively simple to execute and straightforward to work with. Stable isotope-labeled standard (SIS) PrESTs, combined with a mass spectrometry readout, can be used in almost any MS setup and analysis mode, including both targeted (SRM, MRM, PRM, DIA) and untargeted (DDA) modes of operation. The standards are added to the sample at the initial stage of the proteomics workflow, so they can account for potential digestion biases: they generate the same proteotypic peptides (Figure 3) and mimic the exact amino acid repertoire of the endogenous protein. Digestion bias is otherwise a common source of error that affects almost every LC-MS/MS sample preparation workflow and can be very hard to control for unless the protein standard is cleaved together with the endogenous sample.

Figure 3. The standard's N-terminal sequence enables affinity purification and measurement. The C-terminal portion contains 50–150 human amino acids. Each standard contains numerous tryptic peptides that can be used to measure an unknown sample's target protein.

Each SIS-PrEST standard is fully labeled with 13C- and 15N-enriched arginine and lysine, and the protein sequence used for quantification spans a short amino acid sequence (50-150 aa) representative of the target protein of interest (Figure 4).

In the Disease Atlas, 273 SIS-PrESTs were spiked in known concentrations directly into undepleted human blood plasma from 1,469 cancer patients. The spiked amounts were tuned to be as close to a 1:1 ratio with the endogenous proteins as possible. This increases the analytical precision of the one-point calibration-based quantification of the endogenous proteins. The quantitative peptides were selected using the lowest coefficient of variation and the highest frequency of detection as selection criteria, and only the single best-performing peptide per protein was used.
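The one-point calibration and peptide selection described above can be sketched in Python. This is a minimal illustration, not the Atlas workflow: the function names and inputs are hypothetical, the concentration unit is whatever unit the spiked standard is expressed in, and ranking peptides by detection frequency first and coefficient of variation second is an assumption, since the way the two criteria were combined is not stated here.

```python
from statistics import mean, stdev


def endogenous_concentration(light_area, heavy_area, spiked_concentration):
    """One-point calibration against a heavy standard of known concentration.

    light_area: signal of the endogenous (light) peptide.
    heavy_area: signal of the spiked (heavy) SIS-PrEST peptide.
    """
    if heavy_area <= 0:
        raise ValueError("heavy standard signal must be positive")
    return (light_area / heavy_area) * spiked_concentration


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def best_peptide(peptides):
    """Pick one quantitative peptide for a protein.

    peptides: {peptide_name: [light/heavy ratio, or None if not detected]}
    Assumption: highest detection frequency wins, ties broken by lowest CV.
    """
    def score(item):
        _, ratios = item
        detected = [r for r in ratios if r is not None]
        frequency = len(detected) / len(ratios)
        if len(detected) > 1:
            cv = coefficient_of_variation(detected)
        else:
            cv = float("inf")  # CV is undefined for fewer than two values
        return (-frequency, cv)

    return min(peptides.items(), key=score)[0]
```

With a light/heavy signal ratio of 0.5 and a standard spiked at 10 units, `endogenous_concentration(50.0, 100.0, 10.0)` returns 5.0. The closer the ratio is to 1:1, the less the estimate relies on extrapolating far from the single calibration point.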

Figure 4. Targeted Proteomics workflow using SIS-PrESTs. Production of Standards: PrESTs from the Human Protein Atlas are labeled in high throughput with heavy arginine (Arg10) and lysine (Lys8) amino acid residues. Each PrEST fragment can be individually quantified by the common Q-Tag sequence (also used for purification). Assay Generation: Heavy peptides originating from the PrEST sequence are used to establish targeted assays. The quantitative range is defined, and the protein level in healthy plasma is determined in a pool of healthy volunteers. Targeted Proteomics: SIS-PrESTs are spiked directly into non-depleted human plasma collected from cancer patients and act as internal standards throughout the workflow. Quantitative Mass Spectrometry: Endogenous peptides from each patient are measured together with the spiked internal standard. The known amount of spiked standard is used to calculate the absolute concentration of each protein analyte.

What is presented in the section?

The protein levels for all cancer patients for each protein target, together with information on whether the target is upregulated in any of the diseases and/or is included in any disease prediction model, are presented on the individual gene summary pages in the Human Protein Atlas.