SpectroPipeR: normalization & quantification module — norm_quant

Function for normalizing and quantifying Spectronaut output reports, serving as the second step in the pipeline and building upon the initial step of reading Spectronaut data.

Usage

norm_quant_module(
  SpectroPipeR_data = NULL,
  batch_adjusting = FALSE,
  sample__batch_meta_data_file = NULL,
  batch_adjusting_column = "",
  number_of_cores_adjusting = parallel::detectCores() - 2,
  covariate_adjusting_formula = "",
  covariate_adjusting_meta_data_file = "",
  skipping_MaxLFQ_median_norm = FALSE,
  costum_colors = NULL,
  print.plot = FALSE
)

Arguments

SpectroPipeR_data

it is the SpectroPipeR_data list object generated by the read_spectronaut_module() function

batch_adjusting

logical - if batch adjusting with ComBat (sva package) should be performed; default = FALSE

sample__batch_meta_data_file

character - sample batch file; tab-delimited txt-file, containing "R.FileName" column e.g. sample__batch_meta_data_file = "Sample_MetaData_Batches.txt"

example table for batch meta data:

R.FileName	digest_batch
20230403_TIMSTOF_1_S1-B11_1_6690	1
20230403_TIMSTOF_2_S1-G11_1_6695	1
20230403_TIMSTOF_3_S1-E7_1_6661	1
20230403_TIMSTOF_4_S1-A9_1_6673	2
20230403_TIMSTOF_7_S1-D8_1_6668	2
20230403_TIMSTOF_9_S1-D3_1_6627	2

A good starting point for the generation of the table is the '*_ConditionSetup.tsv' in your Spectronaut Pipeline Report export folder
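The batch meta data table shown above can be written from R as a tab-delimited file. This is a minimal sketch, assuming the sample names of the example table; the temporary output location is hypothetical, and the commented call only illustrates how the file and column name would be handed to norm_quant_module().

```r
# Sample names and batch assignment, mirroring the example table above
batch_meta <- data.frame(
  R.FileName = c("20230403_TIMSTOF_1_S1-B11_1_6690",
                 "20230403_TIMSTOF_2_S1-G11_1_6695",
                 "20230403_TIMSTOF_3_S1-E7_1_6661",
                 "20230403_TIMSTOF_4_S1-A9_1_6673",
                 "20230403_TIMSTOF_7_S1-D8_1_6668",
                 "20230403_TIMSTOF_9_S1-D3_1_6627"),
  digest_batch = c(1, 1, 1, 2, 2, 2)
)

# Hypothetical output location; use your project folder in practice
batch_file <- file.path(tempdir(), "Sample_MetaData_Batches.txt")

# Write a tab-delimited txt-file without quotes or row names
write.table(batch_meta, file = batch_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read the file back to check that the required column is present
check <- read.delim(batch_file, check.names = FALSE,
                    stringsAsFactors = FALSE)

# The file and batch column would then be passed on, e.g.:
# norm_quant_module(SpectroPipeR_data = SpectroPipeR_data,
#                   batch_adjusting = TRUE,
#                   sample__batch_meta_data_file = batch_file,
#                   batch_adjusting_column = "digest_batch")
```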

batch_adjusting_column

character - column name in sample__batch_meta_data_file, which should be used for assigning the samples to batches

number_of_cores_adjusting

numeric - number of processor cores used for batch or covariate adjustment

covariate_adjusting_formula

character - provide a formula passed to lm() for covariate adjustment e.g. "log10_peptide_intensity ~ log10(CRP)+log10(age)+as.factor(sex)"; you may also use ns() function e.g. "log10_peptide_intensity ~ ns(age, df=3)"
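Before running the module, it can help to check that a formula string is valid for lm(). The sketch below fits both example formulas to a toy data frame; all values are simulated and purely hypothetical, and ns() is taken from the splines package that ships with R.

```r
library(splines)  # provides ns() for natural cubic splines

set.seed(1)
# Hypothetical toy data for one peptide across 12 samples
toy <- data.frame(
  log10_peptide_intensity = rnorm(12, mean = 5, sd = 0.3),
  CRP = runif(12, min = 1, max = 60),
  age = seq(30, 74, by = 4),
  sex = rep(c(1, 2), times = 6)
)

# Formula with log-transformed covariates and a categorical covariate
f1 <- as.formula(
  "log10_peptide_intensity ~ log10(CRP) + log10(age) + as.factor(sex)"
)
fit1 <- lm(f1, data = toy)

# Formula with a natural spline on age
f2 <- as.formula("log10_peptide_intensity ~ ns(age, df = 3)")
fit2 <- lm(f2, data = toy)

# Inspect the fitted coefficients
coef(fit1)
coef(fit2)
```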

covariate_adjusting_meta_data_file

character - covariate meta data csv file, containing "R.FileName; age; sex;..."; a suitable starting file is file_list.csv in the 02_ID_rate folder; e.g. covariate_adjusting_meta_data_file = "covariate_MetaData_file.csv"

example table for covariate meta data:

R.FileName	R.FileName_raw	R.Condition	sex	CRP
20230403_TIMS..._S1-B11_1_6690	20230403_TIMSTOF_1_S1-B11_1_6690	healthy	1	3.8
20230403_TIMS..._S1-G11_1_6695	20230403_TIMSTOF_2_S1-G11_1_6695	healthy	2	5.1
20230403_TIMS...3_S1-E7_1_6661	20230403_TIMSTOF_3_S1-E7_1_6661	healthy	1	1.2
20230403_TIMS...4_S1-A9_1_6673	20230403_TIMSTOF_4_S1-A9_1_6673	cancer	1	50.2
20230403_TIMS...7_S1-D8_1_6668	20230403_TIMSTOF_7_S1-D8_1_6668	cancer	2	30.8
20230403_TIMS...9_S1-D3_1_6627	20230403_TIMSTOF_9_S1-D3_1_6627	cancer	2	64.1

skipping_MaxLFQ_median_norm

logical - if median normalization after MaxLFQ calculation should be skipped; default = FALSE; applied only if MaxLFQ protein estimation is selected

costum_colors

named color vector for using your own colors (e.g. c(condition1 = "black", condition2 = "grey")); the names must match the condition names set in Spectronaut, and the vector must contain one color per condition
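A quick check of a color vector against the condition names can catch mismatches early. This is a small sketch with hypothetical condition names; in practice the names come from the condition setup in Spectronaut.

```r
# Hypothetical condition names as set in Spectronaut
conditions <- c("healthy", "cancer")

# Named color vector: one color per condition, names equal to the conditions
my_colors <- c(healthy = "black", cancer = "grey")

# Both checks should be TRUE before passing the vector to costum_colors
names_match <- setequal(names(my_colors), conditions)
length_match <- length(my_colors) == length(conditions)
names_match && length_match
```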

print.plot

logical - should a plot of normalization factors be printed

Value

SpectroPipeR_norm_quant list object with the loaded raw data and processed data tables, in addition to the automatically saved tables and plots. For a description of the generated figures and tables, please read the manual and vignettes.

list element	description
data_input_normalized	tibble: Spectronaut report tibble provided for the analysis
MedianNormalizationFactor	tibble: ion normalization factor table
MedianNormalizationFactor_outlier	tibble: table containing detected norm. outliers on ion level
NormFactor_plot	ggplot2 plot: norm. factor plots
iBAQ_intensities	tibble: table containing the iBAQ int.
iBAQ_intensities_summary	tibble: table containing the per condition summarized iBAQ int.
protein_data	tibble: protein intensity table (e.g. Hi3 or MaxLFQ int.)
PG_2_peptides_ID_raw	tibble: protein groups with at least 2 peptides, including peptide and replicate counts
protein_data_normalization_factor	tibble: normalization factor table for protein int. data
peptide_intensity_filtered_2pep_hi3	tibble: if Hi3 protein int. was selected a table containing the peptides and intensities used for Hi3 protein intensity calculation
peptide_intensity	tibble: peptides intensity table based on norm. ion intensity
parameter	list: parameters provided by the user updated with the norm_quant_module() parameters
CV_cumulative_frequency	tibble: cumulative frequency table on peptide and protein intensity level
sample_length	numerical value: number of samples in the provided Spectronaut report

Details

batch adjustment

Batch effects are systematic differences between batches (groups) of samples in high-throughput experiments. They can arise from batch-to-batch variation in sample preparation, handling, processing procedures and measurement order. If not properly accounted for, batch effects can obscure the true biological signal and lead to incorrect conclusions.

SpectroPipeR uses ComBat to adjust for batch effects in datasets where the batch covariate is known. ComBat follows the methodology described in Johnson et al. 2007: an empirical Bayes (EB) framework for adjusting data for batch effects that is robust to outliers in small sample sizes and performs comparably to existing methods for large samples.

According to Johnson et al. 2007, the method incorporates systematic batch biases common across genes when making adjustments, assuming that the phenomena causing batch effects often affect many genes in similar ways (e.g. increased expression or higher variability). Specifically, the location/scale (L/S) model parameters that represent the batch effects are estimated by pooling information across peptides in each batch, which shrinks the batch effect parameter estimates toward the overall mean of the batch effect estimates (across genes). These EB estimates are then used to adjust the data, providing more robust adjustments for the batch effect on each peptide.

SpectroPipeR implements the parametric ComBat empirical Bayes adjustment via the sva package.
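The parametric ComBat adjustment can be illustrated on a toy peptide-by-sample matrix. This is a rough sketch and not the package's internal code: the simulated log intensities, the batch shift and all object names are hypothetical, and the call is only executed if the sva package (Bioconductor) is installed.

```r
set.seed(123)

# Hypothetical toy data: 50 peptides (rows) x 6 samples (columns), log scale
n_peptides <- 50
batch <- c(1, 1, 1, 2, 2, 2)
mat <- matrix(
  rnorm(n_peptides * length(batch), mean = 5, sd = 0.2),
  nrow = n_peptides,
  dimnames = list(paste0("peptide_", seq_len(n_peptides)),
                  paste0("sample_", seq_along(batch)))
)

# Simulate a systematic shift in the second batch
mat[, batch == 2] <- mat[, batch == 2] + 0.5

# Parametric empirical Bayes adjustment, only if sva is available
if (requireNamespace("sva", quietly = TRUE)) {
  adjusted <- sva::ComBat(dat = mat, batch = batch, par.prior = TRUE)
} else {
  adjusted <- NULL  # sva not installed; nothing is adjusted
}
```

After adjustment, the per-batch column means of adjusted should lie much closer together than those of mat.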

covariate adjustment

If covariate adjustment of the peptide intensity data is requested via the user-supplied formula, a linear model is fitted per peptide based on that formula (the formula is passed to lm(), see covariate_adjusting_formula), and the resulting residuals are added to the mean peptide intensity across samples. The adjusted peptide intensities therefore retain their intensity level: low-intensity peptides keep their low intensity and high-intensity peptides keep their higher intensity.
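The residual-plus-mean step described above can be sketched for a single peptide. This is a minimal illustration with simulated, hypothetical data; lm() is used because the covariate_adjusting_formula argument is documented as being passed to lm(), and the per-peptide loop and parallelization of the module are not reproduced.

```r
set.seed(42)

# Hypothetical data for one peptide measured in 10 samples
d <- data.frame(age = seq(30, 75, by = 5))
d$log10_peptide_intensity <- 6 + 0.01 * d$age + rnorm(nrow(d), sd = 0.05)

# Fit the covariate model for this peptide
fit <- lm(log10_peptide_intensity ~ age, data = d)

# Adjusted intensity = mean intensity over samples + model residuals
peptide_mean <- mean(d$log10_peptide_intensity)
d$adjusted <- peptide_mean + residuals(fit)

# The intensity level is retained and the linear age trend is removed
c(mean_before = peptide_mean, mean_after = mean(d$adjusted))
cor(d$adjusted, d$age)
```

Because the model contains an intercept, the residuals sum to zero, so the mean of the adjusted values equals the original mean while the linear dependence on the covariate is removed.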

Examples

# \donttest{
#load library
library(SpectroPipeR)

# use default parameters list
params <- list(output_folder = "../SpectroPipeR_test_folder")

# example input file
example_file_path <- system.file("extdata",
                                "SN_test_HYE_mix_file.tsv",
                                package="SpectroPipeR")

# step 1: load Spectronaut data module
SpectroPipeR_data <- read_spectronaut_module(file = example_file_path,
                                      parameter = params,
                                      print.plot = FALSE)

# step 2: normalize & quantification module
SpectroPipeR_data_quant <- norm_quant_module(SpectroPipeR_data = SpectroPipeR_data)
# }