Function for performing the whole SpectroPipeR analysis workflow.

Usage

SpectroPipeR(
  file = "",
  parameter = list(),
  max_chars_file_name_capping = 25,
  ID_condition_filtering = FALSE,
  ID_condition_filtering_percent = 0.5,
  batch_adjusting = FALSE,
  sample__batch_meta_data_file = NULL,
  batch_adjusting_column = "",
  number_of_cores_adjusting = parallel::detectCores() - 2,
  covariate_adjusting_formula = "",
  covariate_adjusting_meta_data_file = "",
  skipping_MaxLFQ_median_norm = FALSE,
  HCPC_analysis = FALSE,
  costum_colors = NULL,
  condition_comparisons = NULL,
  number_of_cores_statistics = 2,
  build_HTML_report = TRUE,
  report_copy = FALSE
)

Arguments

file

character - path to the Spectronaut output report; use the Spectronaut_export_scheme() function to obtain a SpectroPipeR report scheme that encompasses all mandatory columns

parameter

mandatory parameter list element

table of list elements:

| parameter | description |
|---|---|
| output_folder | mandatory - character - output folder path (absolute) |
| ion_q_value_cutoff | default = 0.01 - numeric - Q-value used in the Spectronaut analysis; the Biognosys default is 0.01 (= 1% error rate) |
| id_drop_cutoff | default = 0.3 - numeric - value between 0 and 1 (1 = 100%); a sample whose ion ID rate lies more than this proportion below the median ion ID rate is treated as an outlier |
| normalization_method | default = "median" - character - "median" or Spectronaut; auto-detection is ON by default, meaning that if normalization was performed in Spectronaut, this is detected and preferred over the setting here; median normalization is the fallback option |
| normalization_factor_cutoff_outlier | default = 4 - numeric - allowed deviation of the median from the global median (4 means 4-fold off in absolute terms) |
| filter_oxidized_peptides | default = TRUE - logical - whether oxidized peptides should be removed before peptide quantification |
| protein_intensity_estimation | default = "MaxLFQ" - character - "Hi3" = Hi3 protein intensity estimation, "MaxLFQ" = MaxLFQ protein intensity estimation |
| stat_test | default = "rots" - character - statistical test: "rots" = reproducibility-optimized test statistic, "modt" = moderated t-test (lmFit, eBayes), "t" = t-test |
| type_slr | default = "median" - character - ratio aggregation method used when calculating protein values: "median" or "tukey" |
| fold_change | default = 1.5 - numeric - fold-change used as cutoff, e.g. 1.5 |
| p_value_cutoff | default = 0.05 - numeric - p-value used as cutoff, e.g. 0.05 |
| paired | default = FALSE - logical - whether paired statistics should be applied |

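The id_drop_cutoff rule can be pictured with a minimal sketch. It assumes the threshold is computed as median * (1 - id_drop_cutoff), which is one reading of the description and has not been checked against the package source; the ion ID counts are made-up numbers.

```r
# Made-up ion ID counts per raw file (illustrative only)
ion_id_counts <- c(S1 = 52000, S2 = 50500, S3 = 51200, S4 = 30100)

id_drop_cutoff <- 0.3  # default from the parameter table

# ASSUMPTION: a sample is an outlier if its ion ID count lies more than
# id_drop_cutoff (here 30%) below the median ion ID count
ion_id_median <- median(ion_id_counts)
ion_id_cutoff <- ion_id_median * (1 - id_drop_cutoff)

outlier_samples <- names(ion_id_counts)[ion_id_counts < ion_id_cutoff]
print(outlier_samples)
```

With these numbers only the sample with a markedly lower ID count falls below the threshold.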
example parameters list (default):

params <- list(output_folder = "../Spectronaut_example",
               ion_q_value_cutoff = 0.01,
               id_drop_cutoff = 0.3,
               normalization_method = "median",
               normalization_factor_cutoff_outlier = 4,
               filter_oxidized_peptides = TRUE,
               protein_intensity_estimation = "MaxLFQ",
               stat_test = "rots",
               type_slr = "median",
               fold_change = 1.5,
               p_value_cutoff = 0.05,
               paired = FALSE
)

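When only a few settings differ from the defaults, the list can be assembled from a default template. The following is a minimal sketch; `default_params` and `user_params` are hypothetical helper names, not part of SpectroPipeR, and the checks only mirror the value ranges listed in the table.

```r
# Defaults as listed in the parameter table; output_folder has no default
default_params <- list(
  ion_q_value_cutoff = 0.01,
  id_drop_cutoff = 0.3,
  normalization_method = "median",
  normalization_factor_cutoff_outlier = 4,
  filter_oxidized_peptides = TRUE,
  protein_intensity_estimation = "MaxLFQ",
  stat_test = "rots",
  type_slr = "median",
  fold_change = 1.5,
  p_value_cutoff = 0.05,
  paired = FALSE
)

# User overrides (values chosen for illustration)
user_params <- list(output_folder = "../Spectronaut_example",
                    stat_test = "modt")

# Entries in user_params replace or extend the defaults
params <- utils::modifyList(default_params, user_params)

# Basic sanity checks against the documented value ranges
stopifnot(
  is.character(params$output_folder), nzchar(params$output_folder),
  params$stat_test %in% c("rots", "modt", "t"),
  params$protein_intensity_estimation %in% c("Hi3", "MaxLFQ"),
  params$id_drop_cutoff >= 0, params$id_drop_cutoff <= 1
)
```

The resulting `params` list can then be passed as the parameter argument.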
max_chars_file_name_capping

integer, (default = 25) number of max characters used for raw file name presentation; must be adjusted if function

ID_condition_filtering

logical - whether condition-wise filtering should be performed; default = FALSE

ID_condition_filtering_percent

numeric - value between 0 and 1 (default = 0.5) defining the proportion used for the condition-wise ID filtering

batch_adjusting

logical - if batch adjusting with ComBat (sva package) should be performed; default = FALSE

sample__batch_meta_data_file

character - path to the sample batch file: a tab-delimited text file containing an "R.FileName" column and the batch column, e.g. sample__batch_meta_data_file = "Sample_MetaData_Batches.txt"

example table for batch meta data:

| R.FileName | digest_batch |
|---|---|
| 20230403_TIMSTOF_1_S1-B11_1_6690 | 1 |
| 20230403_TIMSTOF_2_S1-G11_1_6695 | 1 |
| 20230403_TIMSTOF_3_S1-E7_1_6661 | 1 |
| 20230403_TIMSTOF_4_S1-A9_1_6673 | 2 |
| 20230403_TIMSTOF_7_S1-D8_1_6668 | 2 |
| 20230403_TIMSTOF_9_S1-D3_1_6627 | 2 |

A good starting point for generating this table is the '*_ConditionSetup.tsv' file in your Spectronaut Pipeline Report export folder.
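A batch meta data file in this layout can be written with base R. This is a minimal sketch; the file names and batch assignments are read from the example table, and the temporary output path is only for illustration.

```r
# File names and batch assignments as read from the example table
batch_meta <- data.frame(
  R.FileName = c("20230403_TIMSTOF_1_S1-B11_1_6690",
                 "20230403_TIMSTOF_2_S1-G11_1_6695",
                 "20230403_TIMSTOF_3_S1-E7_1_6661",
                 "20230403_TIMSTOF_4_S1-A9_1_6673",
                 "20230403_TIMSTOF_7_S1-D8_1_6668",
                 "20230403_TIMSTOF_9_S1-D3_1_6627"),
  digest_batch = c(1, 1, 1, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Written to a temporary location here; use your own path in practice
batch_file <- file.path(tempdir(), "Sample_MetaData_Batches.txt")
write.table(batch_meta, file = batch_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read the file back to confirm the tab-delimited layout
check <- read.delim(batch_file, stringsAsFactors = FALSE)
print(check)
```

The file would then be used with batch_adjusting = TRUE, sample__batch_meta_data_file = batch_file and batch_adjusting_column = "digest_batch".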

batch_adjusting_column

character - column name in sample__batch_meta_data_file, which should be used for assigning the samples to batches

number_of_cores_adjusting

numeric - number of processor cores used for batch or covariate adjustment

covariate_adjusting_formula

character - provide a formula passed to lm() for covariate adjustment e.g. "log10_peptide_intensity ~ log10(CRP)+log10(age)+as.factor(sex)"; you may also use ns() function e.g. "log10_peptide_intensity ~ ns(age, df=3)"
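Before running the pipeline it can help to confirm that every covariate named in the formula exists in the meta data. The sketch below uses only base R; `meta_columns` is a hypothetical set of column names, and whether SpectroPipeR performs a similar check internally is not stated here.

```r
covariate_adjusting_formula <-
  "log10_peptide_intensity ~ log10(CRP) + log10(age) + as.factor(sex)"

# Hypothetical column names of the covariate meta data file
meta_columns <- c("R.FileName", "R.Condition", "sex", "CRP", "age")

# all.vars() returns variable names only, not function names such as log10
f <- stats::as.formula(covariate_adjusting_formula)
response <- all.vars(f[[2]])
covariates <- setdiff(all.vars(f), response)

# Covariates used in the formula but absent from the meta data
missing_covariates <- setdiff(covariates, meta_columns)
print(covariates)
print(missing_covariates)
```

An empty `missing_covariates` indicates that the formula and the meta data columns are consistent.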

covariate_adjusting_meta_data_file

character - path to the covariate meta data CSV file, containing "R.FileName; age; sex;..."; a starting file can be found in the 02_ID_rate folder (file_list.csv); e.g. covariate_adjusting_meta_data_file = "covariate_MetaData_file.csv"

example table for covariate meta data:

| R.FileName | R.Condition | sex | CRP |
|---|---|---|---|
| 20230403_TIMSTOF_1_S1-B11_1_6690 | healthy | 1 | 3.8 |
| 20230403_TIMSTOF_2_S1-G11_1_6695 | healthy | 2 | 5.1 |
| 20230403_TIMSTOF_3_S1-E7_1_6661 | healthy | 1 | 1.2 |
| 20230403_TIMSTOF_4_S1-A9_1_6673 | cancer | 1 | 50.2 |
| 20230403_TIMSTOF_7_S1-D8_1_6668 | cancer | 2 | 30.8 |
| 20230403_TIMSTOF_9_S1-D3_1_6627 | cancer | 2 | 64.1 |

skipping_MaxLFQ_median_norm

logical - if median normalization after MaxLFQ calculation should be skipped; default = FALSE; applied only if MaxLFQ protein estimation is selected

HCPC_analysis

logical - whether a hierarchical clustering on principal components (HCPC) analysis should be performed; default = FALSE

costum_colors

named character vector - own colors for condition coloring, e.g. c(condition1 = "black", condition2 = "grey"); the names must match the condition names set in Spectronaut, and the vector must contain one color per condition; default = NULL
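A quick consistency check of such a color vector could look as follows. This is a sketch under assumptions: `conditions` is a hypothetical vector standing in for the condition names set in Spectronaut.

```r
# Placeholder condition names; replace with the conditions of your experiment
conditions <- c("condition1", "condition2")

costum_colors <- c(condition1 = "black", condition2 = "grey")

# Names must cover exactly the conditions, one color per condition
names_match <- setequal(names(costum_colors), conditions) &&
  length(costum_colors) == length(conditions)

# col2rgb() raises an error for strings that are not valid R colors
is_valid_color <- vapply(
  costum_colors,
  function(col) tryCatch({
    grDevices::col2rgb(col)
    TRUE
  }, error = function(e) FALSE),
  logical(1)
)

stopifnot(names_match, all(is_valid_color))
```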

condition_comparisons

character matrix - condition comparisons for the pairwise comparisons, with one comparison per column; e.g. condition_comparisons <- cbind(c("condition1","control"), c("condition3","control"))
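The comparison matrix has two rows and one column per comparison. The sketch below builds the example by hand and, as an alternative, generates all pairwise combinations with combn(); the condition names are placeholders, and the role of each row (for example which condition serves as the reference) is not specified in this section.

```r
# Two hand-picked comparisons, each column holds one pair of conditions
condition_comparisons <- cbind(c("condition1", "control"),
                               c("condition3", "control"))
print(dim(condition_comparisons))

# All pairwise combinations of a set of placeholder condition names
conditions <- c("condition1", "condition3", "control")
all_pairs <- utils::combn(conditions, 2)
print(all_pairs)
```

Either matrix has the same shape as the example given for condition_comparisons.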

number_of_cores_statistics

numeric - number of processor cores to be used for the calculations; default = 2. For faster processing, use parallel::detectCores() - 2, which detects the number of cores in the system and uses nearly all of them.
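On machines with few cores, or where the core count cannot be determined, detectCores() - 2 may be zero, negative or NA. A defensive sketch (the guard is a suggestion, not something the package is stated to do):

```r
# detectCores() may return NA when the number of cores is unknown
available_cores <- parallel::detectCores()
if (is.na(available_cores)) {
  available_cores <- 1L
}

# Leave two cores free, but never go below one
number_of_cores_statistics <- max(1L, available_cores - 2L)
print(number_of_cores_statistics)
```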

build_HTML_report

logical - whether an HTML report of the analysis should be generated; default = TRUE

report_copy

logical - if TRUE, the Spectronaut input report is copied to the SpectroPipeR project folder 01_input_data; default = FALSE

Value

A SpectroPipeR list object containing the tables and plots of the analysis, in addition to the tables and plots that are saved automatically. For a description of the generated figures and tables, please read the manual and vignettes.

The SpectroPipeR list element contains:

  • SpectroPipeR_data

  • SpectroPipeR_data_quant

  • SpectroPipeR_data_MVA

  • SpectroPipeR_data_stats

SpectroPipeR_data:

| list element | description |
|---|---|
| spectronaut_output | tibble: Spectronaut report tibble provided for the analysis |
| SDRF_file | tibble: intermediate SDRF table of the analysis |
| summary_distinct | tibble: distinct ion, modified peptide, stripped peptide and protein group counts per file, filtered by the provided Q-value |
| raw_file_names | tibble: capped and uncapped versions of R.FileName together with R.Condition and R.Replicate |
| ion_id_median | numerical value: median of ion intensity |
| ion_id_cutoff | numerical value: ion ID count threshold used to classify a sample as an outlier |
| PG_2_peptides_ID_raw | tibble: protein groups with at least 2 peptides, with peptide and replicate count |
| summary_distinct_outlier | tibble: detected outliers, if any, are listed in this tibble |
| ID_rate_plot | ggplot2 plot: ID rate plot |
| ID_rate_plot_filter | ggplot2 plot: ion ID rate plot with ion ID cutoff line |
| sample_length | numerical value: number of samples in the provided Spectronaut report |
| parameter | list: parameters provided by the user |
| time_stamp_log_file | string: time stamp of the log file (format: %Y_%m_%d__%H_%M) |
| log_file_name | string: analysis log file name |

SpectroPipeR_data_quant:

| list element | description |
|---|---|
| data_input_normalized | tibble: Spectronaut report tibble provided for the analysis |
| MedianNormalizationFactor | tibble: ion normalization factor table |
| MedianNormalizationFactor_outlier | tibble: table containing detected normalization outliers on ion level |
| NormFactor_plot | ggplot2 plot: normalization factor plots |
| iBAQ_intensities | tibble: table containing the iBAQ intensities |
| iBAQ_intensities_summary | tibble: table containing the iBAQ intensities summarized per condition |
| protein_data | tibble: protein intensity table (e.g. Hi3 or MaxLFQ intensities) |
| PG_2_peptides_ID_raw | tibble: protein groups with at least 2 peptides, with peptide and replicate count |
| protein_data_normalization_factor | tibble: normalization factor table for the protein intensity data |
| peptide_intensity_filtered_2pep_hi3 | tibble: if Hi3 protein intensity was selected, a table containing the peptides and intensities used for the Hi3 protein intensity calculation |
| peptide_intensity | tibble: peptide intensity table based on normalized ion intensity |
| parameter | list: parameters provided by the user, updated with the norm_quant_module() parameters |
| CV_cumulative_frequency | tibble: cumulative frequency table on peptide and protein intensity level |
| sample_length | numerical value: number of samples in the provided Spectronaut report |

SpectroPipeR_data_MVA:

| list element | description |
|---|---|
| PCA_peptide_intensity | PCA list element: PCA list element of peptide intensity |
| PCA_protein_intensity | PCA list element: PCA list element of protein intensity |
| UMAP_protein_intensity | umap element: UMAP element of protein intensity |
| peptide_intensity_correlation | matrix: Spearman correlation scores of peptide intensity |
| protein_intensity_correlation | matrix: Spearman correlation scores of protein intensity |

SpectroPipeR_data_stats:

| list element | description |
|---|---|
| stat_results | tibble: statistical analysis results table |
| stat_column_description | tibble: column description of the statistical analysis results table |
| stats_results_iBAQ_quantiles | tibble: statistical analysis results table containing the iBAQ quantiles (Q1-Q10) of the protein per group, for a better judgement of the ratios |
| stat_results_filtered | tibble: statistical analysis results table filtered by the user-defined fold-change and adjusted p-value |
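The returned object is a nested list, so individual tables are reached with the `$` operator. The sketch below uses a stand-in list that only mimics the documented element names; it is not real pipeline output, and the empty data frame is a placeholder for the actual results tibble.

```r
# Stand-in object with the documented top-level element names;
# a real SpectroPipeR() run returns these elements fully populated
SpectroPipeR_analysis <- list(
  SpectroPipeR_data = list(sample_length = 6),
  SpectroPipeR_data_quant = list(),
  SpectroPipeR_data_MVA = list(),
  SpectroPipeR_data_stats = list(stat_results = data.frame())
)

# Top-level modules of the result
print(names(SpectroPipeR_analysis))

# Drill down to a single element, e.g. the statistics results table
stat_results <- SpectroPipeR_analysis$SpectroPipeR_data_stats$stat_results
sample_length <- SpectroPipeR_analysis$SpectroPipeR_data$sample_length
```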

Details

batch adjustment

Batch effects are systematic differences between batches (groups) of samples in high-throughput experiments. They can arise from batch-to-batch variation in sample preparation, handling, processing procedures and measurement order. If not properly accounted for, batch effects can obscure the true biological signal and lead to incorrect conclusions.

In the SpectroPipeR pipeline, ComBat is used to adjust for batch effects in datasets where the batch covariate is known. ComBat follows the methodology described in Johnson et al. 2007: an empirical Bayes (EB) framework for adjusting data for batch effects that is robust to outliers in small sample sizes and performs comparably to existing methods for large samples.

According to Johnson et al. 2007, the method incorporates systematic batch biases common across genes when making adjustments, assuming that the phenomena causing batch effects often affect many genes in similar ways (e.g. increased expression, higher variability). Specifically, the location/scale (L/S) model parameters that represent the batch effects are estimated by pooling information across peptides in each batch, which shrinks the batch effect parameter estimates toward the overall mean of the batch effect estimates (across genes in the original publication, across peptides here). These EB estimates are then used to adjust the data for batch effects, providing more robust adjustments for the batch effect on each peptide.

SpectroPipeR implements a parametric ComBat empirical Bayes adjustment using the sva package.
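To make the idea of a location adjustment concrete, the following base R sketch removes the per-batch mean of each peptide while keeping the peptide's overall mean. It is deliberately not ComBat: there is no scale adjustment and no empirical Bayes shrinkage, and the intensities are simulated. The helper name `adjust_location` is hypothetical.

```r
# Simulated log10 peptide intensities: rows = peptides, columns = samples
set.seed(1)
batch <- factor(c(1, 1, 1, 2, 2, 2))
log_int <- matrix(rnorm(4 * 6, mean = 5, sd = 0.2), nrow = 4,
                  dimnames = list(paste0("pep", 1:4), paste0("S", 1:6)))

# Simulated batch effect: batch 2 is shifted upwards
log_int[, batch == 2] <- log_int[, batch == 2] + 0.5

# Location-only adjustment for one peptide: subtract the mean of the
# sample's batch and add back the overall mean of the peptide
adjust_location <- function(x, batch) {
  batch_means <- ave(x, batch)  # batch mean repeated for each sample
  x - batch_means + mean(x)
}

# apply() returns samples in rows, so transpose back to peptides x samples
adjusted <- t(apply(log_int, 1, adjust_location, batch = batch))
print(round(adjusted, 3))
```

After this adjustment each peptide has the same mean in both batches. The parametric empirical Bayes adjustment itself is provided by sva::ComBat() (arguments dat, batch, par.prior); how SpectroPipeR calls it internally is not shown here.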

covariate adjustment

If covariate adjustment of the peptide intensity data is performed using the user-supplied formula, a linear model is fitted per peptide based on that formula (the formula is passed to lm()), and the resulting residuals are added to the mean intensity of that peptide over all samples. The adjusted peptide intensities therefore retain their intensity level: low-intensity peptides stay low and high-intensity peptides stay high.
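The residuals-plus-mean step can be sketched for a single peptide with lm(), the function named in the covariate_adjusting_formula description. The sex and CRP values follow the covariate example table; the peptide intensities are made up, and the pipeline's per-peptide bookkeeping is not reproduced.

```r
# Covariates as read from the example covariate table
meta <- data.frame(
  sex = c(1, 2, 1, 1, 2, 2),
  CRP = c(3.8, 5.1, 1.2, 50.2, 30.8, 64.1)
)

# Made-up log10 intensities of one peptide across the six samples
meta$log10_peptide_intensity <- c(5.10, 5.22, 4.95, 5.61, 5.48, 5.70)

# Fit the covariate model for this peptide
fit <- lm(log10_peptide_intensity ~ log10(CRP) + as.factor(sex), data = meta)

# Adjusted intensity = mean intensity of the peptide + model residuals
adjusted <- mean(meta$log10_peptide_intensity) + residuals(fit)
print(round(adjusted, 3))
```

Because the model contains an intercept, the residuals average to zero, so the adjusted values keep the peptide's mean intensity while the linear association with the covariates is removed.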

Examples

# \donttest{
# load library
library(SpectroPipeR)

# use default parameters list
params <- list(output_folder = "../SpectroPipeR_test_folder")

# example input file
example_file_path <- system.file("extdata",
                                "SN_test_HYE_mix_file.tsv",
                                package="SpectroPipeR")
# perform the analysis
SpectroPipeR_analysis <- SpectroPipeR(file = example_file_path,
                                     parameter = params,
                                     condition_comparisons = cbind(c("HYE mix A","HYE mix B"))
                                     )
# }