| Title: | A GUI for Dual and Bulk RNA-Sequencing Analysis |
|---|---|
| Description: | A 'shiny' app that supports both dual and bulk RNA-seq, with the dual RNA-seq functionality offering the flexibility to perform either a sequential approach (where reads are mapped separately to each genome) or a combined approach (where reads are aligned to a single merged genome). The user-friendly interface automates the analysis process, providing step-by-step guidance, making it easy for users to navigate between different analysis steps, and download intermediate results and publication-ready plots. |
| Authors: | Carmine Fruggiero [aut, cre], Gaetano Aufiero [aut] |
| Maintainer: | Carmine Fruggiero <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.0.3 |
| Built: | 2026-06-09 07:12:17 UTC |
| Source: | https://github.com/indagoverse/indago |
Create a barplot of library sizes per sample, optionally using effective library sizes.
barplotExp(x, palette, main, selectOrder, effecLibSize)

| Argument | Description |
|---|---|
| x | A DGEList object from "edgeR". |
| palette | Character. Name of a discrete color palette from the "paletteer" package. |
| main | Character. Title for the barplot. |
| selectOrder | Character. Either "Groups" (order samples by group) or "Samples" (order by sample name). |
| effecLibSize | Logical. If TRUE, use effective library size (norm factors × raw size); otherwise use raw size. |

This function extracts library size information from an "edgeR" "DGEList", computes effective library sizes if requested, orders samples by group or name, and plots library sizes (in millions) colored by group.
Extracts or computes (effecLibSize = TRUE) the library size for each sample.
Orders samples by group or sample name per selectOrder.
Plots bar heights as library size (×10^6) with white fill and colored borders.
A "ggplot" object showing per-sample barplots of library size in millions.
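The library-size calculation described above can be sketched in base R. This is a minimal illustration, not the package's code: the `samples` data frame is a made-up stand-in for the sample table of a DGEList, and `library_size_millions()` is a hypothetical helper name.

```r
# Toy stand-in for the sample table of a DGEList (all values are made up).
samples <- data.frame(
  Samples      = c("S3", "S1", "S2", "S4"),
  Groups       = c("treated", "control", "treated", "control"),
  lib.size     = c(2.0e6, 1.5e6, 3.0e6, 1.0e6),
  norm.factors = c(1.10, 0.90, 1.00, 1.05),
  stringsAsFactors = FALSE
)

# Hypothetical helper: raw or effective library size in millions,
# ordered by group or by sample name.
library_size_millions <- function(samples, effecLibSize = TRUE,
                                  selectOrder = c("Groups", "Samples")) {
  selectOrder <- match.arg(selectOrder)
  size <- samples$lib.size
  if (effecLibSize) {
    # Effective library size = normalization factor x raw library size.
    size <- size * samples$norm.factors
  }
  out <- data.frame(Samples = samples$Samples,
                    Groups = samples$Groups,
                    LibSizeM = size / 1e6,
                    stringsAsFactors = FALSE)
  ord <- if (selectOrder == "Groups") {
    order(out$Groups, out$Samples)
  } else {
    order(out$Samples)
  }
  out <- out[ord, ]
  rownames(out) <- NULL
  out
}

res <- library_size_millions(samples, effecLibSize = TRUE,
                             selectOrder = "Groups")
print(res)
```

The resulting table is the kind of input a barplot layer would consume, with one bar per row and color mapped to `Groups`.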
BaseAverageQualityPlot
BaseAverageQualityPlot(input_data)

| Argument | Description |
|---|---|
| input_data | folder containing data |

Interactive BaseAverageQualityPlot
BaseAverageQualityPlotly(input_data)

| Argument | Description |
|---|---|
| input_data | folder containing data |

BaseCompositionAreaChartPlot
BaseCompositionAreaChartPlot(input_data)

| Argument | Description |
|---|---|
| input_data | folder containing data |

BaseCompositionLinePlot
BaseCompositionLinePlot(input_data)

| Argument | Description |
|---|---|
| input_data | folder containing data |

BaseQualityBoxplotPlot
BaseQualityBoxplotPlot(input_data)

| Argument | Description |
|---|---|
| input_data | folder containing data |

Generate a boxplot of log-CPM expression values per sample, colored by group.
boxplotExp(x, y, palette, main, selectOrder)

| Argument | Description |
|---|---|
| x | A DGEList object from "edgeR". |
| y | Numeric matrix of log-CPM values (genes × samples), e.g., from edgeR::cpm(). |
| palette | Character. Name of a discrete palette from the paletteer package. |
| main | Character. Title for the boxplot. |
| selectOrder | Character. Either "Groups" (order samples by group) or "Samples" (order by sample name). |

This function orders samples by group or sample name, and produces a ggplot2 boxplot with a horizontal line at the overall median.
Extract sample metadata (Samples, Groups) from "x$samples".
Order columns of y by group or sample name per "selectOrder".
Melt the ordered matrix to long format and join with metadata.
Plot boxplots with no outliers, colored by group, and include a dashed line at the overall median.
A ggplot object showing per-sample boxplots of log-CPM values.
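The data preparation behind such a boxplot (ordering, melting to long format, joining metadata, computing the overall median) might look as follows in base R. This is only a sketch under assumptions: the matrix `y` and the `meta` table are simulated, and the package itself may use different reshaping tools.

```r
set.seed(1)
# Simulated log-CPM matrix (genes x samples) and matching metadata.
y <- matrix(rnorm(12), nrow = 3,
            dimnames = list(paste0("gene", 1:3), c("S3", "S1", "S2", "S4")))
meta <- data.frame(Samples = c("S3", "S1", "S2", "S4"),
                   Groups  = c("B", "A", "B", "A"),
                   stringsAsFactors = FALSE)

# Order columns by group (ties broken by sample name).
ord   <- order(meta$Groups, meta$Samples)
y_ord <- y[, meta$Samples[ord], drop = FALSE]

# Melt to long format; as.vector() walks the matrix column by column,
# so genes cycle fastest and each sample name is repeated nrow(y) times.
long <- data.frame(
  Gene    = rep(rownames(y_ord), times = ncol(y_ord)),
  Samples = rep(colnames(y_ord), each = nrow(y_ord)),
  logCPM  = as.vector(y_ord),
  stringsAsFactors = FALSE
)

# Join group labels and compute the overall median (the dashed line).
long <- merge(long, meta, by = "Samples", sort = FALSE)
overall_median <- median(long$logCPM)
```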
Bulk alignment function
BulkAlignment(lalista, nodes, readsPath, GenomeIndex, outBam, threads, outFormat, phredScore, maxExtractedSubreads, consensusVote, mismatchMax, maxMultiMapped, indelLength, fragmentMinLength, fragmentMaxLength, matesOrientation, readOrderConserved, coordinatesSorting, allJunctions, tempfolder)

| Argument | Description |
|---|---|
| lalista | list of samples |
| nodes | logical cores |
| readsPath | sample folders |
| GenomeIndex | genome index |
| outBam | output folder |
| threads | processes |
| outFormat | BAM or SAM |
| phredScore | quality score |
| maxExtractedSubreads | number of subreads |
| consensusVote | consensus |
| mismatchMax | mismatch |
| maxMultiMapped | multimapping |
| indelLength | indel |
| fragmentMinLength | fragment minimum length |
| fragmentMaxLength | fragment maximum length |
| matesOrientation | mate orientation |
| readOrderConserved | read order |
| coordinatesSorting | sorting |
| allJunctions | junctions |
| tempfolder | temporary folder |

Validate and extract non-empty annotation fields from a GTF file.
checkMetadata(gtfPath, typeFilter)

| Argument | Description |
|---|---|
| gtfPath | Character. Path to the directory or file location of the GTF file. |
| typeFilter | Character. The feature type to filter on (e.g., "gene", "exon"). |

This function imports a GTF file, filters entries by a specified feature type, and identifies metadata columns that contain at least one non-missing value.
Imports the GTF into a data frame via "rtracklayer::import()".
Filters rows by "type" == typeFilter.
Tests each column for all-NA or empty-string entries.
Returns names of columns with at least one non-missing, non-empty value.
Character vector of column names in the GTF annotation that are not entirely NA or empty.
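The column test described above can be illustrated without reading a real GTF. The sketch below assumes the annotation has already been imported into a data frame; the toy `gtf` table and the helper name `non_empty_columns()` are hypothetical.

```r
# Toy annotation table standing in for an imported GTF (values are made up).
gtf <- data.frame(
  type      = c("gene", "exon", "gene"),
  gene_id   = c("g1", "g1", "g2"),
  gene_name = c("", NA, ""),
  note      = c(NA, "x", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows of one feature type, then report the
# columns holding at least one value that is neither NA nor "".
non_empty_columns <- function(df, typeFilter) {
  sub <- df[df$type == typeFilter, , drop = FALSE]
  keep <- vapply(
    sub,
    function(col) any(!is.na(col) & as.character(col) != ""),
    logical(1)
  )
  names(sub)[keep]
}

cols <- non_empty_columns(gtf, "gene")
print(cols)
```

Here `gene_name` is dropped because it is empty for every "gene" row, and `note` is dropped because it is NA for every "gene" row.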
Combined alignment for dual RNA-seq
CombinedAlignment(lalista, nodes, readsPath, GenomeConcIndex, outBam, threads, phredScore, maxExtractedSubreads, consensusVote, mismatchMax, maxMultiMapped, indelLength, fragmentMinLength, fragmentMaxLength, matesOrientation, readOrderConserved, coordinatesSorting, allJunctions, tempfolder, readsAlignedBlock)

| Argument | Description |
|---|---|
| lalista | list of samples |
| nodes | logical cores |
| readsPath | sample folders |
| GenomeConcIndex | genome index |
| outBam | output folder |
| threads | processes |
| phredScore | quality score |
| maxExtractedSubreads | number of subreads |
| consensusVote | consensus |
| mismatchMax | mismatch |
| maxMultiMapped | multimapping |
| indelLength | indel |
| fragmentMinLength | fragment minimum length |
| fragmentMaxLength | fragment maximum length |
| matesOrientation | mate orientation |
| readOrderConserved | read order |
| coordinatesSorting | sorting |
| allJunctions | junctions |
| tempfolder | temporary folder |
| readsAlignedBlock | chunks |

Plot a correlation heatmap of top variable genes across samples.
CorrPlotHeatmap(x, scale, Color, type, display, round_number, cutree_rows, cutree_cols, cluster, show_names, NumGenes, main)

| Argument | Description |
|---|---|
| x | Numeric matrix of log-CPM values (genes × samples), e.g., from "edgeR::cpm()". |
| scale | Character. Scaling mode for the heatmap: "row", "column", or "none". |
| Color | Character. Name of a continuous palette from the "paletteer" package. |
| type | Character. Correlation method passed to "Hmisc::rcorr()": "pearson", "spearman", or "kendall". |
| display | Character. Which matrix to display: "correlation" (coefficients) or "pvalue". |
| round_number | Integer. Number of decimal places to round displayed numbers. |
| cutree_rows | Integer. Number of clusters to cut for row dendrogram. |
| cutree_cols | Integer. Number of clusters to cut for column dendrogram. |
| cluster | Character. Clustering mode: one of "both", "row", "column", or "none". |
| show_names | Character. One of "both", "row", "column", or "none" to display row/column labels. |
| NumGenes | Integer. Number of top-variance genes to include in the correlation. |
| main | Character. The plot title. |

This function selects the highest-variance genes from a log-CPM matrix, computes pairwise correlation coefficients (or p-values) with "Hmisc::rcorr()", and renders a heatmap via "pheatmap", with options for clustering, scaling, and number display.
Compute per-gene variance and select the top "NumGenes".
Subset the matrix and compute correlations (and p-values) via "Hmisc::rcorr()".
Choose to display correlation coefficients or p-values, rounded to "round_number".
Determine clustering and label visibility from cluster and "show_names".
Render the heatmap with "pheatmap::pheatmap()", passing in custom distance, color, clustering, and "display" number settings, saving to a temporary file to suppress autosave.
A "pheatmap" object representing the correlation heatmap with clustering.
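The first two steps (variance ranking and sample-by-sample correlation) can be sketched in base R. Note the swap: this example uses `stats::cor()` instead of "Hmisc::rcorr()" so that it runs without extra packages, and it therefore yields coefficients only, not p-values. The input matrix is simulated.

```r
set.seed(42)
# Simulated log-CPM matrix: 50 genes x 4 samples.
logcpm <- matrix(rnorm(200), nrow = 50,
                 dimnames = list(paste0("gene", 1:50), paste0("S", 1:4)))

NumGenes     <- 10
round_number <- 2

# Per-gene variance, then the names of the NumGenes most variable genes.
gene_var  <- apply(logcpm, 1, var)
n_keep    <- min(NumGenes, length(gene_var))
top_genes <- names(sort(gene_var, decreasing = TRUE))[seq_len(n_keep)]

# Subset and correlate samples (columns) with each other.
top  <- logcpm[top_genes, , drop = FALSE]
corr <- round(cor(top, method = "pearson"), round_number)
print(corr)
```

The rounded matrix `corr` is what a heatmap function would receive, with clustering and label options applied at the rendering step.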
Create an interactive correlation heatmap of top variable genes using Heatmaply.
CorrPlotHeatmaply(x, Color, type, cluster, scale, show_names, NumGenes, main)

| Argument | Description |
|---|---|
| x | Numeric matrix of log-CPM values (genes × samples), e.g., from "edgeR::cpm()". |
| Color | Character. Name of a continuous palette from the "paletteer" package. |
| type | Character. Correlation method passed to "Hmisc::rcorr()": "pearson", "spearman", or "kendall". |
| cluster | Character or logical. Clustering option for dendrogram: "both", "row", "column", or "none". |
| scale | Character. Scaling mode for the heatmap: "row", "column", or "none". |
| show_names | Character. One of "both", "row", "column", or "none" to display row/column labels. |
| NumGenes | Integer. Number of top-variance genes to include in the correlation. |
| main | Character. The plot title. |

This function selects the highest-variance genes from a log-CPM matrix, computes pairwise correlation coefficients (and p-values) with "Hmisc::rcorr()", and renders an interactive correlation heatmap via "heatmaply::heatmaply_cor()", using clustering and scaling options derived from a temporary "pheatmap" call.
Compute per-gene variance and select the top "NumGenes".
Subset the matrix and compute correlations (and p-values) via "Hmisc::rcorr()".
Generate a temporary static heatmap with "pheatmap" to extract dendrograms.
Render an interactive heatmap with "heatmaply::heatmaply_cor()", passing in color, clustering, scaling, tick-label visibility, and point size based on -log10(p-value).
A Plotly object (heatmaply) representing the interactive correlation heatmap.
COUNTING SEQUENCES
counting_Reads(input_data)

| Argument | Description |
|---|---|
| input_data | sample folder |

Server function for DEGs module in Shiny application
DEGsServerLogic(id)

| Argument | Description |
|---|---|
| id | Shiny module identifier |

UI function for DEGs module in Shiny application
DEGsUserInterface(id)

| Argument | Description |
|---|---|
| id | Shiny module identifier |

Server function for EDA module in Shiny application
EDAServerLogic(id)

| Argument | Description |
|---|---|
| id | Shiny module identifier |

UI function for EDA module in Shiny application
EDAUserInterface(id)

| Argument | Description |
|---|---|
| id | Shiny module identifier |

Perform differential expression analysis on RNA-seq count data using edgeR.
EdgerDEG(gr, WD_samples, WD_DEGs, colIDgene, colCounts, skip_preN, grContrast, filter, model, normMethod, min_count, min_total_count, large_n, min_prop, adjustPvalue, Th_logFC, Th_Pvalue)

| Argument | Description |
|---|---|
| gr | Data frame. Sample metadata with columns Samples and Groups. |
| WD_samples | Character. Directory containing raw count .tab files. |
| WD_DEGs | Character. Directory in which to write results and logs. |
| colIDgene | Integer. Column index in each count file for gene IDs. |
| colCounts | Integer. Column index in each count file for raw counts. |
| skip_preN | Integer. Number of header lines to skip when reading count files. |
| grContrast | Data frame. Two-column table with Test and Baseline group names for contrasts. |
| filter | Character. Filtering method: "filterByExpr" or "HTSFilter". |
| model | Character. Statistical test: "exactTest", "glmQLFTest", or "glmLRT". |
| normMethod | Character. Normalization method for edgeR (e.g., "TMM", "RLE"). |
| min_count | Numeric. Minimum count per gene for "filterByExpr". |
| min_total_count | Numeric. Minimum total count per gene for "filterByExpr". |
| large_n | Integer. Sample size threshold for "filterByExpr". |
| min_prop | Numeric. Proportion threshold for "filterByExpr". |
| adjustPvalue | Character. P-value adjustment method (e.g., "fdr", "holm", "none"). |
| Th_logFC | Numeric. Absolute log-fold-change threshold to call differential expression. |
| Th_Pvalue | Numeric. Adjusted p-value threshold to call differential expression. |

This function reads raw count tables, applies expression filtering (via "filterByExpr" or "HTSFilter"), normalizes library sizes, estimates dispersion, fits statistical models ("exactTest", "glmQLFTest", or "glmLRT"), and writes per-contrast results and diagnostic plots.
Reads in per-sample count files and generates a DGEList.
Builds the design matrix and contrast definitions from "grContrast".
Filters lowly expressed genes, normalizes library sizes, and logs filtering summary.
Estimates dispersion (standard or quasi-likelihood).
Runs chosen differential test per contrast, annotates each gene as "UP", "DOWN", or "NO", and writes CSV output files named by filter, model, and contrast.
Captures and saves BCV and QL dispersion plots as SVGs in WD_DEGs.
A list invisibly returned containing any captured plots and log messages; primary results are written to CSV files in "WD_DEGs".
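The "UP"/"DOWN"/"NO" annotation step can be sketched with base R alone. The result table below is invented, and the exact comparison rule (strict versus non-strict inequalities) is an assumption; only the roles of `adjustPvalue`, `Th_logFC`, and `Th_Pvalue` come from the argument descriptions.

```r
# Invented per-gene test results for a single contrast.
res <- data.frame(
  ID     = paste0("g", 1:5),
  logFC  = c(2.5, -1.8, 0.3, 3.0, -2.2),
  PValue = c(0.001, 0.004, 0.20, 0.60, 0.0005),
  stringsAsFactors = FALSE
)

adjustPvalue <- "fdr"
Th_logFC     <- 1
Th_Pvalue    <- 0.05

# Adjust p-values, then label each gene.
res$adjP    <- p.adjust(res$PValue, method = adjustPvalue)
significant <- res$adjP < Th_Pvalue            # assumed: strict inequality
res$diffExp <- "NO"
res$diffExp[significant & res$logFC >=  Th_logFC] <- "UP"
res$diffExp[significant & res$logFC <= -Th_logFC] <- "DOWN"

print(res)
```

Gene `g4` shows why both thresholds matter: its fold change is large, but its adjusted p-value is far above the cutoff, so it stays "NO".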
Filter paired-end FASTQ files in parallel based on quality and adapter trimming criteria.
Filtering(Nodes, X, UploadPath, DownloadPath, qualityType, minLen, trim, trimValue, n, Adapters, Lpattern, Rpattern, max.Lmismatch, max.Rmismatch, kW, left, right, halfwidthAnalysis, halfwidth, compress)

| Argument | Description |
|---|---|
| Nodes | Integer. Number of parallel processing nodes (e.g., CPU cores). |
| X | List of character vectors. Each element is a character vector of paired file names (e.g., c("sample_1.fq", "sample_2.fq")). |
| UploadPath | Character. Path to directory containing raw FASTQ files. |
| DownloadPath | Character. Path to directory where filtered files will be saved. |
| qualityType | Character. Type of quality score encoding, e.g., "Sanger" or "Illumina". |
| minLen | Integer. Minimum length of reads to retain after filtering. |
| trim | Logical. Whether to perform quality-based trimming of reads. |
| trimValue | Integer. Minimum Phred score threshold for trimming. |
| n | Integer. Number of reads to stream per chunk (default typically set to 1e6). |
| Adapters | Logical. Whether to remove adapters from reads. |
| Lpattern | Character. Adapter sequence to remove from the 5' end (left). |
| Rpattern | Character. Adapter sequence to remove from the 3' end (right). |
| max.Lmismatch | Integer. Maximum mismatches allowed for the left adapter. |
| max.Rmismatch | Integer. Maximum mismatches allowed for the right adapter. |
| kW | Integer. Minimum number of low-quality scores in a window to trigger trimming (sliding window analysis). |
| left | Logical. Whether to allow trimming from the left end. |
| right | Logical. Whether to allow trimming from the right end. |
| halfwidthAnalysis | Logical. Whether to perform sliding window-based trimming. |
| halfwidth | Integer. Half-width of the sliding window. |
| compress | Logical. Whether to compress the output FASTQ files. |

This function processes raw paired-end FASTQ files to remove low-quality bases, trim adapters, and filter out short reads. It supports quality-based end trimming, sliding window trimming, and adapter removal. The processing is done in parallel across multiple nodes to enhance performance when working with large datasets.
Paired FASTQ files must be named consistently, distinguished by "_1" and "_2" for forward and reverse reads.
This function uses the "ShortRead" and "Biostrings" packages for FASTQ processing and quality filtering.
Filtered files are written in FASTQ format.
Log files containing read counts before and after filtering are written per sample.
Filtered FASTQ files written to "DownloadPath"; one log file per sample.
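Because the `X` argument expects one character vector per read pair, it can help to build it from file names that follow the "_1"/"_2" convention. The sketch below is an illustration only: the file names and the ".fq" extension are assumptions, and in practice the names would typically come from listing the upload directory.

```r
# Hypothetical file names following the "_1" / "_2" naming convention.
files <- c("treat_2.fq", "ctrl_1.fq", "treat_1.fq", "ctrl_2.fq")

# Strip the mate suffix and extension to obtain one stem per sample.
stem <- sub("_[12]\\.fq$", "", files)

# Group files by stem; sort within each pair so "_1" precedes "_2".
X <- lapply(unname(split(files, stem)), sort)

# Sanity check: every sample must contribute exactly two files.
pair_sizes <- vapply(X, length, integer(1))
if (any(pair_sizes != 2L)) {
  stop("Every sample needs exactly one _1 and one _2 file.")
}

print(X)
```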
Server function for filtering module in Shiny application
FilteringServerLogic(id)

| Argument | Description |
|---|---|
| id | Shiny module identifier |

UI function for filtering module in Shiny application
FilteringUserInterface(id)

| Argument | Description |
|---|---|
| id | Shiny module identifier |

GCcontentDistributionPlot
GCcontentDistributionPlot(input_data)

| Argument | Description |
|---|---|
| input_data | samples folder |

Interactive GCcontentDistributionPlot
GCcontentDistributionPlotly(input_data)

| Argument | Description |
|---|---|
| input_data | samples folder |

Merge multiple DEG result CSVs with GTF annotations into a single data frame.
getDegMerged(path, gtfPath, columns, collapseName, typeFilter, selectUpDown)

| Argument | Description |
|---|---|
| path | Character. Directory containing DEG result CSV files. |
| gtfPath | Character. Path to the GTF annotation file. |
| columns | Character vector. Names of annotation columns to include from the GTF. |
| collapseName | Logical. If TRUE, strip method/model prefixes from file names when prefixing columns. |
| typeFilter | Character. GTF feature type to filter (e.g., "gene" or "transcript"). |
| selectUpDown | Logical. If TRUE, only include IDs with "diffExp" == UP or DOWN. |

This function reads all CSV files in a directory, validates presence of required columns ("ID", and optionally "diffExp"), filters for up/down regulated genes if requested, extracts annotation fields from a GTF, and returns a merged table of selected annotation columns alongside all DEG metrics (with optional file-based column prefixes).
A combined data frame
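The core of the merge can be sketched on in-memory tables. Everything below is illustrative: the `deg` and `annot` data frames are invented, the join key `gene_id` is an assumption about the annotation, and the real function additionally loops over CSV files and prefixes columns by file name.

```r
# Invented DEG result table (one contrast) and annotation table.
deg <- data.frame(
  ID      = c("g1", "g2", "g3"),
  logFC   = c(2.1, -1.7, 0.2),
  diffExp = c("UP", "DOWN", "NO"),
  stringsAsFactors = FALSE
)
annot <- data.frame(
  gene_id   = c("g1", "g2", "g3", "g4"),
  gene_name = c("abc", "def", "ghi", "jkl"),
  type      = "gene",
  stringsAsFactors = FALSE
)

columns      <- c("gene_id", "gene_name")
typeFilter   <- "gene"
selectUpDown <- TRUE

# Validate required columns before merging.
stopifnot("ID" %in% names(deg))

# Optionally keep only up- or down-regulated genes.
if (selectUpDown) {
  stopifnot("diffExp" %in% names(deg))
  deg <- deg[deg$diffExp %in% c("UP", "DOWN"), , drop = FALSE]
}

# Keep the requested annotation columns for one feature type and join by ID.
annot_sub <- annot[annot$type == typeFilter, columns, drop = FALSE]
merged    <- merge(annot_sub, deg, by.x = "gene_id", by.y = "ID")

print(merged)
```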
Calculate and return filtered DGEList object and log-CPM matrices using edgeR and optional HTSFilter
GetEdgerY(gr, WDpn, colIDgene, colCounts, skip_preN, filterMethod, min_count, min_total_count, large_n, min_prop, normMethod)

| Argument | Description |
|---|---|
| gr | Data frame with sample metadata, including sample names and group labels |
| WDpn | Directory containing count files (*.tab) |
| colIDgene | Column index of gene IDs in count files |
| colCounts | Column index of counts in count files |
| skip_preN | Number of header lines to skip in count files |
| filterMethod | Either "filterByExpr" or "HTSFilter" |
| min_count | Minimum count per gene (filterByExpr) |
| min_total_count | Minimum total count per gene (filterByExpr) |
| large_n | Number of samples per group to consider as "large" (filterByExpr) |
| min_prop | Minimum proportion of samples with expression (filterByExpr) |
| normMethod | Normalization method (e.g., "TMM", "RLE") |

A list with total/kept gene counts, filtered DGEList objects, and log-CPM matrices
Plot a heatmap of the top variable genes across samples.
HeatmapExp(x, ColorPanel, scale, cutree_rows, cutree_cols, cluster, show_names, NumGenes, main)

| Argument | Description |
|---|---|
| x | Numeric matrix of log-CPM values (genes × samples), e.g., from edgeR::cpm(). |
| ColorPanel | Character. Name of a continuous palette from the paletteer package. |
| scale | Character. Scaling mode for heatmap: "row", "column", or "none". |
| cutree_rows | Integer. Number of clusters for rows (genes). |
| cutree_cols | Integer. Number of clusters for columns (samples). |
| cluster | Character. One of "both", "row", "column", or "none" to specify clustering. |
| show_names | Character. One of "both", "row", "column", or "none" to show row/col names. |
| NumGenes | Integer. Number of top-variance genes to include in the heatmap. |
| main | Character. The plot title. |

This function selects the highest-variance genes from a log-CPM matrix, transposes the data, and renders a heatmap with customizable clustering, scaling, and color palettes using pheatmap.
Compute per-gene variance and select the top "NumGenes".
Transpose the subsetted matrix so samples are rows.
Apply the specified color palette (n = 50) via paletteer::paletteer_c().
Determine clustering and name-display options from "cluster" and "show_names".
Render the heatmap with "pheatmap::pheatmap()", saving to a temporary file to suppress autosave.
A "pheatmap" object containing the heatmap and clustering information.
Create an interactive heatmap of top variable genes using Heatmaply.
HeatmapExpPlotly(x, ColorPanel, scale, cluster, show_names, NumGenes, main)

| Argument | Description |
|---|---|
| x | Numeric matrix of log-CPM values (genes × samples), e.g., from edgeR::cpm(). |
| ColorPanel | Character. Name of a continuous palette from the paletteer package. |
| scale | Character. Scaling mode: "row", "column", or "none". |
| cluster | Character or logical. Clustering option for dendrogram: "both", "row", "column", or "none". |
| show_names | Character. One of "both", "row", "column", or "none" to display row/column labels. |
| NumGenes | Integer. Number of top-variance genes to include in the heatmap. |
| main | Character. The plot title. |

This function selects the highest-variance genes from a log-CPM matrix, transposes the data, and renders an interactive heatmap via "heatmaply", using dendrograms extracted from a temporary "pheatmap" call.
Compute per-gene variance and select the top NumGenes.
Transpose the subsetted matrix so samples are rows.
Generate a temporary static heatmap with pheatmap to extract dendrograms.
Render an interactive heatmap with heatmaply::heatmaply().
A Plotly object (heatmaply) representing the interactive heatmap.
A Shiny app for dual and bulk RNA‑sequencing analysis.
inDAGO()
This function launches the inDAGO Shiny interface.
No return value, called for side effects
Bulk indexing
IndexingBulk(basename, reference, gappedIndex, indexSplit, memory, TH_subread)

| Argument | Description |
|---|---|
| basename | output basename |
| reference | reference genome |
| gappedIndex | gapped structure |
| indexSplit | split structure |
| memory | handling memory |
| TH_subread | threshold memory usage |

Indexing bulk server logic
IndexingBulkServerLogic(id)

| Argument | Description |
|---|---|
| id | Shiny module identifier |

Indexing bulk ui
IndexingBulkUserInterface(id)

| Argument | Description |
|---|---|
| id | Shiny module identifier |

Combined indexing
IndexingComb(basename, reference, gappedIndex, indexSplit, memory, TH_subread, gen1, gen2, outfolder, tempfolder = file.path(fs::path_temp(), "TempDirSum_3738"), tag1, tag2)

| Argument | Description |
|---|---|
| basename | output basename |
| reference | reference genome |
| gappedIndex | gapped structure |
| indexSplit | split structure |
| memory | handling memory |
| TH_subread | threshold memory usage |
| gen1 | first reference genome |
| gen2 | second reference genome |
| outfolder | output folder |
| tempfolder | temporary folder |
| tag1 | first genome label |
| tag2 | second genome label |

Indexing combined server logic
IndexingCombinedServerLogic(id)

| Argument | Description |
|---|---|
| id | Shiny module identifier |

Indexing combined ui
IndexingCombinedUserInterface(id)

| Argument | Description |
|---|---|
| id | Shiny module identifier |

Indexing sequential parallel
IndexingSequentialParallel(basename, reference, gappedIndex, indexSplit, memory, TH_subread)

| Argument | Description |
|---|---|
| basename | output basename |
| reference | reference genome |
| gappedIndex | gapped structure |
| indexSplit | split structure |
| memory | handling memory |
| TH_subread | threshold memory usage |

Indexing sequential progressive
IndexingSequentialProgressive(outfolder1, outfolder2, refgen1, refgen2, gappedIndex, indexSplit, memory, TH_subread)

| Argument | Description |
|---|---|
| outfolder1 | first output folder |
| outfolder2 | second output folder |
| refgen1 | first reference genome |
| refgen2 | second reference genome |
| gappedIndex | gapped structure |
| indexSplit | split structure |
| memory | handling memory |
| TH_subread | threshold memory usage |

Indexing sequential server logic
IndexingSequentialServerLogic(id)

| Argument | Description |
|---|---|
| id | Shiny module identifier |

Indexing sequential ui
IndexingSequentialUserInterface(id)

| Argument | Description |
|---|---|
| id | Shiny module identifier |

Mapping bulk server logic
mappingBulkServerLogic(id)

| Argument | Description |
|---|---|
| id | Shiny module identifier |

Mapping bulk ui
mappingBulkUserInterface(id)

| Argument | Description |
|---|---|
| id | Shiny module identifier |

Mapping combined server logic
mappingCombinedServerLogic(id)

| Argument | Description |
|---|---|
| id | Shiny module identifier |

Mapping combined ui
mappingCombinedUserInterface(id)

| Argument | Description |
|---|---|
| id | Shiny module identifier |

Mapping sequential server logic
mappingSequentialServerLogic(id)

| Argument | Description |
|---|---|
| id | Shiny module identifier |

Mapping sequential ui
mappingSequentialUserInterface(id)

| Argument | Description |
|---|---|
| id | Shiny module identifier |

Compute MDS coordinates for expression data using limma's plotMDS.
mdsinfo(matrix, top, gene.selection)

| Argument | Description |
|---|---|
| matrix | A DGEList object. |
| top | Integer. Number of top most variable genes to include in MDS. |
| gene.selection | Method for gene selection: one of "pairwise", "common", or "logFC". |

This function performs multidimensional scaling (MDS) on a DGEList or log-expression matrix using limma's "plotMDS()" function. It returns the MDS object containing coordinates and eigenvalues without generating a plot.
A list object from "plotMDS()" containing MDS coordinates and eigenvalues.
Generate a multidimensional scaling (MDS) plot based on expression data.
mdsPlot(x, Sample, Group, title, palette, maxOverlaps, sizeLabel, top, gene.selection)

| Argument | Description |
|---|---|
| x | DGEList object from edgeR. |
| Sample | A character vector of sample labels (one per column in "x"). |
| Group | A factor or character vector specifying the group/class of each sample. |
| title | Plot title as a character string. |
| palette | Name of a palette from the "paletteer" package for coloring groups. |
| maxOverlaps | Maximum number of overlapping labels allowed by "geom_text_repel". |
| sizeLabel | Numeric value for label font size. |
| top | Integer. Number of top most variable genes to include in MDS. |
| gene.selection | Method for gene selection: one of "pairwise", "common", or "logFC". |

This function performs MDS analysis using limma's "plotMDS()" and visualizes the sample relationships in two dimensions using "ggplot2" and "ggrepel".
A "ggplot" object representing the MDS plot.
Generate an interactive MDS plot using Plotly based on expression data.
mdsPlottly(x, Sample, Group, title, palette, top, gene.selection)

| Argument | Description |
|---|---|
| x | A DGEList object from edgeR. |
| Sample | Character vector. Sample names corresponding to columns of "x". |
| Group | Factor or character vector. Group or condition for each sample. |
| title | Character. Title for the plot. |
| palette | Character. Name of a discrete palette from the "paletteer" package. |
| top | Integer. Number of top most variable genes (by logFC) to include in MDS. |
| gene.selection | Character. Gene selection method: one of "pairwise", "common", or "logFC". |

This function computes multidimensional scaling (MDS) coordinates with limma's "plotMDS()" and then renders an interactive scatterplot via "plotly::ggplotly()".
Compute MDS on the input data with "limma::plotMDS()".
Extract eigenvalues and first two dimensions for variance annotation.
Build a ggplot2 scatterplot with axis labels showing percent variance explained.
Convert the ggplot to an interactive Plotly graph.
A Plotly object ("plotly::ggplotly") representing the interactive MDS scatterplot.
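The idea of turning eigenvalues into "percent variance explained" axis labels can be shown with classical MDS from base R. This is a deliberate substitution: "limma::plotMDS()" uses its own distance measure and gene selection, so the sketch below (Euclidean distances plus `stats::cmdscale()` on simulated data) only illustrates the bookkeeping, not limma's results.

```r
set.seed(7)
# Simulated log-expression matrix: 100 genes x 4 samples.
logcpm <- matrix(rnorm(400), nrow = 100,
                 dimnames = list(NULL, paste0("S", 1:4)))

# Classical MDS on Euclidean distances between samples.
d   <- dist(t(logcpm))
mds <- cmdscale(d, k = 2, eig = TRUE)

# Share of the positive eigenvalue mass carried by the first two dimensions.
pos_eig       <- mds$eig[mds$eig > 0]
var_explained <- round(100 * pos_eig[1:2] / sum(pos_eig), 1)

# Coordinates and axis labels ready for a scatterplot.
coords <- data.frame(Sample = colnames(logcpm),
                     Dim1 = mds$points[, 1],
                     Dim2 = mds$points[, 2])
axis_labels <- paste0("Dim ", 1:2, " (", var_explained, "%)")
print(axis_labels)
```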
Perform Principal Component Analysis (PCA) on log-expression data.
pcainfo(logcounts, center, scale)

| Argument | Description |
|---|---|
| logcounts | Numeric matrix. Log-CPM values (genes × samples), e.g., from edgeR::cpm(). |
| center | Logical. If TRUE, center variables by subtracting the mean (default: TRUE). |
| scale | Logical. If TRUE, scale variables to unit variance (default: FALSE). |

This function transposes a log-count matrix (samples as columns, genes as rows) and runs PCA using "stats::prcomp()", with options to center and scale variables.
An object of class "prcomp" containing the PCA results, including loadings, scores, and explained variance.
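A minimal sketch of this transpose-then-PCA pattern, using simulated data, is given below. The percent-variance calculation from `sdev` is a common convention and is assumed here rather than taken from the package source.

```r
set.seed(3)
# Simulated log-CPM matrix: 60 genes (rows) x 5 samples (columns).
logcounts <- matrix(rnorm(300), nrow = 60,
                    dimnames = list(paste0("gene", 1:60), paste0("S", 1:5)))

# prcomp() expects observations in rows, so samples must become rows.
# Note that the scaling argument of prcomp() is spelled "scale.".
pca <- prcomp(t(logcounts), center = TRUE, scale. = FALSE)

# Percent of total variance carried by each principal component.
pct <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Sample scores on the first two components.
scores <- data.frame(Sample = rownames(pca$x),
                     PC1 = pca$x[, 1],
                     PC2 = pca$x[, 2])
print(round(pct[1:2], 1))
```

Setting `scale. = TRUE` would give every gene unit variance, which can help when a few highly variable genes would otherwise dominate the leading components.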
Create a PCA scatter plot from log-expression data with sample labels.
pcaPlot(logcounts, Sample, Group, title, palette, maxOverlaps, sizeLabel, center, scale)

| Argument | Description |
|---|---|
| logcounts | Numeric matrix of log-CPM values (genes × samples), e.g., from edgeR::cpm(). |
| Sample | Character vector of sample names corresponding to the columns of "logcounts". |
| Group | Factor or character vector denoting group/condition for each sample. |
| title | Character. Title for the PCA plot. |
| palette | Character. Name of a discrete color palette from the "paletteer" package. |
| maxOverlaps | Integer. Maximum number of overlapping labels allowed by "ggrepel". |
| sizeLabel | Numeric. Font size for sample labels. |
| center | Logical. If TRUE, center variables before PCA. |
| scale | Logical. If TRUE, scale variables to unit variance before PCA. |

This function performs Principal Component Analysis (PCA) on a log-count matrix and visualizes the first two principal components using ggplot2 and ggrepel. Each point represents a sample, colored by group, with a non-overlapping text label.
Transposes the "logcounts" matrix so samples are rows.
Runs PCA via "stats::prcomp()" with centering and scaling options.
Calculates percent variance explained by PC1 and PC2.
Builds a scatter plot with black‐bordered points and non‐overlapping labels.
A "ggplot" object displaying the PCA scatter plot of PC1 vs PC2.
Create an interactive PCA scatter plot using Plotly from log-expression data.
pcaPlottly(logcounts, Sample, Group, title, palette, center, scale)

| Argument | Description |
|---|---|
| logcounts | Numeric matrix of log-CPM values (genes × samples), e.g., from edgeR::cpm(). |
| Sample | Character vector of sample names corresponding to the columns of "logcounts". |
| Group | Factor or character vector denoting group/condition for each sample. |
| title | Character. Title for the PCA plot. |
| palette | Character. Name of a discrete color palette from the "paletteer" package. |
| center | Logical. If TRUE, center variables (genes) before PCA. |
| scale | Logical. If TRUE, scale variables to unit variance before PCA. |

This function performs Principal Component Analysis (PCA) on a log-count matrix and generates an interactive plot of the first two principal components via "plotly::ggplotly()".
Transposes the "logcounts" matrix so samples are rows.
Runs PCA with "stats::prcomp()", using centering and scaling as specified.
Computes percent variance explained by PC1 and PC2.
Builds a ggplot2 scatterplot and converts it to an interactive Plotly graph.
A Plotly object ("plotly::ggplotly") representing the interactive PCA scatterplot.
QUALITY CONTROL ANALYSIS
QualityCheckAnalysis( directoryInput, inputFormat, Nodes, ReadsNumber, directoryOutput, tempFolder )
directoryInput |
sample directory |
inputFormat |
raw read format |
Nodes |
cores |
ReadsNumber |
chunk |
directoryOutput |
output folder |
tempFolder |
temporary folder |
Quality control server logic
qualityControlServerLogic(id)
id |
Shiny module identifier |
Quality control ui
qualityControlUserInterface(id)
id |
Shiny module identifier |
Generate a saturation curve plot showing gene detection versus sequencing depth.
Saturation(matrix, method, max_reads, palette)
matrix |
Numeric matrix or object coercible to matrix (genes × samples), e.g., log-counts or raw counts. Genes are rows; samples are columns. |
method |
Character. Estimation method: "division" or "sampling". |
max_reads |
Numeric. Maximum number of reads to include in the rarefaction (default: Inf). |
palette |
Character. Name of a discrete color palette from the "paletteer" package for curve colors. |
This function estimates how many genes are detected at increasing read depths using a rarefaction-based approach ("estimate_saturation()" from the RNAseQC package, https://github.com/BenaroyaResearch/RNAseQC.git) and plots the saturation curves for each sample. It supports two estimation methods: “division” for a fast analytic approximation and “sampling” for a more realistic approach.
Internally, "extract_counts()" (from countSubsetNorm) extracts a counts matrix from various input classes (matrix, DGEList, EList, ExpressionSet).
"estimate_saturation()" (from the RNAseQC package, https://github.com/BenaroyaResearch/RNAseQC.git) rarefies each library at multiple depths:
“division” divides counts by scale factors;
“sampling” performs repeated random sampling to simulate read downsampling.
The resulting data frame contains one row per sample per depth, with the number of detected genes ("sat") and, for sampling, its variance ("sat.var").
The function then plots gene saturation curves ("sat" vs. "depth") colored by sample.
Extract counts matrix from different types of expression objects
Estimate saturation of genes based on rarefaction of reads
A "ggplot" object showing saturation (genes detected) versus sequencing depth for each sample.
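The idea behind the “sampling” method can be illustrated with a small base R sketch. It is not the RNAseQC implementation: the counts are simulated, the helper "genes_at_depth()" is hypothetical, and each depth is sampled only once rather than repeatedly.

```r
set.seed(42)
# Simulated raw counts: 200 genes x 2 samples (values are made up).
counts <- matrix(rnbinom(400, mu = 20, size = 0.5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), c("A", "B")))

# Hypothetical helper: genes with at least one read after downsampling to `depth` reads.
genes_at_depth <- function(lib, depth) {
  if (depth >= sum(lib)) return(sum(lib > 0))
  # Draw `depth` reads without replacement; each read is tracked by its gene index.
  drawn <- sample(rep(seq_along(lib), times = lib), size = depth)
  length(unique(drawn))
}

# One row per sample per depth, mirroring the "sat" vs. "depth" layout described above.
depths <- c(100, 500, 1000, 2000)
sat <- do.call(rbind, lapply(colnames(counts), function(s) {
  data.frame(sample = s, depth = depths,
             sat = vapply(depths, function(d) genes_at_depth(counts[, s], d),
                          integer(1)))
}))
```

Plotting "sat" against "depth" with one line per sample gives the saturation curves; a curve that flattens suggests that additional reads detect few new genes.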
SequenceLengthDistributionPlot
SequenceLengthDistributionPlot(input_data)
input_data |
result tables folder |
Interactive SequenceLengthDistributionPlot
SequenceLengthDistributionPlotly(input_data)
input_data |
result tables folder |
Sequential alignment function
SequentialAlignment( lalista, nodes, readsPath, GenomeFirstIndex, GenomeSecondIndex, outBam1, outBam2, threads, phredScore, maxExtractedSubreads, consensusVote, mismatchMax, maxMultiMapped, indelLength, fragmentMinLength, fragmentMaxLength, matesOrientation, readOrderConserved, coordinatesSorting, allJunctions, tempfolder, readsAlignedBlock )
lalista |
list of samples |
nodes |
logic cores |
readsPath |
sample folders |
GenomeFirstIndex |
first genome index |
GenomeSecondIndex |
second genome index |
outBam1 |
first output folder |
outBam2 |
second output folder |
threads |
processes |
phredScore |
quality score |
maxExtractedSubreads |
number of subreads |
consensusVote |
consensus |
mismatchMax |
mismatch |
maxMultiMapped |
multimapping |
indelLength |
indel |
fragmentMinLength |
fragment minimum length |
fragmentMaxLength |
fragment maximum length |
matesOrientation |
mate orientation |
readOrderConserved |
read order |
coordinatesSorting |
sorting |
allJunctions |
junctions |
tempfolder |
temporary folder |
readsAlignedBlock |
chunks |
Summarizes read counts from multiple BAM/SAM files in parallel using feature annotations.
Summarization( NodesSum, Xsum, UploadPathSum, DownloadPathSum, annot.ext, isGTFAnnotationFile, GTF.featureType, GTF.attrType, useMetaFeatures, allowMultiOverlap, minOverlap, fracOverlap, fracOverlapFeature, largestOverlap, countMultiMappingReads, fraction, minMQS, primaryOnly, ignoreDup, strandSpecific, requireBothEndsMapped, checkFragLength, minFragLength, maxFragLength, countChimericFragments, autosort, nthreads, tmpDir, verbose )
NodesSum |
Integer. Number of parallel R nodes (e.g., CPU cores) to spawn. |
Xsum |
Character vector. Filenames of BAM or SAM files to process. |
UploadPathSum |
Character. Directory containing the raw input files. |
DownloadPathSum |
Character. Directory into which all output files will be written. |
annot.ext |
Character. Path to an external annotation file (e.g., GTF/GFF). |
isGTFAnnotationFile |
Logical. Whether the annotation file supplied via annot.ext is in GTF format. |
GTF.featureType |
Character. Feature type (e.g., "exon"). |
GTF.attrType |
Character. GTF attribute (e.g., "gene_id"). |
useMetaFeatures |
Logical. Collapse sub-features into meta-features before counting. |
allowMultiOverlap |
Logical. Allow reads overlapping multiple features to be counted. |
minOverlap |
Integer. Minimum number of overlapping bases to assign a read. |
fracOverlap |
Numeric. Minimum fraction of read that must overlap a feature. |
fracOverlapFeature |
Numeric. Minimum fraction of feature that must be covered by a read. |
largestOverlap |
Logical. When overlapping multiple features, assign based on largest overlap. |
countMultiMappingReads |
Logical. Count reads that map to multiple locations. |
fraction |
Logical. Distribute counts fractionally for multi-mapping reads. |
minMQS |
Integer. Minimum mapping quality score for reads to be counted. |
primaryOnly |
Logical. Count only the primary alignments of multi-mapping reads. |
ignoreDup |
Logical. Exclude PCR duplicates from counting. |
strandSpecific |
Integer. Strand-specific counting mode (0 = unstranded, 1 = stranded, 2 = reversely stranded). |
requireBothEndsMapped |
Logical. In paired-end mode, require both mates to map. |
checkFragLength |
Logical. Enforce fragment length checks on paired-end reads. |
minFragLength |
Numeric. Minimum fragment length to keep. |
maxFragLength |
Numeric. Maximum fragment length to keep. |
countChimericFragments |
Logical. Count discordant or chimeric read pairs. |
autosort |
Logical. Automatically sort input files if not already sorted. |
nthreads |
Integer. Number of threads per featureCounts call. |
tmpDir |
Character. Directory for temporary files (e.g., large intermediate files). |
verbose |
Logical. Print verbose messages during execution. |
This function runs Rsubread::featureCounts() on each input file, capturing count statistics, annotation data, and per-sample summary logs. Results are written to the specified output directory.
A socket cluster of NodesSum workers is created.
Each worker invokes featureCounts() on one sample, using the annotation and counting parameters.
Outputs per sample:
A text summary (*_summary.txt) capturing the console output.
A CSV of count statistics (*_stat.csv).
A CSV of feature annotations (*_annotation.csv).
A tab-delimited count matrix saved under Counts/<sample>.tab.
The cluster is terminated once all samples complete.
Writes files to DownloadPathSum.
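The cluster lifecycle described above (create workers, process one sample per task, shut down) can be sketched with the "parallel" package. This is an illustration under stated assumptions, not the package's code: the file names are placeholders, and the hypothetical worker "process_one()" writes a dummy file where a real worker would call Rsubread::featureCounts().

```r
library(parallel)

# Placeholder inputs: file names and output directory are illustrative only.
Xsum <- c("sampleA.bam", "sampleB.bam", "sampleC.bam")
DownloadPathSum <- file.path(tempdir(), "summarization_demo")
dir.create(file.path(DownloadPathSum, "Counts"),
           recursive = TRUE, showWarnings = FALSE)

# Hypothetical stand-in worker: writes a dummy Counts/<sample>.tab file
# instead of running featureCounts on the input.
process_one <- function(bam, out_dir) {
  sample_id <- tools::file_path_sans_ext(basename(bam))
  out_file <- file.path(out_dir, "Counts", paste0(sample_id, ".tab"))
  writeLines(paste("placeholder counts for", sample_id), out_file)
  out_file
}

# Socket cluster with two workers, playing the role of NodesSum.
cl <- makeCluster(2)
written <- tryCatch(
  parLapply(cl, Xsum, process_one, out_dir = DownloadPathSum),
  finally = stopCluster(cl)  # release the workers even if a task fails
)
```

Wrapping the parallel call in "tryCatch(..., finally = ...)" is one way to make sure the cluster is terminated once all samples complete or an error occurs.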
Server function for Summarization module in Shiny application
SummarizationServerLogic(id)
id |
Shiny module identifier |
UI function for Summarization module in Shiny application
SummarizationUserInterface(id)
id |
Shiny module identifier |
Create an interactive UpSet plot of overlapping DEGs using "UpsetJS".
UpsetjsPlot( WD_samples, Th_logFC, Th_Pvalue, collapseName, nintersects, st_significance )
WD_samples |
Character. Directory containing DEG result CSV files. |
Th_logFC |
Numeric. Absolute log2 fold-change threshold to include a gene. |
Th_Pvalue |
Numeric. P-value threshold for significance (0 < Th_Pvalue <= 1). |
collapseName |
Logical. If TRUE, strip method/model prefixes from file names when labeling sets. |
nintersects |
Integer. Maximum number of intersections to display. |
st_significance |
Character. Which p-value to use: "adjustPvalue" (FDR or FWER) or "PValue". |
This function reads DEG CSV files from a directory, filters genes by log-FC and p-value thresholds (adjusted or raw), optionally simplifies file names, and visualizes the intersections of gene sets using the "UpsetJS" package.
Lists all CSV files in "WD_samples" and reads each into a data frame.
Checks for duplicate IDs and selects "ID", "logFC", and either "adjustPvalue" or "PValue".
Filters each set by "|logFC| >= Th_logFC" and p-value < "Th_Pvalue".
Renames each gene-ID list to the (optionally collapsed) file name.
Feeds the list of gene sets into "upsetjs::upsetjs()".
An interactive "UpsetJS" object.
Generate an UpSet plot of overlapping DEGs across multiple contrasts.
UpSetPlot( WD_samples, Th_logFC, Th_Pvalue, collapseName, nintersects, st_significance, scale )
WD_samples |
Character. Directory containing DEG result CSV files. |
Th_logFC |
Numeric. Absolute log2 fold-change threshold to include a gene. |
Th_Pvalue |
Numeric. P-value threshold for significance (0 < Th_Pvalue <= 1). |
collapseName |
Logical. If TRUE, strip method/model prefixes from file names when labeling sets. |
nintersects |
Integer. Maximum number of intersections to display. |
st_significance |
Character. Which p-value to use: "adjustPvalue" (FDR or FWER) or "PValue". |
scale |
Numeric. Text scaling factor for plot labels and annotations. |
This function reads DEG CSV files from a directory, filters genes by log-FC and p-value thresholds (adjusted or raw), optionally simplifies file names, and visualizes the intersections of gene sets using an UpSet plot.
Validates thresholds (Th_logFC >= 0, 0 < Th_Pvalue <= 1).
Lists all CSV files in WD_samples and reads each into a data frame.
Checks for duplicate IDs and standardizes to columns ID, logFC, and adjustPvalue or PValue.
Filters each set of results by |logFC| >= Th_logFC and p-value < Th_Pvalue.
Renames each gene-ID column to the (optionally collapsed) file name.
Converts the list of filtered ID sets to an UpSetR input and calls UpSetR::upset().
An UpSet plot.
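The validation and filtering steps can be sketched in base R. The example below is only an approximation of what the function does: the two DEG tables are invented and held in memory instead of being read from CSV files in WD_samples, and "deg_tables", "gene_sets", and "shared" are names assumed for illustration.

```r
# Two made-up DEG tables standing in for CSV files read from WD_samples.
deg_tables <- list(
  contrastA = data.frame(ID = c("g1", "g2", "g3", "g4"),
                         logFC = c(2.1, -0.3, -1.8, 1.2),
                         adjustPvalue = c(0.001, 0.20, 0.03, 0.04),
                         stringsAsFactors = FALSE),
  contrastB = data.frame(ID = c("g1", "g3", "g5"),
                         logFC = c(1.5, -2.2, 0.9),
                         adjustPvalue = c(0.01, 0.002, 0.001),
                         stringsAsFactors = FALSE)
)

# Validate thresholds as described: Th_logFC >= 0 and 0 < Th_Pvalue <= 1.
Th_logFC <- 1
Th_Pvalue <- 0.05
stopifnot(Th_logFC >= 0, Th_Pvalue > 0, Th_Pvalue <= 1)

# Keep IDs passing both thresholds; the list names become the set labels.
gene_sets <- lapply(deg_tables, function(df) {
  stopifnot(!anyDuplicated(df$ID))  # duplicate IDs would distort set sizes
  df$ID[abs(df$logFC) >= Th_logFC & df$adjustPvalue < Th_Pvalue]
})

# Genes shared by every contrast (one of the intersections an UpSet plot shows).
shared <- Reduce(intersect, gene_sets)
```

A named list of ID vectors like "gene_sets" is the form that can be converted with UpSetR::fromList() before plotting.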
Create a volcano plot of differential expression results.
volcanoPlot( x, palettePoint, maxOverlaps, sizeLabel, Th_logFC, Th_Pvalue, subsetGenes, st_significance )
x |
Character. File path to a CSV containing DEG results, with at least columns "ID", "logFC", and one of "PValue", "FDR", or "FWER". |
palettePoint |
Character. Name of a discrete palette from the "paletteer" package, supplying colors for "UP", "DOWN", and "NO". |
maxOverlaps |
Integer. Maximum allowed label overlaps passed to "ggrepel::geom_text_repel()". |
sizeLabel |
Numeric. Font size for gene labels in the plot. |
Th_logFC |
Numeric. Absolute log2 fold-change threshold to call a gene "UP" or "DOWN". |
Th_Pvalue |
Numeric. P-value threshold to call significance (uses "FDR"/"FWER" if st_significance = "adjustPvalue", otherwise raw "PValue"). |
subsetGenes |
Integer or "Inf". If numeric, only the top "subsetGenes" genes by p-value are shown and labeled. |
st_significance |
Character. Which p-value column to use: "adjustPvalue" (FDR or FWER) or "PValue". |
This function reads a CSV of DEGs, classifies genes as up/down/no change based on log-fold change and p-value thresholds, and plots –log10(p-value) versus log-FC using ggplot2.
Reads the input CSV and checks for duplicate IDs.
Standardizes columns to "ID", "logFC", and "adjustPvalue" or "PValue".
Optionally subsets to the top N genes by p-value.
Classifies each gene as "UP", "DOWN", or "NO" based on thresholds.
Plots points with manual fill, size, and alpha scales, adds threshold lines, and repels labels using "ggrepel".
A "ggplot" object displaying the volcano plot.
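The classification step can be illustrated with a short base R sketch. It assumes raw p-values are used (st_significance = "PValue") and works on an invented table; the column names "status" and "neg_log10_p" are assumptions made for this example and may differ from the function's internals.

```r
# Made-up DEG results with the columns the function expects.
deg <- data.frame(ID = paste0("g", 1:5),
                  logFC = c(2.4, -1.7, 0.2, 1.3, -3.0),
                  PValue = c(1e-6, 0.004, 0.5, 0.2, 0.01),
                  stringsAsFactors = FALSE)
Th_logFC <- 1
Th_Pvalue <- 0.05

# Default every gene to "NO", then mark significant genes by direction of change.
deg$status <- "NO"
sig <- deg$PValue < Th_Pvalue
deg$status[sig & deg$logFC >= Th_logFC] <- "UP"
deg$status[sig & deg$logFC <= -Th_logFC] <- "DOWN"

# y-axis of the volcano plot: -log10 of the p-value.
deg$neg_log10_p <- -log10(deg$PValue)
```

In a volcano plot, "logFC" goes on the x-axis and "neg_log10_p" on the y-axis, with threshold lines at ±Th_logFC and -log10(Th_Pvalue).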
Create an interactive volcano plot of differential expression results using "Plotly".
volcanoPlottly( x, palettePoint, Th_logFC, Th_Pvalue, subsetGenes, st_significance )
x |
Character. File path to a CSV containing DEG results, with at least columns "ID", "logFC", and one of "PValue", "FDR", or "FWER". |
palettePoint |
Character. Name of a discrete palette from the "paletteer" package, supplying colors for "UP", "DOWN", and "NO". |
Th_logFC |
Numeric. Absolute log2 fold-change threshold to call a gene "UP" or "DOWN". |
Th_Pvalue |
Numeric. P-value threshold to call significance (uses "FDR"/"FWER" if st_significance = "adjustPvalue", otherwise raw "PValue"). |
subsetGenes |
Integer or "Inf". If numeric, only the top "subsetGenes" genes by p-value are included in the plot. |
st_significance |
Character. Which p-value column to use: "adjustPvalue" (FDR or FWER) or "PValue". |
This function reads a CSV of DEGs, classifies genes as up/down/no change based on log-fold change and p-value thresholds, and renders an interactive volcano plot via "plotly::ggplotly()".
Reads the input CSV and checks for duplicate IDs.
Standardizes columns to "ID", "logFC", and "adjustPvalue" or "PValue".
Optionally subsets to the top N genes by p-value.
Classifies each gene as "UP", "DOWN", or "NO" based on thresholds.
Plots points with manual fill, size, and alpha scales, adds threshold lines, and converts to an interactive Plotly graph.
A Plotly object ("plotly::ggplotly") representing the interactive volcano plot.
Server function for workflow module in Shiny application
WorkflowServerLogic(id)
id |
Shiny module identifier |
UI function for workflow module in Shiny application
WorkflowUserInterface(id)
id |
Shiny module identifier |