A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.
Report generated on 2019-03-27, 17:38 based on data in:
/process/Processed_5bd9af0d6f44fa415a44efea.PE.R1
General Statistics
Showing 5/5 rows and 10/12 columns.

| Sample Name | % Alignable | % Assigned | M Assigned | % Dups | % Aligned | M Aligned | % Trimmed | % Dups | % GC | M Seqs |
|---|---|---|---|---|---|---|---|---|---|---|
| 5bd9af0d6f44fa415a44efea.PE.R2 | | | | | | | 0.0% | | | |
| step1.1_trimed_5bd9af0d6f44fa415a44efea.PE.R1_1 | | | | | | | | 72.2% | 49% | 27.0 |
| step1.1_trimed_5bd9af0d6f44fa415a44efea.PE.R1_2 | | | | | | | | 70.7% | 50% | 27.0 |
| step2.1_Star_5bd9af0d6f44fa415a44efea.PE.R1 | | 78.6% | 21.7 | 41.5% | 92.2% | 24.9 | | | | |
| step4.4_Rsem_out_5bd9af0d6f44fa415a44efea.PE.R1 | 100.0% | | | | | | | | | |
Preseq
Preseq estimates the complexity of a library, showing how many additional unique reads are sequenced for increasing total read count. A shallow curve indicates complexity saturation. The dashed line shows a perfectly complex library where total reads = unique reads.
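As a rough illustration of what a complexity curve measures, the sketch below counts distinct reads as a function of reads processed. It is not Preseq's own estimator, which is more sophisticated. The function name `observed_complexity_curve`, the `step` parameter, and the use of string keys to stand in for reads are assumptions made for this example, and the input is assumed to already be in random order.

```python
def observed_complexity_curve(reads, step):
    """Return (total_reads, unique_reads) points, sampled every `step` reads.

    `reads` is a list of hashable keys; duplicate reads share a key.
    """
    seen = set()
    points = []
    for i, read in enumerate(reads, start=1):
        seen.add(read)
        # Record a point every `step` reads, and always at the final read.
        if i % step == 0 or i == len(reads):
            points.append((i, len(seen)))
    return points


# Hypothetical toy library: six reads, four distinct fragments.
reads = ["a", "b", "a", "c", "b", "d"]
curve = observed_complexity_curve(reads, step=2)
# curve == [(2, 2), (4, 3), (6, 4)]
```

A perfectly complex library would give points on the diagonal, for example (2, 2), (4, 4), (6, 6); the toy curve above falls below it as duplicates accumulate.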
Complexity curve
Note that the x axis is trimmed at the point where all the datasets show 80% of their maximum y-value, to avoid ridiculous scales.
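One possible reading of this trimming rule, sketched under assumptions: for each dataset, find the first x value at which the curve reaches 80% of its own maximum y value, then cut the axis at the largest of those x values. The function name `x_axis_cutoff` and the data layout are hypothetical, and the report's actual implementation may differ.

```python
def x_axis_cutoff(curves, fraction=0.8):
    """Return the x value at which every curve has reached `fraction` of its max y.

    `curves` maps a dataset name to a list of (x, y) points sorted by x.
    """
    cutoff = 0
    for points in curves.values():
        threshold = fraction * max(y for _, y in points)
        # First x at which this curve reaches its own threshold.
        first_x = next(x for x, y in points if y >= threshold)
        cutoff = max(cutoff, first_x)
    return cutoff


# Hypothetical curves for two datasets.
curves = {
    "sample_a": [(0, 0), (10, 8), (20, 10)],
    "sample_b": [(0, 0), (10, 5), (20, 9), (30, 10)],
}
cutoff = x_axis_cutoff(curves)
# cutoff == 20
```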
Rsem
RSEM (RNA-Seq by Expectation-Maximization) is a software package for estimating gene and isoform expression levels from RNA-Seq data.
Mapped Reads
A breakdown of how all reads were aligned for each sample.
Multimapping rates
A frequency histogram showing how many reads were aligned to n reference regions.
In an ideal world, every sequence read would align uniquely to a single location in the reference. However, due to factors such as repetitive sequences, short reads and sequencing errors, reads can align to the reference 0, 1 or more times. This plot shows the frequency of each multimapping count. Good samples should have the majority of reads aligning once.
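The histogram described above can be sketched as a simple tally, assuming the number of alignments per read is already known. The function names and the input list are hypothetical; this is not how RSEM itself produces its counts.

```python
from collections import Counter


def multimapping_histogram(alignments_per_read):
    """Map n (number of reference locations) to the number of reads aligned n times."""
    counts = Counter(alignments_per_read)
    return dict(sorted(counts.items()))


def fraction_uniquely_aligned(alignments_per_read):
    """Fraction of reads that align exactly once (0.0 for empty input)."""
    total = len(alignments_per_read)
    if total == 0:
        return 0.0
    return sum(1 for n in alignments_per_read if n == 1) / total


# Hypothetical per-read alignment counts.
per_read = [1, 1, 0, 2, 1, 3]
hist = multimapping_histogram(per_read)      # {0: 1, 1: 3, 2: 1, 3: 1}
unique = fraction_uniquely_aligned(per_read)  # 0.5
```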
RSeQC
The RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput RNA-seq data.
Read Distribution
Read Distribution calculates how mapped reads are distributed over genome features.
Gene Body Coverage
Gene Body Coverage calculates read coverage over gene bodies. This is used to check whether read coverage is uniform and whether there is any 5' or 3' bias.
Inner Distance
Inner Distance calculates the inner distance (or insert size) between two paired RNA reads. Note that this can be negative if fragments overlap.
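A minimal sketch of why the value can be negative, assuming each mate is given as a 0-based, half-open `(start, end)` interval on the same reference. The function name `inner_distance` is hypothetical, and the sketch ignores spliced alignments, which RSeQC's own calculation takes into account.

```python
def inner_distance(mate1, mate2):
    """Gap between the end of the upstream mate and the start of the downstream mate.

    Each mate is a (start, end) tuple, 0-based and half-open.
    A negative result means the two mates overlap.
    """
    first, second = sorted([mate1, mate2])
    return second[0] - first[1]


gap = inner_distance((100, 150), (200, 250))      # 50: mates are 50 bases apart
overlap = inner_distance((100, 150), (140, 190))  # -10: mates overlap by 10 bases
```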
featureCounts
Subread featureCounts is a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoters, gene bodies, genomic bins and chromosomal locations.
Picard
Picard is a set of Java command line tools for manipulating high-throughput sequencing data.
Mark Duplicates
Cutadapt
Cutadapt is a tool to find and remove adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
This plot shows the number of reads with certain lengths of adapter trimmed. Obs/Exp shows the raw counts divided by the number expected due to sequencing errors. A defined peak may be related to adapter length. See the cutadapt documentation for more information on how these numbers are generated.
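The Obs/Exp ratio can be sketched as a per-length division, assuming rows of trimmed length, observed count, and expected count have already been parsed from a Cutadapt report. The function name, the row layout, the numbers, and the choice to return `None` when the expected count is zero are all assumptions for this example.

```python
def obs_exp_ratios(rows):
    """Map trimmed length to observed/expected, or None if expected is zero.

    `rows` is an iterable of (length, observed_count, expected_count).
    """
    ratios = {}
    for length, observed, expected in rows:
        ratios[length] = observed / expected if expected > 0 else None
    return ratios


# Hypothetical values, not taken from this report.
rows = [(3, 120, 60.0), (4, 30, 15.0), (33, 500, 0.0)]
ratios = obs_exp_ratios(rows)
# ratios == {3: 2.0, 4: 2.0, 33: None}
```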
FastQC
FastQC is a quality control tool for high throughput sequence data, written by Simon Andrews at the Babraham Institute in Cambridge.
Sequence Quality Histograms
The mean quality value across each base position in the read. See the FastQC help.
Per Sequence Quality Scores
The number of reads at each average quality score. Shows if a subset of reads has poor quality. See the FastQC help.
Per Base Sequence Content
The proportion of each base position for which each of the four normal DNA bases has been called. See the FastQC help.
Per Sequence GC Content
The average GC content of reads. A normal random library typically has a roughly normal distribution of GC content. See the FastQC help.
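To make the quantity concrete, the sketch below computes a GC percentage per read and tallies reads by rounded percentage. The function names are hypothetical, and excluding non-ACGT characters (such as N) from the denominator is an assumption of this example rather than a statement about FastQC's behavior.

```python
from collections import Counter


def gc_percent(seq):
    """Percentage of G and C among the A/C/G/T calls in `seq` (0.0 if none)."""
    called = [base for base in seq.upper() if base in "ACGT"]
    if not called:
        return 0.0
    return 100.0 * sum(base in "GC" for base in called) / len(called)


def gc_histogram(reads):
    """Map rounded GC percentage to the number of reads with that value."""
    counts = Counter(round(gc_percent(read)) for read in reads)
    return dict(sorted(counts.items()))


# Hypothetical reads.
hist_gc = gc_histogram(["ATGC", "GGCC", "AATT", "ACGT"])
# hist_gc == {0: 1, 50: 2, 100: 1}
```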
Per Base N Content
The percentage of base calls at each position for which an N was called. See the FastQC help.
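A minimal sketch of this per-position percentage, assuming reads are plain strings that may differ in length. The function name `per_base_n_percent` is hypothetical and this is not FastQC's implementation.

```python
def per_base_n_percent(reads):
    """Return, for each read position, the percentage of reads with an N call there.

    Positions beyond the end of a shorter read do not count toward that position.
    """
    max_len = max((len(read) for read in reads), default=0)
    n_counts = [0] * max_len
    totals = [0] * max_len
    for read in reads:
        for pos, base in enumerate(read):
            totals[pos] += 1
            if base in "Nn":
                n_counts[pos] += 1
    return [100.0 * n / t for n, t in zip(n_counts, totals)]


# Hypothetical reads.
n_profile = per_base_n_percent(["ACGN", "NCGT"])
# n_profile == [50.0, 0.0, 0.0, 50.0]
```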
Sequence Length Distribution
The distribution of fragment sizes (read lengths) found. See the FastQC help.
Sequence Duplication Levels
The relative level of duplication found for every sequence. See the FastQC help.
Overrepresented sequences
The total amount of overrepresented sequences found in each library. See the FastQC help for further information.
Adapter Content
The cumulative percentage count of the proportion of your library which has seen each of the adapter sequences at each position. See the FastQC help. Only samples with ≥ 0.1% adapter contamination are shown.