FASTQ format basics

Each read occupies exactly four lines:

  1. Identifier line – starts with @ and gives a unique read ID, usually with flowcell/lane/tile info.
  2. Sequence line – the actual base calls (A, T, G, C, sometimes N for unknown).
  3. Separator line – starts with +; it may optionally repeat the read ID, but in this file it doesn’t.
  4. Quality line – ASCII characters encoding Phred quality scores (one character per base).
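The four-line layout above can be read with a small loop. The following is a minimal sketch, not a replacement for an established parser: `FastqRecord`, `read_fastq`, and the two-read input are hypothetical names and data made up for illustration, and the sketch assumes each sequence and quality string sits on a single line (no line wrapping).

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str   # identifier line without the leading "@"
    sequence: str  # base calls
    quality: str   # raw ASCII quality string, one character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record per group of four lines (assumes unwrapped records)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError(f"identifier line must start with '@': {header!r}")
        if not separator.startswith("+"):
            raise ValueError(f"separator line must start with '+': {separator!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Hypothetical two-read input, made up for illustration only.
toy = io.StringIO("@read1\nACGTN\n+\nIIII#\n@read2\nGGCA\n+read2\nFFFF\n")
records = list(read_fastq(toy))
for rec in records:
    print(rec.read_id, rec.sequence, rec.quality)
```

Reading in fixed groups of four, rather than scanning for lines that start with @, matters because @ is itself a valid quality character and can appear at the start of a quality line.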

HBR_1_R1.fq

@HWI-ST718_146963544:7:2201:16660:89809/1
CAAAGAGAGAAAGAAAAGTCAATGATTTTATAGCCAGGCAAAATGACTTTCAAGTAAAAAATATAAAGCACCTTACAAACTAGTATCAAAATGCATTTCT
+
CCCFFFFFHHHHHJJJJJHIHIJJIJJJJJJJJJJJJIJJJJJJJJJJJJJIJJIIJJJJJJJJJJJJIIJFHHHEFFFFFEEEEEEEDDDDCDDEEDEE
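To make the quality line concrete, the sketch below decodes the first ten quality characters of the read above into Phred scores. It assumes the Phred+33 offset (Sanger / Illumina 1.8+ encoding), which these notes do not state explicitly; the function names are hypothetical. A Phred score Q corresponds to an estimated error probability of 10^(-Q/10).

```python
from typing import List

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def decode_quality(quality: str, offset: int = PHRED_OFFSET) -> List[int]:
    """Convert each ASCII quality character to its Phred score."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


first_ten = "CCCFFFFFHH"  # first ten characters of the quality line above
scores = decode_quality(first_ten)
print(scores)  # [34, 34, 34, 37, 37, 37, 37, 37, 39, 39]

# 'J', the most common character in the read above, decodes to Q41
print(decode_quality("J"))  # [41]
```

Under this assumption, a score of 40 means roughly one expected error in 10,000 base calls, so most of this read is high quality, with scores tapering slightly toward the 3' end.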

FastQC Report Analysis

HBR_1_R1_fastqc.html

Basic Statistics

image.png

Per base sequence quality

The Per base sequence quality plot in the FastQC report shows the range of quality scores at each base position across all reads.

image.png
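The idea behind this plot can be sketched in a few lines: collect the scores found at each read position and summarise them. This is only an illustration, not FastQC's own implementation (FastQC draws a box-and-whisker summary per position and may group positions into bins for long reads). The function name and the toy quality strings are hypothetical, and Phred+33 encoding is assumed.

```python
from statistics import mean, median
from typing import Dict, List


def per_position_quality(quality_lines: List[str], offset: int = 33) -> List[Dict[str, float]]:
    """Summarise Phred scores at each read position (1-based) across reads."""
    if not quality_lines:
        return []
    longest = max(len(q) for q in quality_lines)
    summary = []
    for pos in range(longest):
        # Reads shorter than this position simply do not contribute.
        scores = [ord(q[pos]) - offset for q in quality_lines if pos < len(q)]
        summary.append(
            {"position": pos + 1, "mean": mean(scores), "median": median(scores)}
        )
    return summary


# Hypothetical quality strings, made up for illustration (I = Q40, F = Q37, # = Q2).
toy_quality_lines = ["IIIF", "IIFF", "IF#"]
summary = per_position_quality(toy_quality_lines)
for row in summary:
    print(row["position"], round(row["mean"], 2), row["median"])
```

In the toy input, the single low-quality base at position 3 pulls the mean down much further than the median, which is one reason to look at both when reading the plot.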