About this data set

The data were generated in two separate experiments (L327 and L354). Peripheral blood mononuclear cells (PBMCs) from four (L327) or six (L354) healthy blood donors were sorted according to the indicated NK cell subsets, and RNA sequencing was performed using single-cell tagged reverse transcription (STRT).

Read quality filtering and barcode extraction

Every read marked as valid by the Illumina control software was processed as follows:

  1. Any 3' bases with a quality score of 'B' were removed.
  2. The 5' bases should be template-switch-derived Gs. If the first three bases were not all Gs, the read was discarded. Otherwise, these three plus a maximum of six more consecutive Gs were removed.
  3. The remaining sequence should be transcript-derived. If it ended in a poly(A) sequence that left fewer than 25 bases, the read was discarded.
  4. If the transcript-derived sequence consisted of fewer than six non-A bases, or of a dinucleotide repeat with fewer than six other bases at either end, the read was discarded.
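
The four filtering steps can be sketched in Python roughly as follows. This is an illustration only, not the pipeline's actual code. The function names and the constants MIN_POLYA_RUN and MIN_REPEAT_CORE are hypothetical: the description does not say how long a trailing run of As must be to count as poly(A), nor how long a dinucleotide repeat must be, so both values are assumptions. The sketch also leaves the poly(A) tail in place on reads that pass, since the description does not say it was removed.

```python
from typing import Optional

MAX_EXTRA_G = 6            # from the text: up to six Gs beyond the first three
MIN_LEN_BEFORE_POLYA = 25  # from the text: bases required before a poly(A) tail
MIN_INFORMATIVE = 6        # from the text: minimum non-A / non-repeat bases
MIN_POLYA_RUN = 10         # ASSUMPTION: tail length that counts as poly(A)
MIN_REPEAT_CORE = 6        # ASSUMPTION: shortest core treated as a repeat


def is_dinucleotide_repeat(seq: str) -> bool:
    """True if seq is a two-base repeat with fewer than six other bases at each end."""
    for left in range(MIN_INFORMATIVE):
        for right in range(MIN_INFORMATIVE):
            if len(seq) - left - right < MIN_REPEAT_CORE:
                continue
            core = seq[left:len(seq) - right]
            # Compare the core with its first two bases repeated to the same length.
            if core == (core[:2] * len(core))[:len(core)]:
                return True
    return False


def filter_read(seq: str, qual: str) -> Optional[str]:
    """Return the transcript-derived part of a read, or None if it is discarded."""
    # Step 1: drop the run of 3' bases whose quality character is 'B'.
    seq = seq[:len(qual.rstrip("B"))]

    # Step 2: require three leading Gs, then strip them plus up to six more.
    if not seq.startswith("GGG"):
        return None
    leading_g = len(seq) - len(seq.lstrip("G"))
    seq = seq[min(leading_g, 3 + MAX_EXTRA_G):]

    # Step 3: discard if a poly(A) tail leaves fewer than 25 bases before it.
    before_tail = seq.rstrip("A")
    if len(seq) - len(before_tail) >= MIN_POLYA_RUN:
        if len(before_tail) < MIN_LEN_BEFORE_POLYA:
            return None

    # Step 4: discard low-complexity sequences.
    non_a = sum(1 for base in seq if base != "A")
    if non_a < MIN_INFORMATIVE or is_dinucleotide_repeat(seq):
        return None
    return seq
```

A read passes only if it survives all four steps; any early `return None` corresponds to "the read was discarded" in the list above.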

Alignment

The reads were aligned to the genome using the Bowtie1 aligner, allowing for up to three mismatches and up to 24 alternative mappings for each read. The genome included an artificial chromosome containing a concatemer of the ERCC spike-in control sequences. Any reads with no alignments were re-aligned against another artificial chromosome containing all possible splice junctions arising from the exons defined by the known transcript variants. Reads mapping within these splice junctions were translated back to the corresponding actual genomic positions. The UCSC transcript models were used for the expression level calculation. If a locus had several transcript variants, their exons were merged into a combined model that represented all expression from the locus. To account for incomplete knowledge of cap sites, the 5' ends of all models were extended by 100 bases, but not beyond the 3' end of any nearby upstream exon of another gene in the same orientation.
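
The exon merging and 5' extension described above can be illustrated with a short Python sketch. It is not the original implementation. The function names, the half-open (start, end) coordinate convention, and the way "nearby upstream exon" is looked up (any exon of another gene on the same strand lying entirely upstream) are assumptions made for the example.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # ASSUMPTION: half-open [start, end) genomic coordinates


def merge_exons(exons: List[Interval]) -> List[Interval]:
    """Merge the exons of all transcript variants of a locus into one model."""
    merged: List[Interval] = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            # Overlapping or touching exon: grow the previous interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def extend_five_prime(
    model: List[Interval],
    strand: str,
    other_exons: List[Interval],
    extension: int = 100,  # from the text: 100 bases
) -> List[Interval]:
    """Extend the model's 5' end, stopping at upstream exons of other genes.

    other_exons holds exons of other genes in the same orientation.
    """
    if strand == "+":
        start, end = model[0]
        # On the plus strand the 3' end of an upstream exon is its end coordinate.
        limit = max((e for _, e in other_exons if e <= start), default=0)
        return [(max(start - extension, limit, 0), end)] + model[1:]
    start, end = model[-1]
    # On the minus strand the 5' end is the highest coordinate, and the
    # 3' end of an upstream exon is its start coordinate.
    # The chromosome length is not checked here, for brevity.
    limits = [s for s, _ in other_exons if s >= end]
    new_end = min([end + extension] + limits)
    return model[:-1] + [(start, new_end)]
```

For example, `merge_exons([(100, 200), (150, 300), (400, 500)])` gives `[(100, 300), (400, 500)]`, and a plus-strand model starting at 1000 is extended to 900 unless an exon of another gene on the same strand ends between 900 and 1000.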

Annotation and quantitation

Only reads mapping in the sense orientation of an exon were considered potentially transcript-derived. Any read that had one or more repeat mappings outside exons was assigned at random to one of these repeats and contributed to the summarized read count of that repeat class. Otherwise, if it had one or more sense mappings to exons, it was assigned to the exon where it lay closest to the 5' end of the transcript model, even if the sequence was repeat-like. If it had no exon mapping, it was assigned at random to one of its mappings. The expression level of each transcript model was taken as the total number of reads in sense orientation at all its possible mapping positions.
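
The assignment priority for a multi-mapping read can be sketched as below. This is a minimal Python illustration of the three rules as described, not the original code. The Mapping record, its field names, and assign_read are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Mapping:
    """One alignment of a read (hypothetical record for this sketch)."""
    feature: str              # gene name or repeat class; "" if unannotated
    is_repeat: bool = False   # alignment overlaps an annotated repeat
    in_exon: bool = False     # alignment falls inside an exon
    sense: bool = False       # alignment is in sense orientation of the exon
    dist_to_5p: int = 0       # distance to the 5' end of the transcript model


def assign_read(mappings: List[Mapping], rng: random.Random) -> Mapping:
    """Pick the single mapping a read is counted at."""
    # Rule 1: repeat mappings outside exons take priority; choose one at random.
    repeats_outside = [m for m in mappings if m.is_repeat and not m.in_exon]
    if repeats_outside:
        return rng.choice(repeats_outside)

    # Rule 2: otherwise take the sense exon mapping closest to the model's 5' end.
    sense_exonic = [m for m in mappings if m.in_exon and m.sense]
    if sense_exonic:
        return min(sense_exonic, key=lambda m: m.dist_to_5p)

    # Rule 3: no exon mapping at all; choose any mapping at random.
    return rng.choice(mappings)
```

Passing an explicit `random.Random(seed)` keeps the random choices reproducible between runs, which is a design choice of this sketch rather than something the description specifies.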
