user3683485

Reputation: 155

Use cat to sort multiple fastq files in a directory

I use the following commands to sort my fastq files by their sequence identifiers:

zcat 001_T1_1.fastq.gz | paste - - - - | sort -k1,1 -t " " | tr "\t" "\n" | gzip -c > 001_T1_1_sorted.fastq.gz
zcat 001_T1_2.fastq.gz | paste - - - - | sort -k1,1 -t " " | tr "\t" "\n" | gzip -c > 001_T1_2_sorted.fastq.gz

This is a bit slow even for a single sample. Can it be made faster, and run for every fastq.gz file in a directory? How can I do this in bash?

Upvotes: 0

Views: 261

Answers (1)

kvantour

Reputation: 26471

I think this might be a bit faster, as it removes two executables from the pipeline. It needs GNU awk, because PROCINFO["sorted_in"] is a gawk extension:

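zcat file_in.gz |
awk '# gawk only: make for-in visit elements in ascending string order of their values
     BEGIN { PROCINFO["sorted_in"] = "@val_str_asc" }
     # append each input line to the array element of its 4-line record
     { n = int((NR-1)/4); a[n] = a[n] $0 ORS }
     # print the records in that order; each value starts with its header line
     END { for (i in a) printf "%s", a[i] }' - | gzip -c - > file_out.gz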
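
For the second part of the question (running it over every fastq.gz in a directory), a plain bash loop should be enough. Below is a minimal sketch, not a tuned solution. It assumes the files end in `.fastq.gz` and reuses the `_sorted` output naming from the question. `sort_fastq_dir` is a hypothetical helper name I made up, and `LC_ALL=C` is my own addition: it makes sort compare raw bytes, which is usually faster but can give a different order than your locale would. I kept the question's paste/sort/tr pipeline here, so swap in the awk command if you prefer it.

```shell
#!/usr/bin/env bash
# Sketch: sort every *.fastq.gz in a directory by the first
# space-delimited field of its header line (the sequence identifier).
# sort_fastq_dir is a hypothetical helper name, not from the question.
sort_fastq_dir() {
    local dir="${1:-.}" f
    for f in "$dir"/*.fastq.gz; do
        # Nothing matched the glob: the pattern is left as-is, so skip it.
        [ -e "$f" ] || continue
        # Do not re-sort output written by an earlier run.
        case "$f" in *_sorted.fastq.gz) continue ;; esac
        gzip -dc "$f" \
            | paste - - - - \
            | LC_ALL=C sort -t ' ' -k1,1 \
            | tr '\t' '\n' \
            | gzip -c > "${f%.fastq.gz}_sorted.fastq.gz"
    done
}

# Usage (assumed path): sort_fastq_dir /path/to/fastq_dir
```

Here `gzip -dc` does the same job as `zcat`, and `${f%.fastq.gz}` strips the extension so the output lands next to the input. If your sort is GNU sort, its `--parallel` and `-S` (buffer size) options may help on large files, but I have not measured that here.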

Upvotes: 1
