Reputation: 155
I use the following commands to sort my FASTQ files by their sequence identifiers:
zcat 001_T1_1.fastq.gz | paste - - - - | sort -k1,1 -t " " | tr "\t" "\n" | gzip -c > 001_T1_1_sorted.fastq.gz
zcat 001_T1_2.fastq.gz | paste - - - - | sort -k1,1 -t " " | tr "\t" "\n" | gzip -c > 001_T1_2_sorted.fastq.gz
This is a bit slow even for a single sample. Can it be made faster, and how can I run it on every fastq.gz file in a directory with bash?
Upvotes: 0
Views: 261
Reputation: 26471
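I think this might be a bit faster, as it removes two executables from the pipeline: a single awk process replaces paste, sort and tr. Note that it needs GNU awk, because PROCINFO["sorted_in"] is a gawk extension, and that it holds the whole file in memory: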
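zcat file_in.gz |
awk 'BEGIN { PROCINFO["sorted_in"] = "@val_str_asc" }  # gawk: traverse the array by value, as strings, ascending
# Append every line to the 4-line record it belongs to (record index n).
{ n = int((NR - 1) / 4); a[n] = a[n] $0 ORS }
# Each value starts with its header line, so this prints records in header order.
END { for (i in a) printf "%s", a[i] }' - | gzip -c - > file_out.gz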
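For the second part of the question (running this on every fastq.gz in a directory), here is a minimal bash sketch rather than a tuned solution. It reuses the paste/sort/tr pipeline from the question. The function names sort_fastq_gz and sort_all_fastq are hypothetical, and the _sorted.fastq.gz output suffix is taken from the question. Two assumptions of mine: gzip -dc stands in for zcat for portability, and LC_ALL=C makes sort compare raw bytes, which is usually faster but can order identifiers differently from a locale-aware sort, so sort both mate files the same way.

```shell
#!/usr/bin/env bash
# Sketch: sort every *.fastq.gz in a directory by read identifier.
# sort_fastq_gz and sort_all_fastq are illustrative helper names.

sort_fastq_gz() {
    local src=$1 dst=$2
    # Join each 4-line record into one tab-separated line, sort on the
    # first space-delimited field (the identifier), then split the
    # records back into separate lines.
    gzip -dc "$src" | paste - - - - | LC_ALL=C sort -t ' ' -k1,1 |
        tr '\t' '\n' | gzip -c > "$dst"
}

sort_all_fastq() {
    local dir=${1:-.} f
    for f in "$dir"/*.fastq.gz; do
        [ -e "$f" ] || continue    # glob matched nothing
        # Skip files produced by an earlier run.
        case $f in *_sorted.fastq.gz) continue ;; esac
        sort_fastq_gz "$f" "${f%.fastq.gz}_sorted.fastq.gz"
    done
}
```

Calling sort_all_fastq /path/to/reads (or sort_all_fastq with no argument for the current directory) processes the files one after another. With GNU sort, options such as --parallel=N and -S (buffer size) may also help on large files.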
Upvotes: 1