Unzip .gz file when line order is important

Question

I am trying to unzip fastq.gz files and then analyze the sequencing data within them. However, the later analysis depends on line order being preserved in the unzipped files (line 1 of the zipped file must be line 1 of the unzipped file).

When I inspect the files manually, line order seems to be preserved when I use gunzip to unzip the fastq.gz files (and I wouldn't expect anything else). However, downstream analysis fails because order has not been preserved from the original file. Am I missing something about the unzipping process?

It appears that something like the following is happening.

Sequencer writes data to fastq.txt:

line1
line2
line3
line4

Then zips it into fastq.gz. I then unzip using gunzip and appear to get something like the following, where line order is disrupted:

line2
line1
line4
line3
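
To check whether gunzip itself is changing the order, the compressed stream can be compared line by line against the unzipped file. This is only a minimal sketch: the function name first_line_mismatch and the file names in the usage comment are hypothetical, and it assumes the original fastq.gz is still available next to the gunzip output (for example after running gunzip with the -k/--keep option).

```python
import gzip
from itertools import zip_longest


def first_line_mismatch(gz_path, txt_path):
    """Return the 1-based number of the first line that differs, or None.

    Both files are read in binary mode, so the comparison is byte-exact
    and does not depend on text encoding or newline translation.
    """
    with gzip.open(gz_path, "rb") as packed, open(txt_path, "rb") as unpacked:
        # zip_longest pads the shorter file with None, so a missing or
        # extra trailing line is also reported as a mismatch.
        pairs = zip_longest(packed, unpacked)
        for number, (a, b) in enumerate(pairs, start=1):
            if a != b:
                return number
    return None


# Hypothetical usage (file names are placeholders):
# print(first_line_mismatch("sample.fastq.gz", "sample.fastq"))
```

If this returns None, the unzipped file matches the compressed one line for line, and the reordering would have to come from a different step.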

Answers (1)