RJF

Reputation: 447

bash: zcat vs. cat discrepancy for concatenating .gz files

I have a problem with my shell script. In my data analysis pipeline, I need to concatenate multiple gzipped files prior to downstream analysis. These gzipped files come in pairs, so I need to concatenate all pair1 files together and all pair2 files together. My script for this looks like this:

for f in "${pair1_fqs[@]}"; do
    zcat "${f//\"/}" >> "$sampleID"_cat1.fq
done

for f in "${pair2_fqs[@]}"; do
    zcat "${f//\"/}" >> "$sampleID"_cat2.fq
done

The problem is that zcat and cat return different line counts for the same file:

zcat myfile.gz | wc -l
75896232
cat myfile.gz | wc -l
82322094

Does anyone know what could be the reason for this discrepancy?

Upvotes: 1

Views: 6107

Answers (1)

Farhad Farahi

Reputation: 39507

zcat decompresses the file first and pipes the decompressed text to wc -l, which counts its lines.

cat just passes the raw compressed bytes read from the file to wc -l, which counts the newline bytes that happen to occur in that binary data.

That's why you see different results. Try cat on the compressed file and you will see gibberish.

Now try zcat on the compressed file and you will see your data.
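
To see the effect on a small scale, here is a minimal sketch that builds a throwaway gzipped file and counts "lines" both ways. The file name sample.gz and the seq test data are made up for illustration, and gzip -dc is used in place of zcat because it does the same job for .gz files.

```shell
# Build a small gzipped file in a temporary directory (hypothetical sample data).
tmpdir=$(mktemp -d)
seq 1 1000 | gzip -c > "$tmpdir/sample.gz"

# Lines in the decompressed text: this is the real record count.
text_lines=$(gzip -dc "$tmpdir/sample.gz" | wc -l)

# Newline bytes (0x0A) that happen to occur in the compressed binary stream.
raw_newlines=$(cat "$tmpdir/sample.gz" | wc -l)

echo "decompressed lines: $text_lines"
echo "newline bytes in compressed data: $raw_newlines"

# Clean up the temporary directory.
rm -rf "$tmpdir"
```

Only the first number reflects the content of the file. The second depends on how the data happened to compress, so it can come out smaller or larger than the real line count.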

Upvotes: 1
