David Z

Reputation: 7051

Convert multiple files from bz2 to gz format

I have 575 bz2 files with average size 3G and need to convert them to .gz format to make them compatible with a downstream pipeline.

$ ll -h | head
total 1.4T
drwxrws---+ 1 dz33 dcistat  24K Aug 23 09:21 ./
drwxrws---+ 1 dz33 dcistat  446 Aug 22 11:57 ../
-rw-rw----  1 dz33 dcistat 2.0G Aug 22 11:38 DRR091550_1.fastq.bz2
-rw-rw----  1 dz33 dcistat 2.0G Aug 22 11:38 DRR091550_2.fastq.bz2
-rw-rw----  1 dz33 dcistat 2.0G Aug 22 11:38 DRR091551_1.fastq.bz2
-rw-rw----  1 dz33 dcistat 2.0G Aug 22 11:38 DRR091551_2.fastq.bz2
-rw-rw----  1 dz33 dcistat 1.9G Aug 22 11:38 DRR091552_1.fastq.bz2
-rw-rw----  1 dz33 dcistat 1.9G Aug 22 11:38 DRR091552_2.fastq.bz2
-rw-rw----  1 dz33 dcistat 1.8G Aug 22 11:38 DRR091553_1.fastq.bz2

$ ll | wc -l
575

For a single file I can probably do bzcat a.bz2 | gzip -c >a.gz, but how can I convert all of them with one command or a loop in bash on Linux?

Upvotes: 5

Views: 6297

Answers (2)

Mark Setchell

Reputation: 207678

Convert them simply and quickly in parallel with GNU Parallel:

parallel --dry-run 'bzcat {} | gzip -c > {.}.gz' ::: *bz2

Sample Output

bzcat a.bz2 | gzip -c > a.gz
bzcat b.bz2 | gzip -c > b.gz
bzcat c.bz2 | gzip -c > c.gz

If you like how it looks, remove the --dry-run. Maybe add a progress meter with --bar or --progress.
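
If GNU Parallel is not installed, roughly the same effect can be sketched with find and xargs -P. This is a swapped-in alternative, not the command above. It assumes an xargs that supports -0, -r and -P (GNU and BSD versions do), and the function name convert_parallel and the default of 4 jobs are illustrative choices only.

```shell
#!/usr/bin/env bash
# Sketch: convert *.bz2 to .gz with several jobs at once, using xargs -P
# in place of GNU Parallel. convert_parallel is a hypothetical helper name.
convert_parallel() {
    local dir=${1:-.}     # directory holding the .bz2 files
    local jobs=${2:-4}    # number of simultaneous conversions (arbitrary default)

    # -print0 / -0 keep file names with spaces intact;
    # -n 1 hands one file to each sh invocation, -r does nothing on empty input.
    find "$dir" -maxdepth 1 -type f -name '*.bz2' -print0 |
        xargs -0 -r -n 1 -P "$jobs" \
            sh -c 'bzcat "$1" | gzip -c > "${1%.bz2}.gz"' _
}

# Example call (commented out): convert the current directory with 8 jobs.
# convert_parallel . 8
```

With GNU Parallel itself, the number of simultaneous jobs can be limited in the same spirit with -j, which may matter when all files sit on one shared disk.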

Upvotes: 8

user3439894

Reputation: 7555

In a terminal, change to the directory containing the .bz2 files, then run:

for f in *.bz2; do bzcat "$f" | gzip -c >"${f%.*}.gz"; done

This processes the files one at a time and gives each .gz file the base name of the corresponding .bz2 file. The original .bz2 files are left in place.

Example: DRR091550_1.fastq.bz2 will become DRR091550_1.fastq.gz.
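
With hundreds of multi-gigabyte files, a run may well get interrupted, so a slightly more defensive loop can help. The following is only a sketch: it writes to a temporary name, checks the result with gzip -t, and skips files that already have a .gz, so it can simply be started again. The function name convert_bz2_dir and the .tmp suffix are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: restartable, one-at-a-time conversion of *.bz2 to .gz.
# convert_bz2_dir is a hypothetical helper name used only for illustration.
set -o pipefail   # make a bzcat failure fail the whole pipeline (bash feature)

convert_bz2_dir() {
    local dir=${1:-.} f out
    for f in "$dir"/*.bz2; do
        [ -e "$f" ] || continue       # glob matched nothing
        out="${f%.bz2}.gz"
        [ -e "$out" ] && continue     # already converted on an earlier run

        # Write to a temporary name so a partial file is never
        # mistaken for a finished one, then verify before renaming.
        if bzcat "$f" | gzip -c > "$out.tmp" && gzip -t < "$out.tmp"; then
            mv "$out.tmp" "$out"
        else
            echo "failed: $f" >&2
            rm -f "$out.tmp"
        fi
    done
}

# Example call (commented out): convert everything in the current directory.
# convert_bz2_dir .
```

The .bz2 files are not removed, so disk space for both formats is needed until the originals are deleted by hand.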

Upvotes: 2
