Reputation: 7051
I have 575 bz2 files with average size 3G and need to convert them to .gz format to make them compatible with a downstream pipeline.
$ ll -h | head
total 1.4T
drwxrws---+ 1 dz33 dcistat 24K Aug 23 09:21 ./
drwxrws---+ 1 dz33 dcistat 446 Aug 22 11:57 ../
-rw-rw---- 1 dz33 dcistat 2.0G Aug 22 11:38 DRR091550_1.fastq.bz2
-rw-rw---- 1 dz33 dcistat 2.0G Aug 22 11:38 DRR091550_2.fastq.bz2
-rw-rw---- 1 dz33 dcistat 2.0G Aug 22 11:38 DRR091551_1.fastq.bz2
-rw-rw---- 1 dz33 dcistat 2.0G Aug 22 11:38 DRR091551_2.fastq.bz2
-rw-rw---- 1 dz33 dcistat 1.9G Aug 22 11:38 DRR091552_1.fastq.bz2
-rw-rw---- 1 dz33 dcistat 1.9G Aug 22 11:38 DRR091552_2.fastq.bz2
-rw-rw---- 1 dz33 dcistat 1.8G Aug 22 11:38 DRR091553_1.fastq.bz2
$ ll | wc -l
575
For a single file I can probably run bzcat a.bz2 | gzip -c >a.gz, but I am wondering how to convert all of them with one command or loop in bash on Linux.
Upvotes: 5
Views: 6297
Reputation: 207678
Convert them simply and quickly in parallel with GNU Parallel:
parallel --dry-run 'bzcat {} | gzip -c > {.}.gz' ::: *bz2
Sample Output
bzcat a.bz2 | gzip -c > a.gz
bzcat b.bz2 | gzip -c > b.gz
bzcat c.bz2 | gzip -c > c.gz
If you like how it looks, remove the --dry-run. Maybe add a progress meter with --bar or --progress.
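If GNU Parallel is not installed, roughly the same effect can be sketched with xargs -P, which is a different tool from the one used above. This is only a sketch: the function name convert_xargs, its two arguments (directory and job count), and the default of 4 jobs are assumptions for illustration, and it relies on the -0 and -P options found in GNU and BSD xargs.

```shell
# Hypothetical helper: convert every .bz2 file in a directory to .gz,
# running several bzcat | gzip pipelines at once via xargs -P.
# The body runs in a subshell, so the cd and variables do not leak out.
convert_xargs() (
    dir=${1:-.}
    jobs=${2:-4}
    cd "$dir" || exit 1
    # Emit NUL-separated file names so unusual characters are handled safely.
    printf '%s\0' *.bz2 |
        # -n 1: one file per invocation; -P: number of simultaneous jobs.
        # "${1%.bz2}.gz" swaps the .bz2 suffix for .gz.
        xargs -0 -n 1 -P "$jobs" sh -c 'bzcat "$1" | gzip -c > "${1%.bz2}.gz"' _
)
```

Calling convert_xargs /path/to/fastq 8 would run up to eight conversions at a time. The original .bz2 files are left in place, so check the free disk space before starting.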
Upvotes: 8
Reputation: 7555
In a terminal, change to the directory containing the .bz2 files, then run the following command:
for f in *.bz2; do bzcat "$f" | gzip -c >"${f%.*}.gz"; done
This processes the files one at a time and gives each .gz file the base name of its .bz2 source.
Example: DRR091550_1.fastq.bz2 will become DRR091550_1.fastq.gz.
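With hundreds of multi-gigabyte files, a run may well be interrupted partway through. The following is a more defensive sketch of the same loop, assuming bash: it skips files that already have a .gz, writes to a temporary name first, and reports failures. The function name bz2_to_gz and the .tmp suffix are assumptions for illustration, not part of the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sequentially convert *.bz2 in a directory to .gz.
# The body runs in a subshell, so cd and pipefail do not affect the caller.
bz2_to_gz() (
    # pipefail (bash) makes the pipeline fail if bzcat fails, not only gzip.
    set -o pipefail
    cd "${1:-.}" || exit 1
    for f in *.bz2; do
        [ -e "$f" ] || continue      # no matches: the glob stayed literal
        out="${f%.bz2}.gz"
        [ -e "$out" ] && continue    # already converted on an earlier run
        # Write to a temporary name so an interrupted run never leaves
        # a truncated file under the final .gz name.
        if bzcat "$f" | gzip -c > "$out.tmp"; then
            mv "$out.tmp" "$out"
        else
            echo "failed: $f" >&2
            rm -f "$out.tmp"
        fi
    done
)
```

Running bz2_to_gz /path/to/fastq a second time after an interruption picks up where the first run stopped, because finished files are skipped.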
Upvotes: 2