Reputation: 3411
I've noticed that many of the files my team generates have .gz.bz2 extensions. These are plain text files, and the goal is to save disk space. As an experiment, I compressed the same set of files with gzip alone and with gzip followed by bzip2:
$ du -h pat0/*
1.6M pat0/p0_c1.diag.csv.gz
1.5M pat0/p0_c2.diag.csv.gz
2.3M pat0/p0_c3.diag.csv.gz
1.8M pat0/p0_c4.diag.csv.gz
3.0M pat0/p0_c5.diag.csv.gz
3.2M pat0/p0_c6.diag.csv.gz
3.0M pat0/p0_c7.diag.csv.gz
3.0M pat0/p0_c8.diag.csv.gz
$ du -h pat0.bak/*
1.6M pat0.bak/p0_c1.diag.csv.gz.bz2
1.5M pat0.bak/p0_c2.diag.csv.gz.bz2
2.3M pat0.bak/p0_c3.diag.csv.gz.bz2
1.8M pat0.bak/p0_c4.diag.csv.gz.bz2
3.0M pat0.bak/p0_c5.diag.csv.gz.bz2
3.2M pat0.bak/p0_c6.diag.csv.gz.bz2
3.0M pat0.bak/p0_c7.diag.csv.gz.bz2
2.9M pat0.bak/p0_c8.diag.csv.gz.bz2
I don't see any significant improvement. If none is expected, what is the advantage of .gz.bz2? Why not use just one or the other?
Upvotes: 0
Views: 109
Reputation: 112219
You already did the experiment, and your results are typical. Compressing an already compressed file gives a non-negligible gain only if the original data was so highly redundant that it saturated the first compressor, that is, the first pass hit its maximum compression ratio and left redundancy behind in its own output.
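If you want to reproduce both cases without touching real data, here is a minimal sketch using Python's standard gzip, bz2 and lzma modules. The CSV columns and row count are made up for illustration and are not meant to resemble your diag files; the second input is a single repeated byte, which is the kind of extreme redundancy that saturates gzip.

```python
import bz2
import gzip
import lzma
import random


def make_csv(rows=50_000, seed=0):
    """Build synthetic CSV-like text (hypothetical columns, not the asker's data)."""
    rng = random.Random(seed)
    codes = ["A10", "B22", "C31", "D47", "E05"]
    lines = ["row,patient,code,value"]
    for i in range(rows):
        lines.append(
            f"{i},{rng.randint(1, 500)},{rng.choice(codes)},{rng.randint(0, 9999)}"
        )
    return "\n".join(lines).encode("ascii")


def sizes(raw):
    """Return the compressed size in bytes for each scheme being compared."""
    gz = gzip.compress(raw)
    return {
        "raw": len(raw),
        "gz": len(gz),
        "gz.bz2": len(bz2.compress(gz)),  # bzip2 applied on top of gzip output
        "bz2": len(bz2.compress(raw)),    # bzip2 applied to the original text
        "xz": len(lzma.compress(raw)),    # xz applied to the original text
    }


# Ordinary text: the second pass has almost nothing left to remove.
text_sizes = sizes(make_csv())

# One repeated byte: gzip hits its ratio ceiling, so its output is itself
# repetitive and a second compressor can still shrink it.
redundant_sizes = sizes(b"a" * 5_000_000)

for label, result in (("csv-like text", text_sizes),
                      ("one repeated byte", redundant_sizes)):
    print(label, result)
```

On the text input, the "gz" and "gz.bz2" numbers should come out nearly the same, which matches your du listing. Only the repeated-byte input shows a real drop from the second pass.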
If you're going to spend the time to bzip2 those files, you would get far better results by decompressing them with gunzip first and then applying bzip2 to the original text. Applying xz instead would be better still.
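As a rough sketch of that conversion, assuming you would rather script it than run the command-line tools by hand: the helper below streams a .gz file through Python's standard library and writes a .bz2 or .xz file next to it. The function name `recompress` and the `OPENERS` table are my own, not part of any tool, and the original .gz file is left in place so you can compare sizes before deleting anything.

```python
import bz2
import gzip
import lzma
import shutil
from pathlib import Path

# Hypothetical lookup table: target suffix -> stdlib function that opens it.
OPENERS = {".bz2": bz2.open, ".xz": lzma.open}


def recompress(gz_path, new_suffix=".xz"):
    """Decompress a .gz file and recompress the original data as .bz2 or .xz."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path.name}")
    if new_suffix not in OPENERS:
        raise ValueError(f"unsupported target suffix: {new_suffix}")

    # Replace only the final suffix: foo.csv.gz -> foo.csv.xz
    out_path = gz_path.with_suffix(new_suffix)

    # Stream in chunks so large files never have to fit in memory.
    with gzip.open(gz_path, "rb") as src, OPENERS[new_suffix](out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

Calling `recompress("pat0/p0_c1.diag.csv.gz")` would write `pat0/p0_c1.diag.csv.xz`, and passing `".bz2"` as the second argument would write `pat0/p0_c1.diag.csv.bz2` instead.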
Upvotes: 1