user4979733

Reputation: 3411

Is there any advantage of doing .gz.bz2?

I notice many of the files generated in my team have .gz.bz2 extensions. These are pure text files. The goal is to save disk space. I ran an experiment on the same set of files, once with gzip alone and once with gzip followed by bzip2:

$ du -h pat0/*
1.6M    pat0/p0_c1.diag.csv.gz
1.5M    pat0/p0_c2.diag.csv.gz
2.3M    pat0/p0_c3.diag.csv.gz
1.8M    pat0/p0_c4.diag.csv.gz
3.0M    pat0/p0_c5.diag.csv.gz
3.2M    pat0/p0_c6.diag.csv.gz
3.0M    pat0/p0_c7.diag.csv.gz
3.0M    pat0/p0_c8.diag.csv.gz

$ du -h pat0.bak/*
1.6M    pat0.bak/p0_c1.diag.csv.gz.bz2
1.5M    pat0.bak/p0_c2.diag.csv.gz.bz2
2.3M    pat0.bak/p0_c3.diag.csv.gz.bz2
1.8M    pat0.bak/p0_c4.diag.csv.gz.bz2
3.0M    pat0.bak/p0_c5.diag.csv.gz.bz2
3.2M    pat0.bak/p0_c6.diag.csv.gz.bz2
3.0M    pat0.bak/p0_c7.diag.csv.gz.bz2
2.9M    pat0.bak/p0_c8.diag.csv.gz.bz2

I don't see any significant improvement. If none is expected, what is the advantage of doing .gz.bz2? Why not just one or the other?

Upvotes: 0

Views: 109

Answers (1)

Mark Adler

Reputation: 112219

You already did the experiment, and your results are typical. Compressing an already compressed file will provide non-negligible gains only if the original data was so highly redundant that the maximum compression ability of the first compressor was saturated.
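
To illustrate the two cases, here is a minimal Python sketch using only the standard library. The helper names `second_pass_ratio` and `make_csv` are hypothetical, and the generated CSV is made-up data standing in for the real .diag.csv files, so the exact numbers will differ from the ones in the question. It reports the size of the gzip + bzip2 result relative to gzip alone, once for ordinary text and once for a long run of zero bytes, which is the kind of extreme redundancy that saturates gzip.

```python
import bz2
import gzip
import random


def second_pass_ratio(data: bytes) -> float:
    """Size of gzip-then-bzip2 output divided by size of gzip-only output."""
    gz = gzip.compress(data)
    return len(bz2.compress(gz)) / len(gz)


def make_csv(rows: int = 20000, seed: int = 0) -> bytes:
    """Build made-up CSV text (an assumed stand-in for the real files)."""
    rng = random.Random(seed)
    lines = [
        f"{i},{rng.random():.6f},{rng.randint(0, 9999)}\n" for i in range(rows)
    ]
    return "".join(lines).encode("ascii")


if __name__ == "__main__":
    # Ordinary text: gzip output is close to random, so bzip2 finds
    # almost nothing left to remove.
    print("csv-like text :", round(second_pass_ratio(make_csv()), 3))
    # Ten million zero bytes: gzip hits its ceiling and leaves a repetitive
    # stream that a second compressor can still shrink.
    print("all zero bytes:", round(second_pass_ratio(bytes(10_000_000)), 3))
```

A ratio near 1.0 means the second pass bought nothing, which is what the du listings in the question show.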

If you're going to spend the time to bzip2 those files, you would get far better results by ungzipping them first, and then applying bzip2. Applying xz would be better still.
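
As a rough sketch of that conversion, the following Python uses the standard gzip, bz2, and lzma modules to decompress a .gz file and recompress the original text in one streaming pass. The function name `recompress`, its `codec` argument, and the choice to leave the source .gz file in place are assumptions made for illustration, not part of any existing tool.

```python
import bz2
import gzip
import lzma
import shutil
from pathlib import Path


def recompress(gz_path: Path, codec: str = "xz") -> Path:
    """Decompress a .gz file and write it back out as .bz2 or .xz.

    Hypothetical helper: the name, the codec argument, and keeping the
    original .gz untouched are all assumptions of this sketch.
    """
    openers = {"bz2": bz2.open, "xz": lzma.open}
    if codec not in openers:
        raise ValueError(f"unsupported codec: {codec}")
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path.name}")

    # p0_c1.diag.csv.gz -> p0_c1.diag.csv.xz (only the last suffix changes)
    out_path = gz_path.with_suffix("." + codec)

    # Stream the decompressed bytes straight into the new compressor so the
    # uncompressed text never has to fit in memory or be written to disk.
    with gzip.open(gz_path, "rb") as src, openers[codec](out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

For example, `recompress(Path("pat0/p0_c1.diag.csv.gz"), "xz")` would write `pat0/p0_c1.diag.csv.xz` next to the original, after which the two sizes can be compared with du as in the question.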

Upvotes: 1
