Reputation: 101
As best I understand, the basic principle behind data compression is finding repeated patterns and eliminating the duplicates, so the end result cannot be compressed any further without data loss; if it is attempted anyway, the data size increases instead of shrinking. But then there's, for example, ssh compression, which (when ssh is used as a proxy) supposedly speeds up even already gzip-compressed and https-encrypted internet traffic. How and why does it work (if it does)? Can a compressed file be compressed again without data loss via some magic? What are the use cases where this can actually happen and where it would be useful?
Upvotes: 0
Views: 153
Reputation: 112394
Generally only when the first compression reaches or at least approaches that compression format's maximum compression ratio. This would require highly redundant data as the uncompressed input. As you approach the maximum compression ratio, some redundancy remains in the compressed data.
A simple example is deflate, whose maximum compression ratio is 1032:1. If I start with a billion (10^9) zero bytes, the first compression with gzip takes that down to 970501 bytes, a ratio of 1030.4:1. That result itself is mostly zeros, so a second compression gets it down to 2476 bytes, a ratio of 394.8:1. (I am subtracting the gzip headers and trailers to compute the ratio.) That is still redundant, though not with very long strings of zeros. It compresses a third time down to 298 bytes for a ratio of 8.78:1.
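If you want to reproduce this, here is a minimal sketch using Python's gzip module (which wraps the same deflate/zlib code). The exact byte counts depend on the zlib version and compression level, so they may differ slightly from the figures above, and the initial buffer needs about 1 GB of RAM.

```python
import gzip

# Repeatedly gzip a run of zero bytes and watch the payload shrink.
# Exact sizes depend on the zlib version and compression level, so the
# numbers may not match the answer's figures byte for byte.
GZIP_OVERHEAD = 18                  # 10-byte gzip header + 8-byte trailer

data = b"\x00" * 10**9              # one billion zero bytes (~1 GB of RAM)
for pass_no in range(1, 4):         # three compression passes
    compressed = gzip.compress(data, compresslevel=9)
    print(f"pass {pass_no}: {len(data)} -> {len(compressed)} bytes "
          f"(~{len(compressed) - GZIP_OVERHEAD} bytes of deflate data)")
    data = compressed
```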
An attempt to compress a fourth time results in a larger output, as you would normally get when attempting to compress already compressed data. That's what happens most of the time, since normal compressed data is indistinguishable from random data to a compressor.
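To see that effect directly, here is a small sketch that feeds pseudo-random bytes (a stand-in for typical compressed data) to gzip; the output comes out slightly larger than the input, from the per-block overhead plus the 18-byte gzip header and trailer.

```python
import gzip
import os

# Pseudo-random bytes stand in for already-compressed data: gzip cannot
# shrink them, so the output is slightly larger than the input.
random_data = os.urandom(1_000_000)
recompressed = gzip.compress(random_data, compresslevel=9)
print(f"{len(random_data)} -> {len(recompressed)} bytes")
```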
A second compression by ssh/sshd on already compressed data would almost never speed things up. It would only slow them down, not just from the small expansion of the data but also from the time spent compressing.
Upvotes: 1