Reputation: 1305
I studied lempel ziv compression algorithm in school.
The basic idea is to maintain a table of commonly occurring bit sequences and assign a unique symbol to each.
I was wondering if a file system design where we maintain a common table across the file system would be feasible?
Then the short codes could be reused across multiple files.
Would this save even more space?(since LZW performs better on longer strings) or would we reach a point that our symbol to sequence table is so big that the symbol keys start taking up more or equal space to the byte string they are replacing?
Upvotes: 2
Views: 106
Reputation: 133995
This wouldn't make much sense because the compression table would have to be static. Otherwise, if it changed then a file that was compressed with the old table couldn't be decompressed with the new table.
And if the compression table is static, then you don't get the advantage of per-file compression. For example, one file might have a lot of common English words. Its compression table would contain "shortcuts" for those common sequences. Another file might be common French words, and its compression table would be filled with those common sequences. But if you had a common compression table across the entire system, then it's possible that neither file would have decent compression.
A very big part of the compression that LZW and similar schemes gives you is locally optimal compression. You would have to forego that if you wanted a system-wide table. The result would be much less impressive compression ratios.
Upvotes: 1