Reputation: 41
I am working on a project I have wanted to do for quite a while: an all-round Huffman compressor that works in practice, not just in theory, on various types of files. I am writing it in Python:
text - which is, for obvious reasons, the easiest one to implement. Already done, works wonderfully.
images - this is where I am struggling. I don't know how to approach images, or how to read them in a simple way that would actually help me compress them. I've tried reading them pixel by pixel, but somehow that enlarges the picture instead of compressing it.
What I've tried: reading the image pixel by pixel using Image (PIL), putting all the pixels in a list, building a frequency table (one entry per distinct pixel) and then encoding it. The problem, imo, is that I am treating each whole pixel as a symbol when building the frequency table. That way I get far too many symbols, which leads to too many lengthy Huffman codes (over 8 bits).
I think I may be able to solve this problem by reading a larger set of pixels or something of that sort, because then I'd have a smaller code table and therefore shorter Huffman codes. If I leave it as it is, I can, in theory, get a code table of size 256^3 (since each pixel is (0-255, 0-255, 0-255)).
Is there any way to read a larger number of pixels at a time (>1 pixel), or is there a better way to approach images when all I need is to compress them?
Thank you all for reading so far, and a special thank you for anyone who tries to lend a hand.
edited: If Huffman is a really bad compression algorithm for images, are there any better ones you can think of? The project I'm working on can use different algorithms for different file types if necessary.
Upvotes: 0
Views: 1148
Reputation: 64933
Encoding whole pixels like this often results in far too many unique symbols, each of which is used very few times, especially if the image is a photograph or contains many coloured gradients. A simple way to fix this is to split the image into its R, G and B colour planes and encode those either separately or concatenated. Either way, the elements actually being encoded are in the range 0..255 and not multi-dimensional.
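As a rough illustration of the plane-splitting idea, here is a minimal sketch using only the standard library. It assumes the image is available as an interleaved RGB byte buffer (with Pillow, `Image.open(path).convert("RGB").tobytes()` produces one). The helper names `split_planes`, `huffman_code_lengths` and `encoded_bits` are made up for this example, and the size estimate ignores the cost of storing the code table itself.

```python
import heapq
from collections import Counter


def split_planes(rgb):
    """Split interleaved RGBRGB... bytes into separate R, G and B planes."""
    return rgb[0::3], rgb[1::3], rgb[2::3]


def whole_pixels(rgb):
    """Group interleaved bytes into (r, g, b) tuples, one symbol per pixel."""
    return list(zip(rgb[0::3], rgb[1::3], rgb[2::3]))


def huffman_code_lengths(data):
    """Return {symbol: code length in bits} for a Huffman code built from data."""
    freq = Counter(data)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries are (weight, tiebreak, {symbol: depth so far}).
    # The unique tiebreak keeps the dicts from ever being compared.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def encoded_bits(data):
    """Payload size in bits if data were Huffman coded (code table excluded)."""
    freq = Counter(data)
    lengths = huffman_code_lengths(data)
    return sum(freq[s] * lengths[s] for s in freq)
```

Calling `encoded_bits(plane)` on each plane returned by `split_planes(rgb)` and summing the results gives the per-plane payload, which can be compared with `encoded_bits(whole_pixels(rgb))`. The per-plane tables never exceed 256 entries each, whereas `len(set(whole_pixels(rgb)))` can approach the number of pixels in a photograph.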
But as you suspect, exploiting just 0th order entropy is not so great for many images, especially photographs. As examples of what some existing formats do: PNG uses filters to take some advantage of spatial correlation (great for smooth gradients). JPEG uses quantized discrete cosine transforms, (usually) a colour space transformation to YCbCr (to decorrelate the channels, and to crush chroma more mercilessly than luma) and (usually) chroma subsampling. JPEG2000 uses wavelets and a colour space transformation in both its lossy and lossless forms (though with different wavelets and a different colour space transformation in each), and also supports subsampling, though dropping a wavelet scale achieves a similar effect.
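To make the spatial-correlation point concrete, here is a small sketch in the spirit of PNG's "Sub" filter, simplified to a single 8-bit plane: each sample is replaced by its difference from the left neighbour, modulo 256. The function names and the `width` parameter are assumptions for this example, not PNG's actual API, and real PNG encoders choose among several filter types per row.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(data):
    """Zeroth-order (per-symbol) entropy of a byte sequence, in bits."""
    freq = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in freq.values())


def sub_filter(plane, width):
    """Replace each sample with (sample - left neighbour) mod 256.

    The first sample of every row has no left neighbour, so 0 is used.
    """
    out = bytearray(len(plane))
    for i, value in enumerate(plane):
        left = plane[i - 1] if i % width else 0
        out[i] = (value - left) % 256
    return bytes(out)


def undo_sub_filter(filtered, width):
    """Invert sub_filter, so the transform is lossless."""
    out = bytearray(len(filtered))
    for i, delta in enumerate(filtered):
        left = out[i - 1] if i % width else 0
        out[i] = (delta + left) % 256
    return bytes(out)
```

On a smooth horizontal gradient the filtered plane consists almost entirely of one small value, so its zeroth-order entropy, and therefore the Huffman-coded size, drops sharply compared with the raw plane. The decoder runs `undo_sub_filter` after Huffman decoding to recover the original samples exactly.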
Upvotes: 2