Reputation: 378
I want to compress several hundred images together into a single file. The images are all scans of Magic: The Gathering cards, which means they have large blocks of similar color and share many similarities across images like the frame and text box.
I want to take advantage of the similarities between pictures, so formats like JPG and PNG that only consider a single image at a time are of no use. Algorithms like DEFLATE are also a poor fit here because, if I understand correctly, they only use a small "context window" that is tiny compared to a set of images a few hundred MB in size.
A simple diffing approach like the one mentioned here would probably also not work very well, since the similarities are not pixel-perfect; relatively few pixels are exactly the same color across images, they are just similar.
The video compression suggestion in the same thread would require me to put the images in a specific order, which might not be the optimal one; a better algorithm would itself determine which images are most similar to each other.
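For reference, that approach would amount to something like the sketch below (assuming ffmpeg with libx265 and the scans renamed into a numbered sequence; the file names and quality setting are placeholders, not anything I have settled on):

# Sketch of the video-compression idea: treat the ordered scans as frames so
# inter-frame prediction can exploit the similarity between cards
ffmpeg -framerate 1 -i %04d.jpg -c:v libx265 -crf 20 cards.mkv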
The best lead I have so far is something called "set redundancy compression", but I can't find very much information about it; that paper is almost 20 years old, and given how common it is to need to store large sets of similar images, I'm sure much more work has been done on this in the internet age.
Set redundancy compression also appears to be lossless, which I don't want; I need a really high compression ratio, and I'm OK with losing details that aren't visible to the naked eye.
Upvotes: 2
Views: 71
Reputation: 207778
I tried to leverage a couple of ideas to get better compression: remapping all the images to a single shared palette, and interleaving lines from the images so that PNG's row filtering could exploit the similarity between them.
It didn't work very well, but I thought I'd document it anyway...
I took the first 8 images from the page you linked, renamed them "0001.jpg" through "0008.jpg", and processed them in zsh with ImageMagick as follows:
colors=63
# Create common palette for all images
magick ????.jpg +append -colors $colors -unique-colors palette.png
That makes a palette like this, which I have stretched vertically so you can see it:
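The palette that -unique-colors writes out is only one pixel tall, so if you want to inspect it yourself you can force it up to a viewable size; this one-liner is just for illustration and the output name is my own choice:

# Stretch the 1-pixel-tall palette so the colours are easy to see (illustration only)
magick palette.png -scale 630x60\! palette-big.png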
I then remapped all the images to the common palette and saved them as PNGs. The original image is on the left and the reduced-palette copy is on the right:
# Remap all images to common palette and store as PNG
for f in ????.jpg ; do
   base="${f%.*}"
   magick "$f" -remap palette.png "${base}.png"
done
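A possible variation, not part of the run above: -remap dithers by default, and dither noise tends to work against PNG's row filters, so turning it off might compress better:

# Variation (untested here): remap without dithering so flat areas stay flat
for f in ????.jpg ; do
   base="${f%.*}"
   magick "$f" -dither None -remap palette.png "${base}.png"
done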
I then compared each of the original images with the first image to get the optimal order in which to take lines from each image and append them into a tall image:
for f in *.jpg; do magick compare -metric RMSE 0001.jpg $f null:; echo ""; done
0 (0)
10701.5 (0.163295)
11110.5 (0.169535)
11929.1 (0.182026)
28885.4 (0.440763)
28122.7 (0.429124)
27720.9 (0.422994)
20068 (0.306219)
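If you want that ordering computed automatically instead of reading the numbers off by hand, something like this would do it (just a sketch; note that magick compare writes its metric to stderr, hence the 2>&1):

# Sketch: rank the images by RMSE distance from the first one
for f in ????.jpg ; do
   d=$(magick compare -metric RMSE 0001.jpg "$f" null: 2>&1)
   echo "${d%% *} $f"
done | sort -n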
So I used line 0 from the first image, then from the second, then the third and so on, before moving on to line 1 from each image in turn:
# Split all images into constituent lines
for f in {0001..0008} ; do
   magick "${f}.png" -crop x1 "${f}-%04d.png"
done
Then I selected lines from the images in my (hopefully optimal) sequence and appended them together in the hope that PNG line filtering would optimise the file size:
rm -f result.png    # start from scratch so the first strip seeds the output
for line in {0000..0679} ; do
   for image in 0001 0002 0003 0005 0007 0006 0008 ; do
      this="${image}-${line}"
      if [[ -f result.png ]] ; then
         magick result.png "${this}.png" -append result.png
      else
         cp "${this}.png" result.png
      fi
   done
done
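Incidentally, appending one strip at a time re-reads and re-compresses the growing result.png a few thousand times, so a quicker variant (a sketch, not what I actually ran) is to collect the file names first and let a single magick call do the whole append:

# Sketch: build the ordered list of strips, then append them in one pass
strips=()
for line in {0000..0679} ; do
   for image in 0001 0002 0003 0005 0007 0006 0008 ; do
      strips+=("${image}-${line}.png")
   done
done
magick "${strips[@]}" -append result.png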
It didn't. The file came out at 900kB, while the original JPEGs total roughly the same size :-(
Upvotes: 1