Reputation: 8410
I have a use case where I want to cross-compare two sets of images to find the most similar pairs.
However, the sets are quite big, and for performance purposes I don't want to open and close images all the time.
So my idea is:
std::map<int, Magick::Image> set1;
for(...) { set1[...] = Magick::Image(...);}
std::map<int, int> best;
for(...) {
Magick::Image set2(...);
//Compare with all the set1
...
best[...] = set1[...]->first;
}
Obviously I don't need to store all of set 2, since I work image by image. But in any case set1 is already so big that storing 32-bit images is too much. For reference: 15000 images, 300x300 = 5GB
I thought about reducing the memory by downsampling the images to monochrome (it does not affect my use case). But how do I do it? Even if I extract a single color channel, ImageMagick still treats the new image as 32 bits per pixel.
My final approach has been to write my own parser that reads the image color by color, converts each pixel to a single bit, and builds a bit vector. Then I XOR the vectors and count the set bits. That works, using only 170 MB.
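To illustrate the bit-vector idea, here is a minimal sketch rather than the actual parser. It assumes the pixels have already been decoded to 8-bit grayscale, and the names `BitImage`, `packToBits`, `differingPixels` and `bestMatch`, as well as the threshold of 128, are made up for the example.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per pixel, packed into 64-bit words.
using BitImage = std::vector<std::uint64_t>;

// Pack 8-bit grayscale pixels into bits: a pixel becomes 1 when it is at or
// above the threshold (128 is an assumption, not a value from the question).
BitImage packToBits(const std::vector<std::uint8_t>& gray,
                    std::uint8_t threshold = 128) {
    BitImage bits((gray.size() + 63) / 64, 0);
    for (std::size_t i = 0; i < gray.size(); ++i) {
        if (gray[i] >= threshold) {
            bits[i / 64] |= std::uint64_t{1} << (i % 64);
        }
    }
    return bits;
}

// Number of differing pixels: XOR the words and count the set bits.
std::size_t differingPixels(const BitImage& a, const BitImage& b) {
    assert(a.size() == b.size());
    std::size_t count = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        count += std::bitset<64>(a[i] ^ b[i]).count();
    }
    return count;
}

// Index of the image in set1 with the fewest differing pixels from the query.
std::size_t bestMatch(const std::vector<BitImage>& set1, const BitImage& query) {
    assert(!set1.empty());
    std::size_t best = 0;
    std::size_t bestDiff = differingPixels(set1[0], query);
    for (std::size_t i = 1; i < set1.size(); ++i) {
        const std::size_t diff = differingPixels(set1[i], query);
        if (diff < bestDiff) {
            bestDiff = diff;
            best = i;
        }
    }
    return best;
}
```

At one bit per pixel a 300x300 image takes about 11 KB, so 15000 images come to roughly 170 MB, which matches the figure above.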
However, it is not flexible. What if I want to use 2 bits or 8 bits per pixel at some point? Is there any way to do this using ImageMagick's own classes and just call compare()?
Thanks!
Upvotes: 0
Views: 318
Reputation: 207345
I have a couple of suggestions - maybe something will give you an idea!
Suggestion 1
Maybe you could use a Perceptual Hash. Rather than holding all your images in memory, you calculate a hash one at a time for each image and then compare the distance between the hashes.
Some pHASHes are invariant to image scale (or you can scale all images to the same size before hashing) and most are invariant to image format.
Here is an article by Dr Neal Krawetz... Perceptual Hashing.
ImageMagick can also do Perceptual Hashing and is callable from PHP - see here.
I also wrote some code some time back for this sort of thing... code.
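To give a feel for the idea, here is a minimal sketch of one of the simplest perceptual hashes, the "average hash". It is not necessarily the variant described in the article or the one ImageMagick implements. It assumes each image has already been decoded to 8-bit grayscale, and the function names `averageHash` and `hashDistance` are made up for the example.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Average hash of a grayscale image given as row-major 8-bit pixels.
// The image is reduced to an 8x8 grid of block means, and each of the 64 bits
// records whether a block is brighter than the mean of all blocks.
// Assumes width and height are both at least 8.
std::uint64_t averageHash(const std::vector<std::uint8_t>& gray,
                          std::size_t width, std::size_t height) {
    assert(width >= 8 && height >= 8 && gray.size() == width * height);
    double cell[64];
    double total = 0.0;
    for (std::size_t by = 0; by < 8; ++by) {
        for (std::size_t bx = 0; bx < 8; ++bx) {
            const std::size_t x0 = bx * width / 8, x1 = (bx + 1) * width / 8;
            const std::size_t y0 = by * height / 8, y1 = (by + 1) * height / 8;
            double sum = 0.0;
            for (std::size_t y = y0; y < y1; ++y) {
                for (std::size_t x = x0; x < x1; ++x) {
                    sum += gray[y * width + x];
                }
            }
            cell[by * 8 + bx] = sum / static_cast<double>((x1 - x0) * (y1 - y0));
            total += cell[by * 8 + bx];
        }
    }
    const double mean = total / 64.0;
    std::uint64_t hash = 0;
    for (std::size_t i = 0; i < 64; ++i) {
        if (cell[i] > mean) {
            hash |= std::uint64_t{1} << i;
        }
    }
    return hash;
}

// Hamming distance between two hashes: smaller means more similar.
std::size_t hashDistance(std::uint64_t a, std::uint64_t b) {
    return std::bitset<64>(a ^ b).count();
}
```

With this approach each image costs only 8 bytes once hashed, so 15000 hashes fit in about 120 KB, and comparing a pair is a single XOR and bit count.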
Suggestion 2
I understand that ImageMagick Version 7 is imminent - no idea who could tell you more - and that it supports true single-channel, grayscale images - as well as up to 32 channel multi-spectral images. I believe it can also act as a server - holding images in memory for subsequent use. Maybe that can help.
Suggestion 3
Maybe you can get some mileage out of GNU Parallel - it can keep all your CPU cores busy in parallel and also distribute work across a number of servers using ssh. There are plenty of tutorials and examples out there, but just to demonstrate comparing each item of a named set of images (a,b,c,d) with each of a numbered set of images (1,2), you could do this:
parallel -k echo {#} compare {1} {2} ::: a b c d ::: 1 2
Output
1 compare a 1
2 compare a 2
3 compare b 1
4 compare b 2
5 compare c 1
6 compare c 2
7 compare d 1
8 compare d 2
Obviously I have put echo in there so you can see the commands generated, but you can remove that and actually run compare.
So, your code might look more like this:
#!/bin/bash
# Create a bash function that GNU Parallel can call to compare two images
comparethem() {
result=$(convert -metric rmse "$1" "$2" -compare -format "%[distortion]" info:)
echo Job:$3 $1 vs $2 $result
}
export -f comparethem
# Next line effectively uses all cores in parallel to compare pairs of images
parallel comparethem {1} {2} {#} ::: set1/*.png ::: set2/*.png
Output
Job:3 set1/s1i1.png vs set2/s2i3.png 0.410088
Job:4 set1/s1i1.png vs set2/s2i4.png 0.408234
Job:6 set1/s1i2.png vs set2/s2i2.png 0.406902
Job:7 set1/s1i2.png vs set2/s2i3.png 0.408173
Job:8 set1/s1i2.png vs set2/s2i4.png 0.407242
Job:5 set1/s1i2.png vs set2/s2i1.png 0.408123
Job:2 set1/s1i1.png vs set2/s2i2.png 0.408835
Job:1 set1/s1i1.png vs set2/s2i1.png 0.408979
Job:9 set1/s1i3.png vs set2/s2i1.png 0.409011
Job:10 set1/s1i3.png vs set2/s2i2.png 0.407391
Job:11 set1/s1i3.png vs set2/s2i3.png 0.408614
Job:12 set1/s1i3.png vs set2/s2i4.png 0.408228
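The output above still has to be reduced to the best match per image. As a rough sketch of that step, assuming the exact "Job:N a vs b score" line format shown above, file names without spaces, and that a lower RMSE distortion means a closer match, something like this would do. The names `BestMap` and `bestMatches` are made up for the example.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// For each set2 image: the closest set1 image and its RMSE distortion.
using BestMap = std::map<std::string, std::pair<std::string, double>>;

// Read lines of the form "Job:N set1/a.png vs set2/b.png 0.41" and keep,
// for every set2 image, the set1 image with the lowest distortion.
BestMap bestMatches(std::istream& in) {
    BestMap best;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string job, first, vs, second;
        double score = 0.0;
        // Skip anything that does not look like a result line.
        if (!(fields >> job >> first >> vs >> second >> score) || vs != "vs") {
            continue;
        }
        auto it = best.find(second);
        if (it == best.end() || score < it->second.second) {
            best[second] = {first, score};
        }
    }
    return best;
}
```

You would call `bestMatches(std::cin)` from a small `main` and pipe the script's output into it.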
Suggestion 4
I wrote an answer a while back about using REDIS to cache images - that can also work in a distributed fashion amongst a small pool of servers. That answer is here.
Suggestion 5
You may find that you can get better performance by converting the second set of images to Magick Pixel Cache format so that they can be memory-mapped rather than needing to be decoded and decompressed each time. So you would do this:
convert image.png image.mpc
which gives you these two files which ImageMagick can read really quickly.
-rw-r--r-- 1 mark staff 856 16 Jan 12:13 image.mpc
-rw------- 1 mark staff 80000 16 Jan 12:13 image.cache
Note that I am not suggesting you permanently store your images in MPC format as it is unique to ImageMagick and can change between releases. I am suggesting you generate a copy in that format just before you do your analysis runs each time.
Upvotes: 1