Reputation: 42013
Given two images:
image1.jpg
image2.jpg
What's a fast way to detect if they are visually identical in Python? For example, they may have different EXIF data, which would yield different checksums even though the image data is the same.
ImageMagick has an excellent tool, "identify", that produces a visual hash of an image, but it's very processor intensive.
Upvotes: 14
Views: 17532
Reputation: 714
Using PIL/Pillow:
from PIL import Image
im1 = Image.open('image1.jpg')
im2 = Image.open('image2.jpg')
if list(im1.getdata()) == list(im2.getdata()):
    print("Identical")
else:
    print("Different")
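A possible variant of the same idea, as a rough sketch: compare sizes first, then let Pillow compute the per-channel difference instead of building two Python lists. The helper name `same_pixels` and the in-memory test images are made up for illustration; with real files you would pass `Image.open('image1.jpg')` and `Image.open('image2.jpg')` instead. Both images are converted to RGB here, which is an assumption that suits JPEGs but ignores any alpha channel.

```python
from PIL import Image, ImageChops


def same_pixels(im1, im2):
    """Return True if two PIL images have identical RGB pixel data."""
    # Images of different dimensions cannot be pixel-identical.
    if im1.size != im2.size:
        return False
    # Convert to a common mode so the channel-wise difference is well defined.
    a = im1.convert("RGB")
    b = im2.convert("RGB")
    # difference() is |a - b| per channel; getbbox() returns None
    # when every pixel of the difference image is zero.
    return ImageChops.difference(a, b).getbbox() is None


# Small in-memory images so the sketch runs without any files on disk.
red = Image.new("RGB", (4, 4), (255, 0, 0))
almost_red = red.copy()
almost_red.putpixel((1, 1), (254, 0, 0))

print(same_pixels(red, red.copy()))
print(same_pixels(red, almost_red))
```

Like the version above, this only looks at decoded pixel data, so differing EXIF blocks do not affect the result.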
Upvotes: 27
Reputation: 11179
Just because no one has mentioned it yet, Spatial CIELAB is another useful image similarity metric.
It's simpler than it sounds: you blur the two images by an amount related to the acuity of your observer, then find the CIELAB difference (delta E). You can take the peak or average of the difference image, depending on your application.
Using pyvips, you could write:
#!/usr/bin/python3
import sys
import pyvips
# the access hint means these images can be streamed in parallel rather
# than fully decoded
image1 = pyvips.Image.new_from_file(sys.argv[1], access="sequential")
image2 = pyvips.Image.new_from_file(sys.argv[2], access="sequential")
# blur by an amount related to the visual acuity of the observer -- this will
# help remove peaks caused by small alignment differences, then take the
# CIELAB76 colour difference
sigma = 3.0
# diff = image1.gaussblur(sigma).dE76(image2.gaussblur(sigma))
diff = image1.resize(1.0 / sigma).dE76(image2.resize(1.0 / sigma))
# compute the peak difference ... over perhaps 20 means a visible difference
print(f"peak difference of {diff.max()} visual units")
As a small optimization, resizing rather than blurring reduces the number of pixels you need to compute the colour difference for.
This PC will compute a difference for a pair of 6k x 4k JPGs in about 400ms.
$ vipsheader ~/pics/theo.jpg
/home/john/pics/theo.jpg: 6048x4032 uchar, 3 bands, srgb, jpegload
$ time ./try51.py ~/pics/theo2.jpg ~/pics/theo.jpg
peak difference of 0.0 visual units
real 0m0.396s
user 0m0.952s
sys 0m0.197s
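To make the delta E step concrete, here is a dependency-free sketch of the CIE76 difference on a handful of pixels that are assumed to be in CIELAB already. The function names and sample values are invented for illustration; pyvips does the colour conversion and the per-pixel work for you in the script above.

```python
import math


def delta_e76(lab1, lab2):
    """CIE76 colour difference: Euclidean distance between two (L*, a*, b*) triples."""
    return math.dist(lab1, lab2)


def peak_and_mean(pixels_a, pixels_b):
    """Return (peak, mean) delta E over two equally long sequences of Lab pixels."""
    diffs = [delta_e76(p, q) for p, q in zip(pixels_a, pixels_b)]
    return max(diffs), sum(diffs) / len(diffs)


# Two tiny "images" of two pixels each; only the first pixel differs.
image_a = [(50.0, 0.0, 0.0), (60.0, 10.0, -5.0)]
image_b = [(50.0, 3.0, 4.0), (60.0, 10.0, -5.0)]

peak, mean = peak_and_mean(image_a, image_b)
print(f"peak {peak}, mean {mean}")
```

The peak reacts to a single badly mismatched pixel, while the mean smooths it out, which is why the choice depends on your application.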
Upvotes: 2
Reputation: 53089
One way to do that in Python/OpenCV is to get the absdiff, then get the mean (average) of the absdiff over the whole absdiff image.
Input1 (PNG):
Input2 (JPG):
import cv2
import numpy as np
# read image 1
img1 = cv2.imread('lena.png')
# read image 2
img2 = cv2.imread('lena.jpg')
# do absdiff
diff = cv2.absdiff(img1,img2)
# get mean of absdiff
mean_diff = np.mean(diff)
# print result
print(mean_diff)
1.8992767333984375
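To see what that number measures, here is a small pure-Python sketch of the same calculation on flat lists of 8-bit channel values. `mean_abs_diff` and the sample data are illustrative only, not part of OpenCV.

```python
def mean_abs_diff(values_a, values_b):
    """Mean absolute difference between two equally long sequences of channel values."""
    if len(values_a) != len(values_b):
        raise ValueError("inputs must have the same number of values")
    total = sum(abs(x - y) for x, y in zip(values_a, values_b))
    return total / len(values_a)


# Flattened channel values of two tiny images.
original = [0, 10, 255, 128]
recompressed = [0, 12, 250, 128]

print(mean_abs_diff(original, original))
print(mean_abs_diff(original, recompressed))
```

Identical pixel data gives exactly 0.0. A small non-zero value, like the one above for a PNG against a JPG of the same picture, is what you would expect from lossy compression, so for that case you would probably compare against a threshold of your own choosing rather than test for zero.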
Upvotes: 2
Reputation: 51
Using https://github.com/andrewekhalel/sewar to compare image similarity:
>>> from sewar.full_ref import uqi
>>> uqi(img1,img2)
0.9586952304831419
Upvotes: 2
Reputation: 90193
I'm submitting my way to tackle this anyway, even though the OP says that ImageMagick's way is too processor intensive (and even though my way does not involve Python). Maybe my answer is useful to other people who arrive at this page via a search engine.
Be aware that any image comparison which is supposed to discover fine differences in hi-res images is more processor intensive than a discovery of big differences in low-res images, as it has to compare a lot more pixels.
Here are two ImageMagick commands that compare two (same-sized!) images and return all differing pixels as red and identical pixels as white. The first one uses the reference image as a faded-out background for the composition of the red-white pixel matrix; the second one (with -compose src) does not. .img may be any of the IM-supported formats (.png, .PnG, .pNG, .PNG, .jpg, .jpeg, .jPeG, .tif, .tiff, .ppm, .gif, .pdf, ...):
compare reference.img similar.img delta.img
compare reference.img similar.img -compose src delta.img
By default, the comparison is made at 72 PPI. If you need more resolution (for example with a vector-based image, such as a PDF page), you can add -density to increase it. Of course, the processing time will increase accordingly:
compare -density 300 reference.img similar.img delta.img
If you add a fuzz factor, you can tell ImageMagick to treat as identical all pixels that are no more than a certain color distance apart:
compare -fuzz '3%' reference.img similar.img -compose src delta.img
More recent versions of ImageMagick support the phash algorithm:
compare -metric phash reference.img similar.img -compose src delta.img
This will, besides creating the delta.img for visualization, return a numeric value that indicates the "difference" between the two images. The closer it is to 0, the more similar the two compared images are.
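Since the question asks about Python, here is a rough sketch of how that command could be wrapped with subprocess. It assumes the `compare` binary is on your PATH (on ImageMagick 7 you may need `magick compare` instead) and relies on compare printing the metric on stderr; the function names `parse_metric` and `phash_distance` are my own, not part of ImageMagick.

```python
import subprocess


def parse_metric(stderr_text):
    """Parse the leading number from the text compare prints for -metric."""
    return float(stderr_text.split()[0])


def phash_distance(reference, other):
    """Run ImageMagick's compare and return the phash metric as a float."""
    # 'null:' discards the delta image; only the numeric metric is wanted here.
    proc = subprocess.run(
        ["compare", "-metric", "phash", reference, other, "null:"],
        capture_output=True,
        text=True,
    )
    # compare exits with 0 (similar) or 1 (dissimilar); anything else is
    # treated as an error in this sketch.
    if proc.returncode not in (0, 1):
        raise RuntimeError(proc.stderr.strip())
    return parse_metric(proc.stderr)


# Parsing alone can be tried without ImageMagick installed.
print(parse_metric("7.61662\n"))
```

With ImageMagick installed, phash_distance('reference.img', 'similar.img') should give you the same number the command line prints, and you can then apply whatever "close enough to 0" cutoff suits your images.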
Create a few small PDF pages with minor differences in them. I'm using Ghostscript:
gs -o ref1.pdf -sDEVICE=pdfwrite -g1050x1350 \
-c "/Courier findfont 160 scalefont setfont 10.0 10.0 moveto (0) show showpage"
gs -o ref2.pdf -sDEVICE=pdfwrite -g1050x1350 \
-c "/Courier findfont 160 scalefont setfont 10.1 10.1 moveto (0) show showpage"
gs -o ref3.pdf -sDEVICE=pdfwrite -g1050x1350 \
-c "/Courier findfont 160 scalefont setfont 10.0 10.0 moveto (O) show showpage"
gs -o ref4.pdf -sDEVICE=pdfwrite -g1050x1350 \
-c "/Courier findfont 160 scalefont setfont 10.1 10.1 moveto (O) show showpage"
Now compare ref1.pdf with ref3.pdf at the default resolution of 72 PPI:
compare -metric phash ref1.pdf ref3.pdf delta-ref1-ref3.pdf
7.61662
The returned pHash value is 7.61662. This indicates that ImageMagick's compare discovered at least some differences.
Let's look at the visualization. I'll create a side-by-side visualization of the three PDFs/images (to be shown below):
convert \
-mattecolor blue \
\( ref1.pdf -frame 2x2 \) \
null: \
\( ref3.pdf -frame 2x2 \) \
null: \
\( delta-ref1-ref3.pdf -frame 2x2 \) \
+append \
ref1-ref3-delta.png
As you can see, the different shapes of the 0 (digit 'zero') and the O (capital letter 'o') stand out quite well.
Now the next one, where ref1.pdf is compared to ref2.pdf, also at 72 PPI:
compare -metric phash ref1.pdf ref2.pdf delta-ref1-ref2.pdf
0
The returned pHash value now is 0. This indicates that ImageMagick discovered no difference!
Create a side-by-side visualization of the three PDFs/images:
convert \
-mattecolor blue \
\( ref1.pdf -frame 2x2 \) \
null: \
\( ref2.pdf -frame 2x2 \) \
null: \
\( delta-ref1-ref2.pdf -frame 2x2 \) \
+append \
ref1-ref2-delta.png
As you can see, at 72 PPI ImageMagick does not discover a difference between the two PDFs (as would be indicated by red pixels). According to the Ghostscript commands, both show the digit 0, but at positions that are shifted by 0.1 pt in the x- and y-directions. So in reality, in the original PDFs, there IS a difference. But when rendered at 72 PPI, this difference isn't visible.
Let's try to see the difference with density 600 then:
compare \
-metric phash \
-density 600 \
ref1.pdf \
ref2.pdf \
ref1-ref2-at-density600-delta.png
0.00172769
The returned pHash value at 600 PPI now is 0.00172769. This is close to zero, but still a difference. The difference is less than the one between ref1.pdf and ref3.pdf.
The difference is clearly highlighted now in the visual comparison, even though only by a thin line of red pixels:
Upvotes: 15