Reputation: 364
I am currently moving a script that creates locality-sensitive hashes (LSHs) from images from a Windows host to a Debian one.
My problem is that Pillow returns different image data for the same source image on the two platforms.
I have only observed this behavior for JPEGs.
Test case:
from PIL import Image
import md5
import urllib2
from cStringIO import StringIO

urls = ("https://i.imgur.com/Mx6NQwM.jpg", "https://i.imgur.com/MN1TKu5.png")

print("VERSION %s" % Image.VERSION)
for url in urls:
    # Raw bytes of the file exactly as downloaded
    response = urllib2.urlopen(url).read()
    img = Image.open(StringIO(response)).convert("RGB")
    # Concatenate the decoded RGB pixel values into one byte string
    img_md5 = "".join("".join(map(chr, x)) for x in img.getdata())
    print("URL: %s" % url)
    # response is already a string, so hash it directly
    print("Plain md5:\t%s" % md5.new(response).hexdigest())
    print("Image md5:\t%s" % md5.new(img_md5).hexdigest())
I would expect this to print the same MD5 hashes on both systems.
My results:
Windows 7:
VERSION 1.1.7
URL: https://i.imgur.com/Mx6NQwM.jpg
Plain md5: 4aacd5b92575ffca6d0ab884f95cc1f9
Image md5: 10eaf568f4d9d33c722ea702fc4d1025
URL: https://i.imgur.com/MN1TKu5.png
Plain md5: d05e6dc1311339b806e5998f15fc818c
Image md5: 38fc986c5cd9605038ee627b11687344
Debian jessie:
VERSION 1.1.7
URL: https://i.imgur.com/Mx6NQwM.jpg
Plain md5: 4aacd5b92575ffca6d0ab884f95cc1f9
Image md5: 7347c6286f4d917649d967a5025e392e
URL: https://i.imgur.com/MN1TKu5.png
Plain md5: d05e6dc1311339b806e5998f15fc818c
Image md5: 38fc986c5cd9605038ee627b11687344
The LSHs are somewhat similar, but different enough to be a problem.
The Pillow version on both systems is 2.9.0.
Is there some way to get the same pixel values on the Debian system as I get on the Windows one?
And in general: does anyone know why this is happening?
Upvotes: 1
Views: 965
Reputation: 364
I "solved" my problem.
I had both PIL and Pillow installed on the Windows host by accident, and it looks like Python chose to use the PIL version.
After switching to PIL on the Debian machine too, the results are the same on both machines.
Generally it would be wiser to upgrade the PIL side to use Pillow, but in my case I need to generate exactly the same hashes as I did with the PIL version.
Moral of the story: PIL and Pillow may return different image data when loading the same images.
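One way to check which package an import actually resolves to is to inspect the imported module itself. The following is only a minimal sketch, written for Python 3. It assumes that Pillow sets `__version__` (newer releases) or `PILLOW_VERSION` (older releases such as 2.9.0) on the `PIL` package and that classic PIL sets neither. `describe_imaging_library` is a hypothetical helper name, not part of either library.

```python
import importlib


def describe_imaging_library(pil_module):
    """Return a short label saying whether a module looks like Pillow or classic PIL.

    Assumption: Pillow sets `__version__` (newer releases) or `PILLOW_VERSION`
    (older releases) on the `PIL` package; classic PIL sets neither.
    """
    version = getattr(pil_module, "__version__", None) or getattr(
        pil_module, "PILLOW_VERSION", None
    )
    # __file__ shows which installed copy was picked up from sys.path
    location = getattr(pil_module, "__file__", "<unknown location>")
    if version is not None:
        return "Pillow %s loaded from %s" % (version, location)
    return "classic PIL (no Pillow version attribute) loaded from %s" % location


try:
    print(describe_imaging_library(importlib.import_module("PIL")))
except ImportError:
    print("No PIL package is importable in this environment")
```

Running this on both hosts should show whether they load the same library from the same kind of location before any hashes are compared.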
Upvotes: 1
Reputation: 28360
I personally would not expect the internal image representation to be identical between different machines and/or operating systems, especially if one of them is 64-bit and the other is 32-bit. That is not guaranteed, and it is what you are calculating the image MD5 on. You are getting the same file MD5 on both systems, so the file is identical. If you need an MD5 of the image data, you should first convert to a bitmap of known characteristics and then MD5 the bitmap rather than the "image".
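A rough sketch of that idea, assuming Python 3 and a Pillow-style image object: convert to 8-bit RGB, then hash the dimensions plus the raw row-major bytes. `canonical_pixel_md5` is a hypothetical name, and the header format is an arbitrary choice for illustration. Note that this only pins down the layout being hashed. If the decoder itself produces different pixel values, as seems to be the case for the JPEG in the question, the hashes would still differ.

```python
import hashlib


def canonical_pixel_md5(img):
    """MD5 over a fixed-format dump of the pixels: 8-bit RGB, row-major.

    `img` is expected to behave like a Pillow image, i.e. to provide
    convert(), size and tobytes().
    """
    rgb = img.convert("RGB")
    width, height = rgb.size
    digest = hashlib.md5()
    # Hash the dimensions too, so images with the same bytes but a
    # different shape do not collide.
    digest.update(("RGB %d %d\n" % (width, height)).encode("ascii"))
    digest.update(rgb.tobytes())
    return digest.hexdigest()


try:
    from PIL import Image
except ImportError:
    Image = None  # Pillow not installed; the demo below is skipped

if Image is not None:
    # Small synthetic image, so no download is needed for the demo
    demo = Image.new("RGB", (2, 2), (255, 0, 0))
    print(canonical_pixel_md5(demo))
```

For a real file, the same function could be called as `canonical_pixel_md5(Image.open(path))`.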
Upvotes: 1