I have a directory containing a single image of a baseball, 1.jpg. I use cv2 to read the image and then write it back into the same directory as 2.jpg, so 1.jpg and 2.jpg should be identical. For each image I then calculate a "difference" hash with hash_length=256 using the function get_hash, and print the hash of each image. The two hashes are almost identical but differ by at least 1 bit, and I cannot figure out why. I thought it could be due to JPG compression when the image was copied, so I also ran the code using PNG format for both images, and still got different hash values. Any insight would be appreciated. The code is shown below:
```python
import math

import cv2


def get_hash(fpath, hash_length):
    dim = int(math.sqrt(hash_length))  # with hash_length=256, dim=16
    r_str = ''
    img = cv2.imread(fpath, 0)  # read the image as a grayscale image
    img = cv2.resize(img, (dim, dim), interpolation=cv2.INTER_NEAREST)
    img = img.flatten()  # now a vector of 256 pixel values
    list2 = list(img)
    # compare each pixel with the next one (256 pixels give 255 bits)
    for col in range(0, len(list2) - 1):
        if list2[col] > list2[col + 1]:
            value = '1'
        else:
            value = '0'
        r_str = r_str + value
    return r_str


def match(value1, value2, distance):
    # returns True if the number of mismatches in the hashes is at most distance;
    # with distance=0 it returns True only if the hashes are identical
    mismatch_count = 0
    for i in range(0, len(value1)):
        if value1[i] != value2[i]:
            mismatch_count += 1
    if mismatch_count > distance:
        return False
    else:
        return True


path_to_image = r'C:\Temp\balls\dup3\1.jpg'
img = cv2.imread(path_to_image)
path_to_write_image = r'C:\Temp\balls\dup3\2.jpg'
cv2.imwrite(path_to_write_image, img)  # write the "identical" image to the directory as 2.jpg
hash_length = 256
h1 = get_hash(path_to_image, hash_length)
h2 = get_hash(path_to_write_image, hash_length)
print(h1)
print(h2)
distance = 0  # both hashes must match identically
m = match(h1, h2, distance)
print(m)  # should be True since the images are identical, but prints False
# because there is a single bit difference in the two hashes
```
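The match function above is a Hamming-distance test on two bit strings. A compact equivalent, as a sketch only (hamming_distance and hashes_match are hypothetical helper names, and the short hashes are made up for illustration):

```python
def hamming_distance(hash1, hash2):
    """Count the positions at which two equal-length bit strings differ."""
    if len(hash1) != len(hash2):
        raise ValueError("hashes must have the same length")
    return sum(b1 != b2 for b1, b2 in zip(hash1, hash2))


def hashes_match(hash1, hash2, distance):
    """Same test as match(): True if at most `distance` bits differ."""
    return hamming_distance(hash1, hash2) <= distance


# Short made-up hashes, for illustration only.
a = "0000100110"
b = "0000110110"
print(hamming_distance(a, b))  # 1
print(hashes_match(a, b, 0))   # False
print(hashes_match(a, b, 1))   # True
```

With distance=0 a single flipped bit is enough to make the comparison fail, which is why near-duplicate checks with perceptual hashes usually allow a small nonzero distance; the right value depends on the images.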
The full hash is too long to put here, but here is the region in which the two hash values differ by 1 bit (the 6th bit from the end):
hash for 1.jpg
00000000000000000000011000000000000010001001000000110000000010000010001
hash for 2.jpg
00000000100000000000011000000000000010001001000000110000000010000110001
Answer:
[JPG]
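The saved image 2.jpg is different from the original image 1.jpg. cv2.imread decodes the JPEG into raw pixels, and cv2.imwrite compresses those pixels again with lossy JPEG encoding (OpenCV's default quality is 95), so some pixel values in 2.jpg end up slightly different from those in 1.jpg.
You can compare the images online.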
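Why does such a small re-encoding error show up in the hash at all? get_hash writes '1' only when a pixel is strictly brighter than the next one, so in flat regions, where neighbouring values are equal, a change of a single gray level is enough to flip a bit. A minimal pure-Python sketch of that comparison rule, using made-up pixel values rather than the real image (dhash_bits is a hypothetical helper):

```python
def dhash_bits(pixels):
    """Same rule as get_hash: '1' if a pixel is brighter than the next one, else '0'.

    `pixels` is a flat list of gray values standing in for the flattened,
    resized image; n pixels give n - 1 bits.
    """
    return ''.join('1' if a > b else '0' for a, b in zip(pixels, pixels[1:]))


# A flat region: equal neighbours all compare as '0'.
original = [120, 120, 120, 120, 120]
# The same region after a (hypothetical) lossy re-encode nudged one pixel up by 1.
recompressed = [120, 120, 121, 120, 120]

print(dhash_bits(original))      # 0000
print(dhash_bits(recompressed))  # 0010
```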
[BMP]
I tried re-saving the image as BMP instead. OpenCV writes BMP files uncompressed, so the two images are exactly equal, and their hash values are also equal.
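If the goal is simply a byte-for-byte duplicate of 1.jpg, the file can be copied without decoding and re-encoding it at all. A minimal sketch using only the standard library (copy_exact and sha256_of are hypothetical helper names; the demo writes a throwaway file instead of a real JPEG):

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path):
    """SHA-256 of a file's raw bytes (a cryptographic hash, not a perceptual one)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def copy_exact(src, dst):
    """Copy the file byte for byte, with no decode/re-encode step."""
    shutil.copyfile(src, dst)
    return sha256_of(src) == sha256_of(dst)


# Demo with a throwaway file standing in for 1.jpg (made-up contents).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "1.jpg")
    dst = os.path.join(tmp, "2.jpg")
    with open(src, "wb") as f:
        f.write(b"\xff\xd8 not a real JPEG, just some bytes \xff\xd9")
    print(copy_exact(src, dst))  # True
```

Because the bytes are identical, get_hash would read identical pixels from both files and produce identical hashes.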
[PNG]
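When converting to PNG, the images are equal, but I found that their bit depths are different. Note that cv2.imread(path) with its default flag returns an 8-bit, 3-channel BGR image, so writing it back with cv2.imwrite can change the bit depth (or drop an alpha channel) of the original PNG.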
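One way to check the bit depth of each PNG is to read its header directly: per the PNG specification the IHDR chunk always comes first, and bytes 24 and 25 of the file hold the bit depth and colour type. A small stdlib-only sketch (png_header_info and COLOR_TYPES are hypothetical names; the demo builds a synthetic 16-bit RGB header instead of reading a real file):

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
COLOR_TYPES = {0: "grayscale", 2: "RGB", 3: "palette", 4: "grayscale+alpha", 6: "RGBA"}


def png_header_info(data):
    """Return (width, height, bit_depth, color_type) from the first 26 bytes of a PNG."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR payload starts at byte 16: width (4 bytes), height (4), bit depth (1), colour type (1)
    return struct.unpack(">IIBB", data[16:26])


# Synthetic header for a 16x16, 16-bit RGB image: chunk length 13, "IHDR", then the fields.
header = (PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR"
          + struct.pack(">IIBBBBB", 16, 16, 16, 2, 0, 0, 0))
width, height, bit_depth, color_type = png_header_info(header)
print(width, height, bit_depth, COLOR_TYPES[color_type])  # 16 16 16 RGB
```

For real files, reading the first 26 bytes of 1.png and 2.png in binary mode and passing them to png_header_info would show whether the depth or colour type changed in the round trip.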