Reputation: 1
I convert images to hash values and want to put images with similar hash values into the same group.
For example:
from PIL import Image
import imagehash

# img1, img2, and img3 are the same image
img1_hash = imagehash.average_hash(Image.open('data/image1.jpg'))
img2_hash = imagehash.average_hash(Image.open('data/image2.jpg'))
img3_hash = imagehash.average_hash(Image.open('data/image3.jpg'))
img4_hash = imagehash.average_hash(Image.open('data/image4.jpg'))
print(img1_hash, img2_hash, img3_hash, img4_hash)
# prints: 81c38181bf8781ff 81838181bf8781ff 81838181bf8781ff ff0000ff3f00e7ff
hash_lst = [['img1', img1_hash], ['img2', img2_hash], ['img3', img3_hash], ['img4', img4_hash]]
# ... grouping code goes here ...
Desired output:
[['img1', 'img2', 'img3'], ['img4']]
Is there code that does this grouping effectively?
Thank you
Upvotes: 0
Views: 180
Reputation: 37
The code below gives you the list of hashes that are not similar to any other hash. You can adjust the 0.5 threshold to control how similar two hashes must be.
from difflib import SequenceMatcher
from itertools import combinations

img1_hash = '81c38181bf8781ff'
img2_hash = '81838181bf8781ff'
img3_hash = '81838181bf8781ff'
img4_hash = 'ff0000ff3f00e7ff'
hash_lst = [img1_hash, img2_hash, img3_hash, img4_hash]

# Every unordered pair of hashes
hash_comb_lst = list(combinations(hash_lst, 2))

def similar(a, b):
    # Ratio in [0, 1]; 1.0 means the two strings are identical
    return SequenceMatcher(None, a, b).ratio()

# Collect every hash that is similar to at least one other hash
sim_hashes = []
for hashs in hash_comb_lst:
    if similar(hashs[0], hashs[1]) > 0.5:
        sim_hashes.append(hashs[0])
        sim_hashes.append(hashs[1])

# Hashes with no similar partner, and the rest
diff_group = list(set(hash_lst) - set(sim_hashes))
simm_group = list(set(hash_lst) - set(diff_group))
If you print diff_group, you will see:
print(diff_group)
['ff0000ff3f00e7ff']
SequenceMatcher comes from the standard-library difflib module; adapt the code to your needs.
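To get the nested list of names the question asks for, the same similarity test can be combined with a small union-find pass. This is only a sketch under a few assumptions: the hashes are kept as hex strings in a dict keyed by image name, the 0.5 threshold is reused, and the names `hashes`, `group_similar`, `parent` and `find` are made up for this example. Note that grouping is transitive here, so if A is similar to B and B is similar to C, all three land in one group.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hash strings copied from the question, keyed by image name (assumed layout).
hashes = {
    'img1': '81c38181bf8781ff',
    'img2': '81838181bf8781ff',
    'img3': '81838181bf8781ff',
    'img4': 'ff0000ff3f00e7ff',
}

def similar(a, b):
    # Ratio in [0, 1]; 1.0 means the two strings are identical.
    return SequenceMatcher(None, a, b).ratio()

def group_similar(hashes, threshold=0.5):
    # Hypothetical helper: merge names whose hashes are similar (union-find).
    parent = {name: name for name in hashes}

    def find(name):
        # Follow parent links to the group's representative, shortening the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Link every pair whose similarity exceeds the threshold.
    for a, b in combinations(hashes, 2):
        if similar(hashes[a], hashes[b]) > threshold:
            parent[find(b)] = find(a)

    # Collect names by representative, keeping the original order.
    groups = {}
    for name in hashes:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())

groups = group_similar(hashes)
print(groups)
```

SequenceMatcher compares characters, not bits. If you keep the ImageHash objects instead of strings, subtracting two of them gives the Hamming distance, so you could swap that into `similar` and test for a small distance instead of a high ratio.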
Upvotes: -1