Reputation: 23
Problem:
I've got around 10,000 images to compare to each other. My current program compares around 60 images every second, but at that speed, it would take nearly 9 days of runtime to finish. I've tried using C++, but the final code took nearly 3x as long as the Python one.
Question:
Is there any faster or more efficient way to compare images? I'm fine with using other languages and other libraries.
Code:
from PIL import Image
from PIL import ImageChops
import math, operator
from functools import reduce
import os

def rmsdiff(image_1, image_2):
    h = ImageChops.difference(image_1, image_2).histogram()
    return math.sqrt(reduce(operator.add, map(lambda h, i: i%256*(h**2), h, range(len(h)))) / (float(image_1.size[0]) * image_1.size[1]))

current = 0
try:
    dire = "C:\\Users\\Nikola\\Downloads\\photos"
    photos = os.listdir(dire)
    for idx, val in enumerate(photos):
        if val == "":
            start = idx
            break
    for photo_1 in range(start,len(photos)):
        if "." not in photos[photo_1]:
            continue
        print(f'Image: {photos[photo_1]}')
        with Image.open(dire+"\\"+photos[photo_1]) as image_1:
            image_1 = image_1.resize((16,16))
            for photo_2 in range(photo_1+1, len(photos)):
                current = photos[photo_2]
                try:
                    if photos[photo_2][-4] != "." and photos[photo_2][-5] != ".":
                        continue
                except:
                    continue
                with Image.open(dire+"\\"+photos[photo_2]) as image_2:
                    image_2 = image_2.resize((16,16))
                    try:
                        value = rmsdiff(image_1, image_2)
                        if value < 12:
                            print(f'Similar Image: {photos[photo_1]}')
                            continue
                    except:
                        pass
except KeyboardInterrupt:
    print()
    print(current)
Upvotes: 1
Views: 1038
Reputation: 207345
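Following on from my comments, I'm suggesting that loading and resizing take the most time, so that is where I'd aim to optimise. Your inner loop re-opens and re-resizes the second image for every pair, so each file gets decoded thousands of times rather than once.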
I don't have a Python interpreter available at the moment to test properly, but along these lines:
from functools import lru_cache
from PIL import Image

@lru_cache(maxsize=None)
def loadImage(filename):
    # Load and shrink each file once; repeat calls return the cached thumbnail
    im = Image.open(filename)
    im = im.resize((16,16))
    return im
That should already make a massive difference. Then adjust to use "draft" mode, something like:
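@lru_cache(maxsize=None)
def loadImage(filename):
    im = Image.open(filename)
    # draft() lets the JPEG decoder downscale while decoding; other formats ignore it
    im.draft('RGB',(32,32))
    im = im.resize((16,16))
    return im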
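To show where the cached loader would slot into the pairwise loop, here is a minimal, untested-on-your-data sketch. The names `make_cached_loader` and `similar_pairs` are made up for illustration; the loader and the difference function are passed in, so you would hand it `loadImage` and your own `rmsdiff`. Assuming RGB thumbnails, 16x16 pixels is 768 bytes of pixel data each, so keeping all 10,000 in memory should only cost roughly 7.7 MB plus object overhead.

```python
from functools import lru_cache
from itertools import combinations


def make_cached_loader(load):
    """Hypothetical helper: wrap a one-argument loader so each filename is loaded once."""
    return lru_cache(maxsize=None)(load)


def similar_pairs(filenames, load, diff, threshold=12):
    """Hypothetical helper: return every (a, b) pair whose diff score is below threshold.

    combinations() visits each unordered pair exactly once, which matches the
    range(photo_1 + 1, len(photos)) pattern in the question.
    """
    return [
        (a, b)
        for a, b in combinations(filenames, 2)
        if diff(load(a), load(b)) < threshold
    ]
```

Usage would be something like `similar_pairs(paths, loadImage, rmsdiff)`, where `paths` is your list of full file paths. If `loadImage` is already decorated with `lru_cache`, you can skip `make_cached_loader`.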
You could also multithread the loading if your laptop has a decent CPU.
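A rough sketch of the multithreaded loading, using the standard library's thread pool. `load_all` is a hypothetical helper name, and `workers=8` is just a guess to tune; the loader is passed in, so you would give it `loadImage`. Threads mainly help where the time goes into disk reads or into C code that releases the GIL, so it is worth timing on your own files before relying on it.

```python
from concurrent.futures import ThreadPoolExecutor


def load_all(filenames, load, workers=8):
    """Hypothetical helper: load every file once, in parallel.

    Returns a dict mapping each filename to whatever load() returned for it.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so zip lines names up with results
        return dict(zip(filenames, pool.map(load, filenames)))
```

After `thumbs = load_all(paths, loadImage)`, the comparison loop can read `thumbs[name]` instead of opening files, so the expensive part happens once per image rather than once per pair.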
Upvotes: 1
Reputation: 27896
Your problem is quite unusual, though. Having to read the pixel data itself in order to compare images is rare; most of the time it makes more sense to have some metadata to compare by.
That said, here are some very different approaches to speeding this up.
Upvotes: 0