Reputation: 2219
I'm in need of a way to detect broken image files in a huge collection (tens of thousands of images). The way I do it now is by using PIL like this:
import PIL.Image

try:
    im = PIL.Image.open(f)
    im.verify()  # raises an exception on truncated/corrupt data
    # image valid
except Exception:
    # image invalid
    ...
But that's way too slow; it would take hours, even days, to check all the files.
Is there a quicker way to find all invalid images in a folder by means of Python?
Unfortunately, imghdr isn't sufficient because it does not detect truncated images.
Upvotes: 0
Views: 349
Reputation: 43523
You could speed it up some by wrapping the code from your question in a function. Then build a list of all filenames to be tested and use Pool.map from the multiprocessing module to apply the function in parallel to all the files, using as many cores as your machine has.
If your machine has N cores, this could give you a factor-N speedup. In practice it will be somewhat less because of multiprocessing overhead and possible I/O bandwidth limits.
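A minimal sketch of that approach, assuming Pillow is installed; the function name `check_image` and the glob pattern are illustrative, not from the question:

```python
import glob
from multiprocessing import Pool

import PIL.Image


def check_image(path):
    """Return (path, is_valid) for a single image file."""
    try:
        with PIL.Image.open(path) as im:
            im.verify()  # raises an exception on truncated/corrupt data
        return path, True
    except Exception:
        return path, False


if __name__ == "__main__":
    # hypothetical folder layout; adjust the pattern to your collection
    files = glob.glob("images/*.jpg")
    with Pool() as pool:  # defaults to os.cpu_count() worker processes
        results = pool.map(check_image, files)
    broken = [path for path, ok in results if not ok]
    print(broken)
```

The `if __name__ == "__main__":` guard matters here: multiprocessing re-imports the module in each worker, so the Pool setup must not run at import time.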
Upvotes: 1