Hendrik Wiese

Reputation: 2219

Find invalid images in a huge collection

I'm in need of a way to detect broken image files in a huge collection (tens of thousands of images). The way I do it now is by using PIL like this:

import PIL.Image

try:
    im = PIL.Image.open(f)
    # image valid
except Exception:  # a bare except would also swallow KeyboardInterrupt
    # image invalid
    ...

But that's way too slow. It would take hours, if not days, to check all files.

Is there a quicker way to find all invalid images in a folder by means of Python?

Unfortunately, imghdr isn't sufficient, because it does not detect truncated images.

Upvotes: 0

Views: 349

Answers (1)

Roland Smith

Reputation: 43523

You could speed it up somewhat by wrapping the code from your question in a function. Then build a list of all the filenames to be tested and use Pool.map from the multiprocessing module to apply the function to all files in parallel, using as many cores as your machine has.

If your machine has N cores, this could give you up to an N-fold speedup. In practice it will be less because of multiprocessing overhead and possibly I/O bandwidth limits.
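A minimal sketch of what that might look like, assuming Pillow is installed. The names is_valid_image and find_invalid_images are made up for illustration, and the chunksize of 64 is an arbitrary guess rather than a tuned value. Note that Image.open is lazy and only reads enough of the file to identify it, so this sketch also calls load() to force a full decode. That is my assumption about what "invalid" should cover (truncated files included), and it costs more time per file than open() alone.

```python
import os
from multiprocessing import Pool

from PIL import Image


def is_valid_image(path):
    """Return (path, True) if Pillow can fully decode the file, else (path, False)."""
    try:
        with Image.open(path) as im:
            im.load()  # force a full decode; open() alone only reads the header
        return path, True
    except Exception:
        # Any decoding error (unknown format, truncated data, ...) counts as invalid.
        return path, False


def find_invalid_images(folder, processes=None):
    """Check every regular file in `folder` in parallel; return the invalid paths."""
    paths = [os.path.join(folder, name) for name in sorted(os.listdir(folder))]
    paths = [p for p in paths if os.path.isfile(p)]
    # processes=None lets Pool use os.cpu_count() worker processes.
    with Pool(processes) as pool:
        # chunksize batches filenames per task to cut inter-process overhead.
        results = pool.map(is_valid_image, paths, chunksize=64)
    return [path for path, ok in results if not ok]
```

On platforms where multiprocessing starts workers by spawning a fresh interpreter (Windows, macOS), the call to find_invalid_images should sit under an `if __name__ == "__main__":` guard. If the files live on a slow disk or a network share, the workers will mostly wait on I/O and extra cores won't help much.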

Upvotes: 1
