bisuke

Reputation: 331

Python: processing a folder with a million images without getting an error

import os
from PIL import Image

def preprocess_image(path_to_images):
    print("preprocessing images..")
    files = os.listdir(path_to_images)
    for f in files:
        im = Image.open(os.path.join(path_to_images, f))
        im = im.convert("RGBA")

        # Make every pure-white pixel fully transparent
        datas = im.getdata()
        newData = []
        for item in datas:
            if item[0] == 255 and item[1] == 255 and item[2] == 255:
                newData.append((255, 255, 255, 0))
            else:
                newData.append(item)
        im.putdata(newData)
        im.save(os.path.join(path_to_images, f),"PNG")

Hi. The code above is supposed to remove the background of every image in a folder by making white pixels transparent. It works perfectly fine when I process a folder of 10 JPG images, but I get the following error when I run it on a folder that contains 12000 images:

Traceback (most recent call last):
  File "test1.py", line 28, in <module>
    preprocess_image("train_images" )
  File "test1.py", line 9, in preprocess_image
    im = Image.open(os.path.join(path_to_images, f))
  File "/Library/Python/2.7/site-packages/PIL/Image.py", line 2452, in open
    % (filename if filename else fp))
IOError: cannot identify image file 'train_images/.DS_Store'

The folder with the images is called 'train_images', and I don't know where the .DS_Store came from.

I really appreciate your help.

Upvotes: 2

Views: 557

Answers (2)

Tom Sitter

Reputation: 1102

.DS_Store is a metadata file created by macOS (OS X). You can put your image processing code in a try/except block to catch any issues with non-image files in the folder:

for f in files:
    try:
        im = Image.open(os.path.join(path_to_images, f))
        im = im.convert("RGBA")
        # ... rest of code
    except IOError as e:
        print('Error processing file %s: %s' % (f, e))

Upvotes: 0

Hamuel

Reputation: 633

.DS_Store is a file created by macOS when you open a directory with Finder. When you walk the directory, you should ignore it.

try:
    im = Image.open(os.path.join(path_to_images, f))
except IOError:
    # Report the file that failed, not just the folder
    print('failed to read %s' % os.path.join(path_to_images, f))
  • you can add a try/except block, as in the example above
  • or skip the file explicitly: if f != '.DS_Store'
  • or split the filename and check whether the extension is an image type: name, ext = 'image.jpg'.split("."), then if ext in ['jpg', 'png' ... ]
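
The extension check from the last bullet could be sketched roughly as follows. This is only an illustration, not the answerer's exact code: the helper name list_image_files and the IMAGE_EXTENSIONS set are hypothetical, and it uses os.path.splitext instead of split(".") so that names containing more than one dot do not break the unpacking.

```python
import os

# Assumed set of extensions to treat as images; adjust to your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_image_files(folder):
    """Return sorted filenames in `folder` whose extension looks like an image."""
    image_files = []
    for name in sorted(os.listdir(folder)):
        # Skip hidden files such as .DS_Store.
        if name.startswith("."):
            continue
        # splitext keeps only the last extension, e.g. "a.b.jpg" -> ".jpg".
        ext = os.path.splitext(name)[1].lower()
        if ext in IMAGE_EXTENSIONS and os.path.isfile(os.path.join(folder, name)):
            image_files.append(name)
    return image_files
```

In the question's function, the loop could then iterate over list_image_files(path_to_images) instead of os.listdir(path_to_images). A file with an image extension can still be corrupt, so keeping the try/except around Image.open is probably still worthwhile.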

Upvotes: 5
