Kevin

Reputation: 3239

Python - load lots of images without using all available RAM

I have about 1.5 GB of images that I need to process. The problem is that when I try loading them as NumPy arrays, I seem to use up all of my RAM (8 GB).

Here is my method for loading images:

import numpy as np
from PIL import Image

def load_image(infilename):
    img = Image.open(infilename)
    img.load()                             # force the pixel data to be read
    data = np.asarray(img, dtype="int32")
    img.close()
    del img
    return data

I thought closing and deleting the img would help, but it doesn't. Can this have something to do with garbage collection?

Code to loop through all images in a list of file names:

for i in range(len(files)):
    imgArray = imgs.load_image(files[i])
    images.append(imgArray)
    shapes.append(np.shape(imgArray))

Is there a better way?

Upvotes: 1

Views: 1687

Answers (2)

yelsayed

Reputation: 5532

It might be worth loading the image files one by one with PIL just to get their size tuples and collect your statistics (averages and so on), and then opening them again in NumPy or PIL to do the actual processing. You might also consider sampling the files for the statistics step so you don't need to load all of them, though it shouldn't take that long either way; PIL is relatively efficient.
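A rough sketch of that two-pass idea, assuming the same `files` list from the question (the per-image processing step is just a placeholder):

from PIL import Image
import numpy as np

# First pass: read only the image headers to collect sizes/statistics.
# Image.open is lazy, so this does not pull the full pixel data into memory.
shapes = []
for fname in files:
    with Image.open(fname) as img:
        shapes.append(img.size)            # (width, height) tuple

# Second pass: load and process one image at a time, so only one
# array is ever held in memory.
for fname in files:
    with Image.open(fname) as img:
        data = np.asarray(img, dtype="int32")
    # ... process `data` here; it can be freed once it goes out of scope ...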

Upvotes: 2

mjp

Reputation: 1679

You may be able to use manual garbage collection to clear some of the memory between loops:

def memclear():
    import gc                  # garbage collector
    cleared = gc.collect()     # returns the number of objects collected
    print(cleared)

Call memclear() at the end of each loop, like so:

for i in range(len(files)):
    imgArray = imgs.load_image(files[i])
    images.append(imgArray)
    shapes.append(np.shape(imgArray))
    memclear()

Hopefully this fixes it. I'm assuming this was downvoted because it manually triggers garbage collection, which is generally frowned upon, but unfortunately it sometimes seems to be necessary.

Upvotes: 2
