Nick S

Reputation: 373

Optimizing memory usage with large images in Python (Pillow)

I'm using Pillow to work with fairly large images (at least 10500 x 10500 px), which uses a lot of memory. I wondered whether I could lower it by loading a compressed image (~400 KB on disk instead of 420 MB) rather than creating a new one directly, but the memory usage is the same:

Line #    Mem usage    Increment   Line Contents
================================================
   151   35.969 MiB    0.742 MiB       base = Image.open("C:/Users/Nick/Desktop/transparent.png")
   152  456.992 MiB  421.023 MiB       base.load()
   155  877.641 MiB  420.648 MiB       base_hallway = Image.new("RGBA", (map_width_px, map_height_px))

I also tried using a JPG, or Image.new() with RGB only, for the second image, but dropping the alpha channel didn't help either.

Line #    Mem usage    Increment   Line Contents
================================================
   151   36.309 MiB    0.766 MiB       base = Image.open("C:/Users/Nick/Desktop/transparent.png")
   152  457.359 MiB  421.051 MiB       base.load()
   156  457.367 MiB    0.008 MiB       base_hallway = Image.open("C:/Users/Nick/Desktop/blackjpg.jpg")
   157  878.312 MiB  420.945 MiB       base_hallway.load()

The main operation on the base images is pasting other images on top of them at different positions. The rooms and hallways also have operations run on them (picking the proper position to paste depending on the previous room or hallway, rotating if necessary, etc.), but those use almost no memory in comparison. Since dozens or even hundreds of items are pasted on top, I can't close the base images after every iteration so that only base OR base_hallway is open at any one time. I tried opening the base and base_hallway images only when needed, but that requires a lot of save and close operations and made the code take about ten times longer to run. Simplified:

room = Image.open(open_room)

if next_tile == "room":
    base.paste(room, box=(rand_width_position, rand_height_position), mask=room)
elif next_tile == "hallway" or next_tile == "junction":
    base_hallway.paste(room, box=(rand_width_position, rand_height_position), mask=room)

Is there any way to optimize the memory usage?

Thanks!

Upvotes: 4

Views: 4288

Answers (1)

jcupitt

Reputation: 11190

I had a go with pyvips. I don't know if that's a possibility for you.

pyvips is a streaming image processing library, so rather than keeping everything in memory, it builds a network of operations and then streams pixels from your source images through the network and straight back to disc.

This program will load an image, paste a lot more images on top at random positions, then write the result back.

import sys
import random
import pyvips

# the access hint means we want to stream this image
base = pyvips.Image.new_from_file(sys.argv[2], access='sequential')

for filename in sys.argv[3:]:
    tile = pyvips.Image.new_from_file(filename, access='sequential')
    x = random.randint(0, base.width - tile.width)
    y = random.randint(0, base.height - tile.height)
    base = base.insert(tile, x, y)

# all the processing happens on the final save as the pipeline executes
base.write_to_file(sys.argv[1])

For test data, I made 100 images of 1,500 x 2,000 pixels plus a 10,000 x 10,000 pixel background image. I can run it like this:

$ /usr/bin/time -f %M:%e python3 ../insert.py x.jpg ../background.jpg *.jpg
775200:0.75

So that's 0.75 s of elapsed time and a peak of roughly 780 MB of memory for the whole process.
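To make the numbers easier to read, here is a small sketch that splits a line of that output into memory and time. It assumes the first field is GNU time's `%M` (maximum resident set size, reported in kilobytes of 1024 bytes) and the second is `%e` (elapsed seconds). The helper name `parse_time_output` is made up for illustration.

```python
def parse_time_output(line):
    """Split a '<max RSS in KiB>:<elapsed seconds>' line into (MiB, seconds)."""
    rss_kib, elapsed = line.strip().split(":")
    return int(rss_kib) / 1024, float(elapsed)


# The two measurements quoted in this answer.
for sample in ("775200:0.75", "199020:1.38"):
    mib, seconds = parse_time_output(sample)
    print(f"{mib:.1f} MiB peak, {seconds:.2f} s elapsed")
```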

This is a big desktop machine with 32 threads. If I tell vips to run with fewer threads, memory use drops quite a bit:

$ VIPS_CONCURRENCY=1 /usr/bin/time -f %M:%e python3 ../insert.py x.jpg ~/pics/huge.jpg *.jpg
199020:1.38

Under 200 MB now, though it's slower.
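For reference, the ~420 MiB steps in the question's profile are what you would expect from uncompressed pixel data: a decoder has to expand the file into a full pixel buffer, so the size on disk does not matter. The sketch below does the arithmetic, assuming 4 bytes per pixel. As far as I know, Pillow also stores RGB images in 4 bytes per pixel internally, which would explain why dropping the alpha channel made no difference. The function name `decoded_size_mib` is made up for illustration.

```python
def decoded_size_mib(width, height, bytes_per_pixel=4):
    """Size in MiB of an uncompressed width x height pixel buffer."""
    return width * height * bytes_per_pixel / 2**20


# The 10500 x 10500 RGBA image from the question.
print(round(decoded_size_mib(10500, 10500), 1))  # 420.6
```

Two such images held at once come to about 841 MiB, which is close to the 877 MiB total shown in the profile once the interpreter's own baseline is added.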

Upvotes: 5
