I (will) have a list of coordinates; using Python's Pillow module, I want to save a series of cropped, smaller images to disk. Currently, I am using a for loop to determine one coordinate at a time, then crop/save the image before proceeding to the next coordinate.
Is there a way to divide this job up so that multiple images can be cropped/saved simultaneously? I understand that this would take up more RAM, but it would decrease processing time.
I'm sure this is possible, but I'm not sure it is simple. I've heard terms like 'vectorization' and 'multi-threading' that sound vaguely appropriate to this situation, but these topics extend beyond my experience.
I've attached the code for reference. However, I'm simply trying to solicit recommended strategies (i.e. what techniques should I learn about to better tailor my approach, take multiple crops at once, etc.?).
from PIL import Image

def parse_image(source, square_size, count, captures, offset=0, offset_type=0, print_coords=False):
    """
    Starts at the top-left corner of the image. Iterates through the image by square_size
    (width = height) across x values and, after exhausting the row, begins the next row
    lower by a function of square_size. The offset parameter is available so that, with
    multiple function calls, overlapping images can be generated.
    """
    src = Image.open(source)
    dimensions = src.size
    max_down = int(src.height / square_size) * square_size + square_size
    max_right = int(src.width / square_size) * square_size + square_size
    if offset_type == 1:
        tl_x = 0 + offset
        tl_y = 0
        br_x = square_size + offset
        br_y = square_size
        for y in range(square_size, max_down, square_size):
            for x in range(square_size + offset, max_right - offset, square_size):
                if (tl_x, tl_y) not in captures:
                    sample = src.crop((tl_x, tl_y, br_x, br_y))
                    sample.save(f"{source[:-4]}_sample_{count}_x{tl_x}_y{tl_y}.jpg")
                    captures.append((tl_x, tl_y))
                    if print_coords:
                        print(f"image {count}: top-left (x,y): {(tl_x,tl_y)}, bottom-right (x,y): {(br_x,br_y)}")
                    tl_x = x
                    br_x = x + square_size
                    count += 1
                else:
                    continue
            tl_x = 0 + offset
            br_x = square_size + offset
            tl_y = y
            br_y = y + square_size
    else:
        tl_x = 0
        tl_y = 0 + offset
        br_x = square_size
        br_y = square_size + offset
        for y in range(square_size + offset, max_down - offset, square_size):
            for x in range(square_size, max_right, square_size):
                if (tl_x, tl_y) not in captures:
                    sample = src.crop((tl_x, tl_y, br_x, br_y))
                    sample.save(f"{source[:-4]}_sample_{count}_x{tl_x}_y{tl_y}.jpg")
                    captures.append((tl_x, tl_y))
                    if print_coords:
                        print(f"image {count}: top-left (x,y): {(tl_x,tl_y)}, bottom-right (x,y): {(br_x,br_y)}")
                    tl_x = x
                    br_x = x + square_size
                    count += 1
                else:
                    continue
            tl_x = 0
            br_x = square_size
            tl_y = y + offset
            br_y = y + square_size + offset
    return count
Upvotes: 0
Views: 64
Reputation: 321
What you want to achieve here is a higher degree of parallelism. The first thing to do is to understand what the minimum task is, and from that, think of a way to distribute it better.
The first thing to notice here is that there are two behaviours: one if you have offset_type 0, and another if you have offset_type 1. Split those off into two different functions.
The second thing is: given an image, you're taking crops of a given size, at a given offset (x, y), across the whole image. You could, for instance, simplify this function to take one crop of the image at a given offset (x, y). Then you could call this function for all the x and y of the image in parallel. That's pretty much what most image processing frameworks try to achieve, especially the ones that run code on the GPU: small blocks of code that operate locally on the image.
So let's say your image has width=100 and height=100, and you're trying to make crops of w=10, h=10. Given the simplistic function that I described, I will call it crop(img, x, y, crop_size_x, crop_size_y).
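A minimal sketch of such a single-crop function, assuming Pillow; the name and signature are just the ones suggested above, not an existing API:

```python
from PIL import Image

def crop(img, x, y, crop_size_x, crop_size_y):
    # Return one tile whose top-left corner is at (x, y).
    # Pillow's Image.crop takes a (left, upper, right, lower) box.
    return img.crop((x, y, x + crop_size_x, y + crop_size_y))
```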
All you have to do is open the image and call that function for every tile position (note: a nested comprehension is needed here; zipping the two ranges together would only walk the diagonal):
img = Image.open(source)
crop_size_x = 10
crop_size_y = 10
crops = [crop(img, x, y, crop_size_x, crop_size_y)
         for y in range(0, img.height, crop_size_y)
         for x in range(0, img.width, crop_size_x)]
Later on, you can replace the list comprehension with the multiprocessing library, which can spawn many processes and give you real parallelism, or even write such code as a GPU kernel/shader and use the GPU's parallelism to achieve high performance.
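As a sketch of that multiprocessing replacement, one option is `multiprocessing.Pool`. Since PIL Image objects are expensive to ship between processes, each worker reopens the file and saves its own tile; the helper names (`save_tile`, `save_all_tiles`) and the output filename pattern are illustrative, not part of the original code:

```python
from multiprocessing import Pool
from PIL import Image

def save_tile(args):
    # One independent unit of work: open the source, crop one tile, save it.
    source, x, y, size = args
    img = Image.open(source)  # each worker process opens its own handle
    tile = img.crop((x, y, x + size, y + size))
    tile.save(f"{source[:-4]}_x{x}_y{y}.jpg")

def save_all_tiles(source, size, workers=4):
    with Image.open(source) as img:
        width, height = img.size
    # One task per tile; the pool distributes tasks across processes.
    tasks = [(source, x, y, size)
             for y in range(0, height - size + 1, size)
             for x in range(0, width - size + 1, size)]
    with Pool(workers) as pool:
        pool.map(save_tile, tasks)
```

Because the tiles are saved straight to disk, the speed-up here is bounded by disk and JPEG-encoding throughput rather than by CPU count alone.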
Upvotes: 1