Vectorizing the reshaping and cropping of images using PIL

Question

I have a number of images that I want to crop and then resize. To help with this, I have written two helper functions:

import numpy as np
from PIL import Image

def crop_images(images_data):
    cropped_images = []
    for image_data in images_data:
        image = Image.fromarray(image_data)
        # PIL's crop box is (left, upper, right, lower)
        cropped_image = np.asarray(image.crop((25, 40, 275, 120)))
        cropped_images.append(cropped_image)
    return np.array(cropped_images)

def resize_images(images_data):
    resized_images = []
    # images_data has shape (N, height, width, ...)
    width, height = images_data.shape[2], images_data.shape[1]
    resized_width, resized_height = width // 2, height // 2
    for image_data in images_data:
        image = Image.fromarray(image_data)
        # Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter
        image = image.resize((resized_width, resized_height), Image.LANCZOS)
        resized_images.append(np.asarray(image))
    return np.array(resized_images)

Then I would just chain the two functions together to process my images like: resize_images(crop_images(images_data))

But I was wondering whether there is a way to vectorize these operations, since I understand that NumPy code should ideally use vectorized operations because they are faster.

hpaulj · Accepted Answer

This is a higher level of iteration - over image arrays - where the usual talk about 'vectorizing' is not as applicable.

Image arrays tend to have shapes like (400,400,3) or bigger. You don't want to iterate over one of those 400-long dimensions if you don't have to. So 'vectorizing' operations on image arrays makes a lot of sense.

But if you are processing 100 of these images, a loop over the images isn't so bad. The only way to 'vectorize' is to assemble them into a larger array (N, 400, 400, 3) and find expressions that work on the 4d array, or on slices of it. It's tempting to go that route if N is 1000 or more, but for an array that big, memory management issues start chewing into any speed gains.
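As a rough sketch of what "expressions that work on 4d" could look like for the crop and half-size steps in the question: the crop is a single slice of the stacked array, and halving can be done by averaging 2x2 pixel blocks. The function names `crop_batch` and `halve_batch` are made up for illustration, and the block average is a plain box filter, not the ANTIALIAS/Lanczos filter that PIL's resize uses, so the pixel values will not match the PIL version exactly.

```python
import numpy as np

def crop_batch(images, box=(25, 40, 275, 120)):
    """Crop every image in an (N, H, W, ...) array with one slice.

    box follows PIL's (left, upper, right, lower) convention.
    """
    left, upper, right, lower = box
    # Rows are axis 1 and columns are axis 2; this returns a view, not a copy.
    return images[:, upper:lower, left:right]

def halve_batch(images):
    """Halve height and width by averaging each 2x2 pixel block.

    Assumption: a simple box filter stands in for PIL's Lanczos resize.
    """
    n, h, w = images.shape[:3]
    h2, w2 = h // 2, w // 2
    # Drop a trailing odd row/column, then split each spatial axis into (size/2, 2).
    blocks = images[:, :h2 * 2, :w2 * 2].reshape(n, h2, 2, w2, 2, *images.shape[3:])
    # Average over the two length-2 axes and cast back to the input dtype.
    return blocks.mean(axis=(2, 4)).astype(images.dtype)

# Hypothetical batch of 10 RGB images, 160 rows by 320 columns.
batch = np.random.randint(0, 256, size=(10, 160, 320, 3), dtype=np.uint8)
small = halve_batch(crop_batch(batch))
print(small.shape)  # (10, 40, 125, 3)
```

Note that the intermediate float array created by `mean` is eight times the size of a uint8 batch, which is exactly the kind of memory cost that can eat the speed gain for large N.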

For iteration, I think appending to list and inserting into a preallocated array are both useful. I haven't seen clear evidence that one is faster than the other in all cases.

alist = []
for arr in source:
    # ... process arr here ...
    alist.append(arr)
bigarr = np.array(alist)

versus

bigarr = np.zeros((N, ...))  # remaining dimensions elided
for i in range(N):
    arr = source[i, ...]
    # ... process arr here ...
    bigarr[i, ...] = arr
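A runnable version of the two patterns, for anyone who wants to time them on their own data. The `process` function is a hypothetical stand-in for whatever per-image step you need (here it just applies the crop from the question as a slice), and the input shape is an assumption chosen so the crop fits.

```python
import numpy as np

def process(arr):
    # Hypothetical per-image step; swap in the real crop/resize work.
    return arr[40:120, 25:275]

def with_list(source):
    # Collect results in a list, then stack them once at the end.
    return np.array([process(arr) for arr in source])

def with_prealloc(source):
    # Process the first image to learn the output shape and dtype,
    # then fill a preallocated array one slot at a time.
    first = process(source[0])
    out = np.empty((len(source),) + first.shape, dtype=first.dtype)
    out[0] = first
    for i in range(1, len(source)):
        out[i] = process(source[i])
    return out

# Assumed input: 100 RGB images, 160 rows by 320 columns.
source = np.zeros((100, 160, 320, 3), dtype=np.uint8)
assert np.array_equal(with_list(source), with_prealloc(source))
```

Both return the same array; which one is faster depends on the image size and the cost of the per-image step, so it is worth measuring with timeit rather than assuming.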

Code clarity can also suffer when trying to 'vectorize' batch operations.
