Multiprocessing on chunks of an image

Question

I have a function that has to loop through individual pixels of an image and calculate some geometry. This function takes a very long time to run (~5 hours on a 24 Megapixel image) but seems like it should be easy to run in parallel on multiple cores. However, I can't for the life of me find a well documented, well explained example of doing something like this using the Multiprocessing package. Here is the code I am running right now as a toy example:

import numpy as np
import matplotlib.pyplot as plt
from scipy import misc
from skimage import color
import multiprocessing 
from multiprocessing import Process

#Some dumb stand in function for this exercise
def dumb_func(image):
    ny, nx = image.shape
    temp = np.empty_like(image)

    for y in range(ny):
        for x in range(nx):
            temp[y, x] = np.square(image[y, x])

    return temp

#Convert image to greyscale
img = color.rgb2gray(misc.ascent())

#Resize the image
ns = 2048 #Pixel size
img = misc.imresize(img, size = (ns, ns))


#Split the image into equal chunks...not sure how this works for arrays that
#are weird shapes and aren't the same size in each dimension

divs = 4
init_split = np.array_split(img, divs, axis = 0)
side = init_split[0].shape[0]
chunked = np.empty((divs, divs, side, side))
cur = 0
for i in range(divs):
    split = np.array_split(init_split[i], divs, axis = 1)
    for j in range(divs):
        chunked[i, j, :, :] = split[j]
        cur +=1

#Pull core count and divide by two to be safe
cores = int(multiprocessing.cpu_count() / 2)

result = np.empty_like(chunked)
idxs = np.array(np.meshgrid(np.arange(0, divs, 1), 
                            np.arange(0, divs, 1))).T.reshape(-1, 2)

Basically this code loads in an image, converts it to greyscale, makes it bigger, and then chunks it up. The chunked array is of shape (i, j, ny, nx) where i and j are indices that identify the chunk of the image I am working with, and ny,nx describe the size in pixels of each chunk.

Additionally, I am creating an array called idxs that stores all possible indices into the chunked array to pull the chunked images out.

What I want to do is run a function (in this case the dumb_func as an example) over the chunks in parallel and store the results in the results array of the same shape. The way I imagined doing it was to loop over the idxs array and assign processes the chunks belonging to those indexes up to the number of cores, wait for those cores to finish, then feed the cores more processes until finished. I got stuck because I couldn't A) figure out how to access the return value in the function, and B) how to handle a situation where I might have 16 chunks and 5 cores leading to the last iteration only requiring a single process.

How can I go about doing this? I've spent the last 6-7 hours reading about Multiprocessing Pool, Process, Map, Starmap, etc... and can't for the life of me understand how to implement this.

Edit for Reedinationer:

This is my updated code and runs without error. However the new_data array is never updated. I filled it with a value of 100 and at the end of the routine new_data is exactly how it was initialized.

import numpy as np
import matplotlib.pyplot as plt
from scipy import misc
from multiprocessing import Process, JoinableQueue
from time import time

#SOme dumb stand in function for this exercise
def dumb_func(q, new_data):
    while True:
        index, image = q.get()
        temp = image **2

        new_data[index[0], index[1], :, :] = temp
        q.task_done()

if __name__ == "__main__":
    start = time()
    q = JoinableQueue()
    img = misc.ascent()
    #Resize the image
    ns = 2048 #Pixel size
    img = misc.imresize(img, size = (ns, ns))
    #Split the image into equal chunks...not sure how this works for arrays that
    #are weird shapes and aren't the same size in each dimension

    divs = 4
    init_split = np.array_split(img, divs, axis = 0)
    side = init_split[0].shape[0]
    chunked = np.empty((divs, divs, side, side))
    cur = 0
    for i in range(divs):
        split = np.array_split(init_split[i], divs, axis = 1)
        for j in range(divs):
            chunked[i, j, :, :] = split[j]
            cur +=1

    new_data = np.full(chunked.shape, 100)
    idxs = np.array(np.meshgrid(np.arange(0, divs, 1), 
                                np.arange(0, divs, 1))).T.reshape(-1, 2)

    for i in range(len(idxs)):
        q.put((idxs[i], chunked[idxs[i][0], idxs[i][1], :, :]))

    print ('starting workers')

    worker_count = len(idxs)
    processes = []
    for i in range(worker_count):
        p = Process(target=dumb_func, args=[q, new_data])
        p.daemon = True
        p.start()
    print('main thread waiting')
    q.join()

    end = time()
    print('{:.3f} seconds elapsed'.format(end - start))

Reedinationer · Accepted Answer

I've been working on code for basically this same thing. Right now the goal is just to replace white pixels with transparent ones, but it seems to replace the entire image so there is a bug somewhere...It doesn't get an error within the multiprocessing module anymore though, so maybe it could serve as an example of how to load a Queue and then have your worker processes work on it!

from PIL import Image
from multiprocessing import Process, JoinableQueue
from threading import Thread
from time import time

def worker_function(q, new_data):
    while True:
        # print("Items in queue: {}".format(q.qsize()))
        index, pixel = q.get()
        if pixel[0] > 240 and pixel[1] > 240 and pixel[2] > 240:
            out_pixel = (0, 0, 0, 0)
        else:
            out_pixel = pixel
        new_data[index] = out_pixel
        q.task_done()

if __name__ == "__main__":
    start = time()
    q = JoinableQueue()

    my_image = Image.open('InputImage.jpg')
    my_image = my_image.convert('RGBA')
    datas = list(my_image.getdata())
    new_data = [0] * len(datas) # make a blank array the size of our image to fill later

    print('putting image into queue')
    for count, item in enumerate(datas):
        q.put((count, item))

    print('starting workers')
    worker_count = 50
    processes = []
    for i in range(worker_count):
        p = Process(target=worker_function, args=[q, new_data])
        p.daemon = True
        p.start()
    print('main thread waiting')
    q.join()
    my_image.putdata(new_data)
    my_image.save('output.png', "PNG")

    end = time()
    print('{:.3f} seconds elapsed'.format(end - start))

I think it's important to "protect" your code inside the if __name__ == "__main__" block otherwise the spawned processes seem to run it.

update

It looks like you need to implement a Manager() (or there are probably other ways I am ignorant of as well!). I got my code to run by altering it into:

from PIL import Image
from multiprocessing import Process, JoinableQueue, Manager
from threading import Thread
from time import time


def worker_function(q, new_data):
    while True:
        # print("Items in queue: {}".format(q.qsize()))
        index, pixel = q.get()
        if pixel[0] > 240 and pixel[1] > 240 and pixel[2] > 240:
            out_pixel = (0, 0, 0, 0)
        else:
            out_pixel = pixel
        new_data[index] = out_pixel
        q.task_done()


if __name__ == "__main__":
    start = time()
    q = JoinableQueue()
    my_image = Image.open('InputImage.jpg')
    my_image = my_image.convert('RGBA')
    datas = list(my_image.getdata())
    # new_data = [(0, 0, 0, 0)]*len(datas)
    manager = Manager()
    new_data = manager.list([(0, 0, 0, 0)]*len(datas))
    print(new_data)
    print('putting image into queue')
    for count, item in enumerate(datas):
        q.put((count, item))

    print('starting workers')
    worker_count = 50
    processes = []
    for i in range(worker_count):
        p = Process(target=worker_function, args=[q, new_data])
        p.daemon = True
        p.start()
    print('main thread waiting')
    q.join()
    print("Saving Image")
    my_image.putdata(new_data)
    my_image.save('output.png', "PNG")

    end = time()
    print('{:.3f} seconds elapsed'.format(end - start))

Although this doesn't seem like the fastest option! I'm sure there are other ways to increase speed. My code to do the same thing with Threads looks VERY similar:

from PIL import Image
from threading import Thread
from queue import Queue
import time

start = time.time()
q = Queue()

planeIm = Image.open('InputImage.jpg')
planeIm = planeIm.convert('RGBA')
datas = planeIm.getdata()
new_data = [0] * len(datas)

print('putting image into queue')
for count, item in enumerate(datas):
    q.put((count, item))

def worker_function():
    while True:
        # print("Items in queue: {}".format(q.qsize()))
        index, pixel = q.get()
        if pixel[0] > 240 and pixel[1] > 240 and pixel[2] > 240:
            out_pixel = (0, 0, 0, 0)
        else:
            out_pixel = pixel
        new_data[index] = out_pixel
        q.task_done()

print('starting workers')
worker_count = 100
for i in range(worker_count):
    t = Thread(target=worker_function)
    t.daemon = True
    t.start()
print('main thread waiting')
q.join()
print('Queue has been joined')
planeIm.putdata(new_data)
planeIm.save('output.png', "PNG")

end = time.time()

elapsed = end - start
print('{:3.3} seconds elapsed'.format(elapsed))

Yet, processing my image takes ~23 seconds with threads and ~170 seconds with multiprocessing!! I suspect this would come from the larger overhead needed to start Process objects, and the fact that my algorithm for processing each pixel is simple for now (just the if pixel[0] > 240 and pixel[1] > 240 and pixel[2] > 240: bit), so I'm likely not yielding the speed improvements that a complex pixel processing algorithm would get me. Also to note multiprocessing documentation

a single manager can be shared by processes on different computers over a network. They are, however, slower than using shared memory.

Which leads me to believe that there are alternatives that are faster.

Multiprocessing on chunks of an image

Answers (2)

update

Related Questions