Reputation: 1177
I have a function that has to loop through individual pixels of an image and calculate some geometry. This function takes a very long time to run (~5 hours on a 24 Megapixel image) but seems like it should be easy to run in parallel on multiple cores. However, I can't for the life of me find a well documented, well explained example of doing something like this using the Multiprocessing package. Here is the code I am running right now as a toy example:
import numpy as np
import matplotlib.pyplot as plt
from scipy import misc
from skimage import color
import multiprocessing
from multiprocessing import Process
#Some dumb stand in function for this exercise
def dumb_func(image):
ny, nx = image.shape
temp = np.empty_like(image)
for y in range(ny):
for x in range(nx):
temp[y, x] = np.square(image[y, x])
return temp
#Convert image to greyscale
img = color.rgb2gray(misc.ascent())
#Resize the image
ns = 2048 #Pixel size
img = misc.imresize(img, size = (ns, ns))
#Split the image into equal chunks...not sure how this works for arrays that
#are weird shapes and aren't the same size in each dimension
divs = 4
init_split = np.array_split(img, divs, axis = 0)
side = init_split[0].shape[0]
chunked = np.empty((divs, divs, side, side))
cur = 0
for i in range(divs):
split = np.array_split(init_split[i], divs, axis = 1)
for j in range(divs):
chunked[i, j, :, :] = split[j]
cur +=1
#Pull core count and divide by two to be safe
cores = int(multiprocessing.cpu_count() / 2)
result = np.empty_like(chunked)
idxs = np.array(np.meshgrid(np.arange(0, divs, 1),
np.arange(0, divs, 1))).T.reshape(-1, 2)
Basically this code loads in an image, converts it to greyscale, makes it bigger, and then chunks it up. The chunked array is of shape (i, j, ny, nx) where i and j are indices that identify the chunk of the image I am working with, and ny,nx describe the size in pixels of each chunk.
Additionally, I am creating an array called idxs that stores all possible indices into the chunked array to pull the chunked images out.
What I want to do is run a function (in this case the dumb_func as an example) over the chunks in parallel and store the results in the results array of the same shape. The way I imagined doing it was to loop over the idxs array and assign processes the chunks belonging to those indexes up to the number of cores, wait for those cores to finish, then feed the cores more processes until finished. I got stuck because I couldn't A) figure out how to access the return value in the function, and B) how to handle a situation where I might have 16 chunks and 5 cores leading to the last iteration only requiring a single process.
How can I go about doing this? I've spent the last 6-7 hours reading about Multiprocessing Pool, Process, Map, Starmap, etc... and can't for the life of me understand how to implement this.
Edit for Reedinationer:
This is my updated code and runs without error. However the new_data array is never updated. I filled it with a value of 100 and at the end of the routine new_data is exactly how it was initialized.
import numpy as np
import matplotlib.pyplot as plt
from scipy import misc
from multiprocessing import Process, JoinableQueue
from time import time
#SOme dumb stand in function for this exercise
def dumb_func(q, new_data):
while True:
index, image = q.get()
temp = image **2
new_data[index[0], index[1], :, :] = temp
q.task_done()
if __name__ == "__main__":
start = time()
q = JoinableQueue()
img = misc.ascent()
#Resize the image
ns = 2048 #Pixel size
img = misc.imresize(img, size = (ns, ns))
#Split the image into equal chunks...not sure how this works for arrays that
#are weird shapes and aren't the same size in each dimension
divs = 4
init_split = np.array_split(img, divs, axis = 0)
side = init_split[0].shape[0]
chunked = np.empty((divs, divs, side, side))
cur = 0
for i in range(divs):
split = np.array_split(init_split[i], divs, axis = 1)
for j in range(divs):
chunked[i, j, :, :] = split[j]
cur +=1
new_data = np.full(chunked.shape, 100)
idxs = np.array(np.meshgrid(np.arange(0, divs, 1),
np.arange(0, divs, 1))).T.reshape(-1, 2)
for i in range(len(idxs)):
q.put((idxs[i], chunked[idxs[i][0], idxs[i][1], :, :]))
print ('starting workers')
worker_count = len(idxs)
processes = []
for i in range(worker_count):
p = Process(target=dumb_func, args=[q, new_data])
p.daemon = True
p.start()
print('main thread waiting')
q.join()
end = time()
print('{:.3f} seconds elapsed'.format(end - start))
Upvotes: 2
Views: 3713
Reputation: 5774
I've been working on code for basically this same thing. Right now the goal is just to replace white pixels with transparent ones, but it seems to replace the entire image so there is a bug somewhere...It doesn't get an error within the multiprocessing
module anymore though, so maybe it could serve as an example of how to load a Queue
and then have your worker processes work on it!
from PIL import Image
from multiprocessing import Process, JoinableQueue
from threading import Thread
from time import time
def worker_function(q, new_data):
while True:
# print("Items in queue: {}".format(q.qsize()))
index, pixel = q.get()
if pixel[0] > 240 and pixel[1] > 240 and pixel[2] > 240:
out_pixel = (0, 0, 0, 0)
else:
out_pixel = pixel
new_data[index] = out_pixel
q.task_done()
if __name__ == "__main__":
start = time()
q = JoinableQueue()
my_image = Image.open('InputImage.jpg')
my_image = my_image.convert('RGBA')
datas = list(my_image.getdata())
new_data = [0] * len(datas) # make a blank array the size of our image to fill later
print('putting image into queue')
for count, item in enumerate(datas):
q.put((count, item))
print('starting workers')
worker_count = 50
processes = []
for i in range(worker_count):
p = Process(target=worker_function, args=[q, new_data])
p.daemon = True
p.start()
print('main thread waiting')
q.join()
my_image.putdata(new_data)
my_image.save('output.png', "PNG")
end = time()
print('{:.3f} seconds elapsed'.format(end - start))
I think it's important to "protect" your code inside the if __name__ == "__main__"
block otherwise the spawned processes seem to run it.
It looks like you need to implement a Manager()
(or there are probably other ways I am ignorant of as well!). I got my code to run by altering it into:
from PIL import Image
from multiprocessing import Process, JoinableQueue, Manager
from threading import Thread
from time import time
def worker_function(q, new_data):
while True:
# print("Items in queue: {}".format(q.qsize()))
index, pixel = q.get()
if pixel[0] > 240 and pixel[1] > 240 and pixel[2] > 240:
out_pixel = (0, 0, 0, 0)
else:
out_pixel = pixel
new_data[index] = out_pixel
q.task_done()
if __name__ == "__main__":
start = time()
q = JoinableQueue()
my_image = Image.open('InputImage.jpg')
my_image = my_image.convert('RGBA')
datas = list(my_image.getdata())
# new_data = [(0, 0, 0, 0)]*len(datas)
manager = Manager()
new_data = manager.list([(0, 0, 0, 0)]*len(datas))
print(new_data)
print('putting image into queue')
for count, item in enumerate(datas):
q.put((count, item))
print('starting workers')
worker_count = 50
processes = []
for i in range(worker_count):
p = Process(target=worker_function, args=[q, new_data])
p.daemon = True
p.start()
print('main thread waiting')
q.join()
print("Saving Image")
my_image.putdata(new_data)
my_image.save('output.png', "PNG")
end = time()
print('{:.3f} seconds elapsed'.format(end - start))
Although this doesn't seem like the fastest option! I'm sure there are other ways to increase speed. My code to do the same thing with Thread
s looks VERY similar:
from PIL import Image
from threading import Thread
from queue import Queue
import time
start = time.time()
q = Queue()
planeIm = Image.open('InputImage.jpg')
planeIm = planeIm.convert('RGBA')
datas = planeIm.getdata()
new_data = [0] * len(datas)
print('putting image into queue')
for count, item in enumerate(datas):
q.put((count, item))
def worker_function():
while True:
# print("Items in queue: {}".format(q.qsize()))
index, pixel = q.get()
if pixel[0] > 240 and pixel[1] > 240 and pixel[2] > 240:
out_pixel = (0, 0, 0, 0)
else:
out_pixel = pixel
new_data[index] = out_pixel
q.task_done()
print('starting workers')
worker_count = 100
for i in range(worker_count):
t = Thread(target=worker_function)
t.daemon = True
t.start()
print('main thread waiting')
q.join()
print('Queue has been joined')
planeIm.putdata(new_data)
planeIm.save('output.png', "PNG")
end = time.time()
elapsed = end - start
print('{:3.3} seconds elapsed'.format(elapsed))
Yet, processing my image takes ~23 seconds with threads and ~170 seconds with multiprocessing!! I suspect this would come from the larger overhead needed to start Process
objects, and the fact that my algorithm for processing each pixel is simple for now (just the if pixel[0] > 240 and pixel[1] > 240 and pixel[2] > 240:
bit), so I'm likely not yielding the speed improvements that a complex pixel processing algorithm would get me. Also to note multiprocessing documentation
a single manager can be shared by processes on different computers over a network. They are, however, slower than using shared memory.
Which leads me to believe that there are alternatives that are faster.
Upvotes: 1
Reputation: 16194
I'd do something like this, starting with dependencies:
from multiprocessing import Pool
import numpy as np
from PIL import Image
# and some for testing
from random import random
from time import sleep
first I define a function to divide an image up into "chunks", sort of as you talked about:
def chunkit(ys, xs, blocksize=64):
for y in range(0, ys, blocksize):
yt = (y, min(ys, y + blocksize))
for x in range(0, xs, blocksize):
xt = (x, min(xs, x + blocksize))
yield yt, xt
it's a lazy iterator, so this can go on for a while.
I then define my worker function:
def dumb_func(cc):
(y0,y1), (x0,x1) = cc
# convert to floats for ease of processing
chunk = image[y0:y1,x0:x1] / 255.
# random slow down for testing
# sleep(random() ** 6)
res = chunk ** 2
# convert back to bytes for efficiency
return cc, (res * 255).astype(np.uint8)
I make sure the source array stays as close to original format as possible for efficiency and send it back in the same format (this might take some fiddling, if you're dealing with other pixel formats obviously).
then I put it together:
if __name__ == '__main__':
source = Image.open('tmp.jpeg')
image = np.asarray(source)
print("loaded", image.shape, image.dtype)
with Pool() as pool:
resit = pool.imap_unordered(
dumb_func, chunkit(*image.shape[:2]))
output = np.empty_like(image)
for cc, res in resit:
(y0,y1), (x0,x1) = cc
output[y0:y1,x0:x1] = res
im = Image.fromarray(output, 'RGB')
im.save('out.jpeg')
this churns through a 15Mpixel image in a couple of seconds, with most of that spent loading/saving the image. it could probably be a lot more clever with array strides and cache friendliness, but hope that helps!
note: I think this code relies on CPython Unix style process forking semantics to make sure the image is shared between processes efficiently. not sure what would happen if you ran it on something else
Upvotes: 3