Reputation: 101
I am working in a Jupyter notebook. I'm new to multiprocessing in Python, and I'm trying to parallelize the calculation of a function over a grid of parameters. Here is a snippet of code quite representative of what I'm doing:
import os
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def f(x, y):
    print(os.getpid(), x, y, x + y)
    return x + y

xs = np.linspace(5, 7, 3).astype(int)
ys = np.linspace(1, 3, 3).astype(int)

func = lambda p: f(*p)

with ProcessPoolExecutor() as executor:
    args = (arg for arg in zip(xs, ys))
    results = executor.map(func, args)
    for res in results:
        print(res)
The executor doesn't even start.
There is no problem whatsoever if I run the same thing serially, e.g. with a list comprehension:
args = (arg for arg in zip(xs,ys))
results = [func(arg) for arg in args]
Upvotes: 1
Views: 2089
Reputation: 139
Are you running on Windows? I think your main problem is that each worker process is trying to re-execute your whole script, so you should put the executable part behind an if __name__ == "__main__": check. I think you have a second issue: you're passing a lambda function, which can't be pickled, and the processes communicate by pickling the data. There are work-arounds for that, but in this case it looks like you don't really need the lambda. Try something like this:
import os
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def f(x, y):
    print(os.getpid(), x, y, x + y)
    return x + y

if __name__ == '__main__':
    xs = np.linspace(5, 7, 3).astype(int)
    ys = np.linspace(1, 3, 3).astype(int)
    with ProcessPoolExecutor() as executor:
        results = executor.map(f, xs, ys)
        for res in results:
            print(res)
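If you really do need to pass the arguments as (x, y) tuples (say the pairs come from a single iterable), the usual work-around for the pickling issue is to replace the lambda with a plain module-level function, which can be pickled. Here is a minimal sketch of that variant; the wrapper name apply_f is just illustrative, not something from your original code:

import os
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def f(x, y):
    print(os.getpid(), x, y, x + y)
    return x + y

# Module-level functions, unlike lambdas, can be pickled and sent to workers.
def apply_f(pair):
    return f(*pair)

if __name__ == '__main__':
    xs = np.linspace(5, 7, 3).astype(int)
    ys = np.linspace(1, 3, 3).astype(int)
    args = list(zip(xs, ys))  # e.g. [(5, 1), (6, 2), (7, 3)]
    with ProcessPoolExecutor() as executor:
        for res in executor.map(apply_f, args):
            print(res)

When the argument sequences are already separate, as in your example, executor.map(f, xs, ys) above is the simpler option; the wrapper is only worth it when you genuinely start from tuples.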
Upvotes: 2