Reputation: 3221
I have a basic multiprocessing class which takes some parameters and sends them off to a worker:
import multiprocessing as mp
import time

class Multi(object):
    def __init__(self, pool_parameters, pool_size):
        self.pool_parameters = pool_parameters  # Parameters in a tuple
        self.pool_size = pool_size
        self.pool = mp.Pool(self.pool_size)
        self.results = \
            [self.pool.apply_async(worker, args=(self.pool_parameters[i],))
             for i in range(self.pool_size)]

        time1 = time.time()
        self.output = [r.get() for r in self.results]  # Output objects in here
        print time.time() - time1

def worker(*args):
    # Do stuff
    return stuff
However, the r.get() line seems to take ages. With a pool_size of 1, the worker returns its result in 0.1 seconds, but the r.get() line takes another 1.35 seconds. Why does it take so long, especially when only one process is started?
EDIT: Even with a single process and a worker that returns a single None value, the self.output line still takes 1.3 seconds on my system (timed with time.time() around that line).
EDIT2: Sorry, I found the problem and I don't think it is related to multiprocessing. The problem seems to come from importing various other modules. When I got rid of my imports, the time dropped to 0.1 seconds. No idea why, though...
Upvotes: 3
Views: 1900
Reputation: 94951
You're seeing poor performance because you're sending a large object between the processes. Pickling the object in the child, sending those bytes between processes, and then unpickling them in the parent takes a non-trivial amount of time. This is one of the reasons the multiprocessing programming guidelines suggest avoiding large amounts of shared state:
Avoid shared state
As far as possible one should try to avoid shifting large amounts of data between processes.
You'll probably be able to isolate this behavior if you call pickle.loads(pickle.dumps(obj)) on your object. I would expect it to take almost as long as the get() call.
Upvotes: 4