fetcher

Reputation: 83

Python multiprocessing: why is it much slower?

For a map task from a list src_list to a list dest_list, where len(src_list) is on the order of a few thousand:

def my_func(elem):
    # some complex work, for example a minimizing task
    return new_elem

dest_list[i] = my_func(src_list[i])

I use multiprocessing.Pool:

pool = Pool(4)
# took 543 seconds
dest_list = list(pool.map(my_func, src_list, chunksize=len(src_list)/8))

# took 514 seconds
dest_list = list(pool.map(my_func, src_list, chunksize=4))

# took 167 seconds
dest_list = [my_func(elem) for elem in src_list]

I am confused. Can someone explain why the multiprocessing version actually runs slower than the plain list comprehension?

I also wonder what considerations go into the choice of chunksize, and into choosing between multiple threads and multiple processes, especially for my problem. Also, I currently measure time by summing up all the time spent inside my_func, because directly using

t = time.time()
dest_list = pool.map...
print time.time() - t

doesn't work. However, the documentation says that map() blocks until the result is ready, which seems to contradict what I observe. Is there another way to measure this, rather than simply summing the time? I have tried pool.close() followed by pool.join(), but that does not help.

src_list is of length around 2000. time.time() - t doesn't work because it does not add up all the time spent in my_func during pool.map. And a strange thing happened when I used timeit:

def wrap_func(src_list):
    pool = Pool(4)
    dest_list = list(pool.map(my_func, src_list, chunksize=4))

print timeit("wrap_func(src_list)", setup="import ...")

It ran into

OSError: Cannot allocate memory

so I guess I have used timeit in the wrong way...
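
Note: timeit repeats the statement a very large number of times by default (one million), so every repetition would create another Pool(4), which by itself could exhaust memory. Below is a sketch of timing a single run instead, with the pool cleaned up explicitly; this is only my guess at the intended usage, reusing my_func and src_list from above, and number=1 plus the close()/join() calls are additions that are not in my original code.

from timeit import timeit
from multiprocessing import Pool

def wrap_func(src_list):
    # my_func and src_list are the same as defined earlier in the question
    pool = Pool(4)
    try:
        return list(pool.map(my_func, src_list, chunksize=4))
    finally:
        pool.close()  # stop accepting work
        pool.join()   # wait for the worker processes to exit

print timeit(lambda: wrap_func(src_list), number=1)  # number=1: time one run only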

I use Python 2.7.6 under Ubuntu 14.04.

Thanks!

Upvotes: 1

Views: 6640

Answers (1)

Michael

Reputation: 13914

Multiprocessing incurs overhead to pass data between processes, because processes do not share memory. Any object passed between processes must be pickled (serialized to a string) and unpickled. This includes the objects in your list src_list that are passed to the function, and any objects returned into dest_list. This takes time. To illustrate it, you might try timing the following function in a single process and in parallel.

def NothingButAPickle(elem):
    return elem

If you loop over src_list in a single process, this should be extremely fast, because Python only works with the single copy of each object that already exists in memory. If you instead call this function in parallel with the multiprocessing package, it has to (1) pickle each object in the main process to send it to a subprocess as a string, (2) unpickle each object in the subprocess to turn that string back into an object in memory, (3) pickle the object again to return it to the main process as a string, and then (4) unpickle it once more to rebuild the object in the main process's memory. Without seeing your data or the actual function, I can only say that this overhead typically outweighs the multiprocessing gains when the objects being passed are extremely large and/or the function is not actually that computationally intensive.
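
For example, a quick comparison along those lines might look like the sketch below; the pool size and the 2000-element list are only placeholders chosen to roughly mirror the question, not anything measured.

import time
from multiprocessing import Pool

def NothingButAPickle(elem):
    return elem  # no real work, so any time difference is pickling/IPC overhead

if __name__ == '__main__':
    data = range(2000)  # about the size of src_list in the question

    t = time.time()
    serial = [NothingButAPickle(x) for x in data]
    print 'single process:', time.time() - t

    pool = Pool(4)
    t = time.time()
    parallel = pool.map(NothingButAPickle, data)
    print 'Pool(4) workers:', time.time() - t
    pool.close()
    pool.join()

On a list this small the parallel version will usually lose, because the pickling and process bookkeeping dwarf the (nonexistent) computation.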

Upvotes: 6
