Temperosa

Reputation: 168

multiprocessing.Pool.map does not work in parallel

What I am trying to achieve:

Parallelize a function that spawns a number of threads per call, like this:

 - PROCESS01 -> 16 Threads
 - PROCESS02 -> 16 Threads
 - ...
 - PROCESSn -> 16 Threads

The code:

with multiprocessing.Pool(4) as process_pool:
    results = process_pool.map(do_stuff, [drain_queue()])

Where drain_queue() returns a list of items and

import os
import concurrent.futures

def do_stuff(item_list):
    print('> PID: ' + str(os.getpid()))
    with concurrent.futures.ThreadPoolExecutor(max_workers=16) as executor:
        result_dict = {executor.submit(thread_function, item): item for item in item_list}
        for future in concurrent.futures.as_completed(result_dict):
            pass

And thread_function() processes each item passed to it.

However, when executed the code outputs like this:

> PID: 1000
(WAITS UNTIL THE PROCESS FINISHES, THEN START NEXT)
> PID: 2000
(WAITS UNTIL THE PROCESS FINISHES, THEN START NEXT)
> PID: 3000
(WAITS UNTIL THE PROCESS FINISHES, THEN START NEXT)
> PID: 3000
(WAITS UNTIL THE PROCESS FINISHES, THEN START NEXT)

Here is a screenshot of Task Manager

What am I missing here? I can't figure out why it does not work as expected. Thanks!

Upvotes: 3

Views: 4825

Answers (1)

Temperosa

Reputation: 168

I've found the problem. The second argument of map() is expected to be an iterable, whereas in my case it was a list containing a single object.

What is wrong? This: [drain_queue()], which produces a list with a single object in it.
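A quick way to see the difference (drain_queue here is a hypothetical stand-in that returns three items, not the original implementation):

```python
def drain_queue():
    # hypothetical stand-in: the real drain_queue returns queued items
    return ["a", "b", "c"]

print(len([drain_queue()]))  # 1 -- the whole list becomes a single element
print(len(drain_queue()))    # 3 -- each item is a separate element
```

With length 1, pool.map has exactly one task to hand out, no matter how many worker processes exist.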

In this case, the code

with multiprocessing.Pool(4) as process_pool:
    results = process_pool.map(do_stuff, [drain_queue()])

forces multiprocessing.Pool.map to "distribute" a single object to a single process: even though the pool creates n processes, all the work is still done by one of them. Thankfully it has nothing to do with GIL limitations.
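A minimal sketch of one possible fix: split the drained items into one chunk per process, so pool.map has several tasks to distribute. The chunk helper and the stand-in do_stuff below are illustrative assumptions, not code from the original post.

```python
import multiprocessing

def chunk(seq, n):
    """Split seq into n roughly equal sublists."""
    k, m = divmod(len(seq), n)
    return [seq[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]

def do_stuff(item_list):
    # stand-in for the real do_stuff, which runs a 16-thread pool per chunk
    return sum(item_list)

if __name__ == "__main__":
    items = list(range(100))  # stands in for drain_queue()
    with multiprocessing.Pool(4) as pool:
        # one chunk per worker process; pool.map hands them out in parallel
        results = pool.map(do_stuff, chunk(items, 4))
    print(results)  # [300, 925, 1550, 2175]
```

Each process now receives its own sublist, so all four workers run concurrently instead of one worker receiving the entire list.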

Upvotes: 6
