Reputation: 168
What I am trying to achieve:
Parallelize a function that spawns a number of threads per call, like this:
- PROCESS01 -> 16 Threads
- PROCESS02 -> 16 Threads
- ...
- PROCESSn -> 16 Threads
The code:
with multiprocessing.Pool(4) as process_pool:
    results = process_pool.map(do_stuff, [drain_queue()])
Where drain_queue()
returns a list of items, and do_stuff() is defined as:
def do_stuff(item_list):
    print('> PID: ' + str(os.getpid()))
    with concurrent.futures.ThreadPoolExecutor(max_workers=16) as executor:
        result_dict = {executor.submit(thread_function, item): item for item in item_list}
        for future in concurrent.futures.as_completed(result_dict):
            pass
And thread_function()
processes every item passed to it.
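thread_function() itself is not shown here; for illustration only, a placeholder that does some per-item work could look like this:
def thread_function(item):
    # Illustrative stand-in for the real per-item work, which is not
    # part of the question.
    return str(item).upper()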
However, when executed, the code outputs something like this:
> PID: 1000
(WAITS UNTIL THE PROCESS FINISHES, THEN START NEXT)
> PID: 2000
(WAITS UNTIL THE PROCESS FINISHES, THEN START NEXT)
> PID: 3000
(WAITS UNTIL THE PROCESS FINISHES, THEN START NEXT)
> PID: 3000
(WAITS UNTIL THE PROCESS FINISHES, THEN START NEXT)
Here is a screenshot of Task Manager
What am I missing here? I can't figure out why it doesn't work as expected. Thanks!
Upvotes: 3
Views: 4825
Reputation: 168
I've found the problem. The second argument of map()
is expected to be an iterable, which in my case was a list containing a single object.
What is wrong? This: [drain_queue()]
produces a list with a single element in it.
In this case, the code
with multiprocessing.Pool(4) as process_pool:
    results = process_pool.map(do_stuff, [drain_queue()])
forces multiprocessing.Pool.map
to "distribute" a single object to a single process: even though the pool creates n
processes, the work is still done by only one of them. Thankfully, it has nothing to do with GIL limitations.
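Since do_stuff() expects a list of items (it fans each item out to its own thread pool), one way to fix this is to split the drained items into one chunk per process and pass that list of chunks to map(). A minimal sketch, using a hypothetical chunked() helper and the drain_queue()/do_stuff() functions from the question:
import multiprocessing

def chunked(items, n_chunks):
    # Split items into n_chunks roughly equal sublists
    # (illustrative helper, not part of the original code).
    return [items[i::n_chunks] for i in range(n_chunks)]

if __name__ == '__main__':
    items = drain_queue()
    with multiprocessing.Pool(4) as process_pool:
        # Each worker process now receives one sublist and runs its own
        # 16-thread ThreadPoolExecutor over it.
        results = process_pool.map(do_stuff, chunked(items, 4))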
Upvotes: 6