user3601462

Reputation: 23

Python multiprocessing Pool does not work as expected when running many tasks

Example: I have a CPU with 2 threads, so I have 2 workers with this code.

import multiprocessing

tasks = ['1.txt', '2.txt', '3.txt', '4.txt', '5.txt']

pool = multiprocessing.Pool()
pool.map(myfunc, tasks, chunksize=1)  # myfunc processes a single file
pool.close()
pool.join()

If I run this program, it will process '1.txt' and '2.txt' first, and when one of them is done it will start the next file. That means it processes only 2 files at a time, right?

But I found a problem when I run it with many files (maybe more than 100). The program does not wait until one of the 2 workers is done; instead it assigns jobs to workers 3, 4, 5, 6, 7, 8, 9, and so on.

How can I fix this problem?

Thank you everyone in advance.

P.S. I use Python 3.6.

Upvotes: 1

Views: 2165

Answers (1)

abc

Reputation: 11929

You can specify the number of worker processes in the pool by passing it as an argument to multiprocessing.Pool().

Example:

import multiprocessing
import time

def myfunc(t):
    print("{} starts".format(t))
    time.sleep(1)  # simulate one second of work per file
    print("{} ends".format(t))

tasks = ['1.txt', '2.txt', '3.txt', '4.txt', '5.txt']

pool = multiprocessing.Pool(processes=2)  # limit the pool to 2 worker processes
pool.map(myfunc, tasks, chunksize=1)
pool.close()
pool.join()

which on my machine outputs

1.txt starts
2.txt starts
1.txt ends
3.txt starts
2.txt ends
4.txt starts
3.txt ends
5.txt starts
4.txt ends
5.txt ends

whereas without specifying the number of worker processes I get:

1.txt starts
2.txt starts
3.txt starts
4.txt starts
3.txt ends
1.txt ends
2.txt ends
4.txt ends
5.txt starts
5.txt ends
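
When no processes argument is passed, Pool() defaults to the number of CPUs that os.cpu_count() reports, so the pool can start more workers than you expect on machines with more logical cores. A minimal sketch to check that value on your own machine (the printed numbers depend on your hardware):

import multiprocessing
import os

# Pool() with no argument starts os.cpu_count() worker processes by default.
print(os.cpu_count())                # logical CPU count reported by the OS
print(multiprocessing.cpu_count())   # the same value, exposed by multiprocessing

If the pool appears to use more than 2 workers, check what os.cpu_count() reports; to cap the pool at 2 regardless, pass processes=2 as shown above.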

Upvotes: 1
