Reputation: 265
I'm new to Python and want to learn how to do parallel processing. I saw the following example:
import multiprocessing as mp
import numpy as np

np.random.RandomState(100)
arr = np.random.randint(0, 10, size=[20, 5])
data = arr.tolist()

def howmany_within_range_rowonly(row, minimum=4, maximum=8):
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count = count + 1
    return count

pool = mp.Pool(mp.cpu_count())
results = pool.map(howmany_within_range_rowonly, [row for row in data])
pool.close()
print(results[:10])
But when I run it, I get this error:
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
    if __name__ == '__main__':
        freeze_support()
        ...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
What should I do?
Upvotes: 15
Views: 11007
Reputation: 346
I hit the same RuntimeError in a real project when running a specific tool on macOS machines (on Linux machines it worked fine). I'm not sure of the exact cause, because the if __name__ == "__main__" guard seemed to be properly in place.
Following a comment on this Stack Overflow entry, I suspected the cause was Python >= 3.8, which uses spawn as the default start method for child processes on macOS.
My solution: using Python 3.7 did the trick.
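As an alternative to downgrading, the start method can be selected explicitly. The following is a minimal sketch, not a verified fix for the tool mentioned above: `square` and `pick_start_method` are hypothetical names introduced for illustration, and it assumes the code that creates the pool is under your control.

```python
import multiprocessing as mp


def square(n):
    """Toy worker function (hypothetical, for illustration only)."""
    return n * n


def pick_start_method(preferred="fork"):
    """Return `preferred` if this platform supports it, else the platform default."""
    if preferred in mp.get_all_start_methods():
        return preferred
    return mp.get_start_method()


if __name__ == "__main__":
    method = pick_start_method("fork")
    print("platform default:", mp.get_start_method(), "| using:", method)

    # get_context() returns an object with the same API as the
    # multiprocessing module, but bound to a single start method.
    ctx = mp.get_context(method)
    with ctx.Pool(2) as pool:
        print(pool.map(square, range(5)))
```

Note that the Python documentation describes fork on macOS as potentially unsafe, so this is a workaround to test with rather than a recommendation.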
Upvotes: 1
Reputation: 4471
If you place all the top-level code (everything except the function definition) inside an if __name__ == "__main__" block as follows, you should find that your program behaves as you expect:
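import multiprocessing as mp
import numpy as np

def howmany_within_range_rowonly(row, minimum=4, maximum=8):
    # Count the values in row that fall within [minimum, maximum].
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count = count + 1
    return count

if __name__ == "__main__":
    # Only the main process runs this block; a worker that imports the module skips it.
    np.random.seed(100)  # seeds the global generator; RandomState(100) alone has no effect
    arr = np.random.randint(0, 10, size=[20, 5])
    data = arr.tolist()
    pool = mp.Pool(mp.cpu_count())
    results = pool.map(howmany_within_range_rowonly, [row for row in data])
    pool.close()
    print(results[:10])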
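Without this protection, your multiprocessing code would be executed whenever the module is imported, not only when it is run as a script. This can happen inside a non-main process: with the spawn start method, each worker started by a Pool imports the main module, so the unguarded code would try to create another Pool before the worker has finished bootstrapping, which is what the RuntimeError reports. Hence we protect against this problem.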
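The same idea can also be written with a main() function and a with-block for the pool. This is only a sketch under a few assumptions: `count_in_range`, `make_data`, and `main` are hypothetical names, and the standard-library `random` module stands in for NumPy so the snippet has no third-party dependencies.

```python
import multiprocessing as mp
import random


def count_in_range(row, minimum=4, maximum=8):
    """Count the values in `row` that lie in [minimum, maximum]."""
    return sum(1 for n in row if minimum <= n <= maximum)


def make_data(n_rows=20, n_cols=5, seed=100):
    """Build a reproducible list of rows of integers in [0, 9]."""
    rng = random.Random(seed)
    return [[rng.randint(0, 9) for _ in range(n_cols)] for _ in range(n_rows)]


def main():
    data = make_data()
    # The with-block terminates the pool on exit; map() has already
    # returned every result by then.
    with mp.Pool(mp.cpu_count()) as pool:
        results = pool.map(count_in_range, data)
    print(results[:10])


if __name__ == "__main__":
    # A worker that re-imports this module sees a different __name__,
    # so it defines the functions above but never calls main().
    main()
```

Keeping the worker function at module level matters here: the pool has to be able to import it by name in the worker processes.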
Upvotes: 19