Reputation: 123
In the example code below, I was trying to adapt the accepted answer in this thread. The goal is to use multiprocessing to generate independent random normal numbers (in the example below I just want 3 random numbers). This is a baby version of more complicated code where a random number generator is used in defining the trial function.
Example Code
import multiprocessing
import numpy as np

def trial(procnum, return_dict):
    p = np.random.randn(1)
    num = procnum
    return_dict[procnum] = p, num

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    return_dict = manager.dict()
    jobs = []
    for i in range(3):
        p = multiprocessing.Process(target=trial, args=(i, return_dict))
        jobs.append(p)
        p.start()
    for proc in jobs:
        proc.join()
    print(return_dict.values())
However, the output gives me the same random number in every process, rather than an independent draw for each entry in return_dict.
Output
[(array([-1.08817286]), 0), (array([-1.08817286]), 1), (array([-1.08817286]), 2)]
I feel like this is a really silly mistake. Can someone expose my silliness please :)
Upvotes: 2
Views: 330
Reputation: 70582
Just adding a gloss to @Aziz Sonawalla's answer: why does this work?
Because Python's random module works differently from numpy's generator. On Windows, multiprocessing spawns new processes, and each is a freshly created instance that does its own from-scratch seeding from OS sources of entropy.

On Linux, by default multiprocessing uses fork() to create new processes, and those inherit the entire state of the main process, in copy-on-write mode. That includes the state of the random number generator. So you would get the same random numbers across worker processes from Python too, except that, at least since Python 3.7, Python explicitly (but under the covers, invisibly) re-seeds its random number generator after fork().

I'm not sure when, but for some time before 3.7 the multiprocessing Process implementation also re-seeded Python's generator in child processes it created via fork() (but Python itself did not if you called fork() yourself).

All of which is just to explain why calling Python's random.randrange() (which randint() uses under the hood) returns different results in different worker processes. That's why it's an effective way to generate differing seeds for numpy to use in this context.
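You can watch this happen with a minimal sketch, assuming a fork start method (multiprocessing's default on Linux): every child should print the same numpy draw but a different random draw.

import multiprocessing
import random
import numpy as np

def peek(tag):
    # numpy's global state was inherited from the parent via fork();
    # Python's random module was re-seeded in the child after fork()
    print(tag, "numpy:", np.random.randn(), "random:", random.random())

if __name__ == '__main__':
    workers = [multiprocessing.Process(target=peek, args=("child %d" % i,))
               for i in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()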
Upvotes: 2
Reputation: 2502
It's not a silly mistake, and it has to do with the way numpy's random state is handled across the worker processes: each forked child inherits the same generator state from the parent, so they all produce the same draw. Read more here: https://discuss.pytorch.org/t/why-does-numpy-random-rand-produce-the-same-values-in-different-cores/12005

But the solution is to give numpy a random seed from a large range:
import multiprocessing
import numpy as np
import random

def trial(procnum, return_dict):
    # re-seed numpy in each worker; Python's random module gives
    # each process a different value (see the other answer for why)
    np.random.seed(random.randint(0, 100000))
    p = np.random.randn()
    return_dict[procnum] = p

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    return_dict = manager.dict()
    jobs = []
    for i in range(3):
        p = multiprocessing.Process(target=trial, args=(i, return_dict))
        jobs.append(p)
        p.start()
    for proc in jobs:
        proc.join()
    print(return_dict.values())
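For what it's worth, if you're on NumPy 1.17 or newer you can avoid touching the global generator entirely. Here's a sketch of the same idea using the newer Generator API, where SeedSequence.spawn() hands each worker its own independent stream:

import multiprocessing
import numpy as np

def trial(procnum, seed_seq, return_dict):
    rng = np.random.default_rng(seed_seq)   # independent per-process generator
    return_dict[procnum] = rng.standard_normal()

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    return_dict = manager.dict()
    child_seeds = np.random.SeedSequence().spawn(3)   # 3 non-overlapping streams
    jobs = []
    for i in range(3):
        p = multiprocessing.Process(target=trial, args=(i, child_seeds[i], return_dict))
        jobs.append(p)
        p.start()
    for proc in jobs:
        proc.join()
    print(return_dict.values())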
Upvotes: 3