Reputation: 827
I run a random process distributed across several worker processes, so I use numpy.random.RandomState to seed the random numbers.
The problem is that I have to call another numpy.random function inside my wrapper. Now I am losing the reproducibility of the seed because I can't control the order of the function calls.
A short version of this problem would be:
import numpy as np
import multiprocessing

def function(N):
    return RDS.choice(range(N))

def wrapper(ic):
    return ic, function(ic)

RDS = np.random.RandomState(0)
inputlist = []
for i in range(30):
    inputlist.append((RDS.randint(1, 100),))
pool = multiprocessing.Pool(4)
solutions_list = pool.starmap(wrapper, inputlist)
pool.close()
pool.join()
print(solutions_list)
I cannot run function(ic) outside of wrapper because in my code it also depends on calculation results.
Is there another way to set the seed properly?
Upvotes: 2
Views: 831
Reputation: 280251
Setting the seed differently isn't going to solve your reproducibility problem. (It'd solve another problem we'll get to later, but it won't solve the reproducibility problem.) Your reproducibility issue comes from the nondeterministic assignment of tasks to workers, which is not controlled by any random seed.
To solve the reproducibility issue, you need to assign tasks deterministically. One way to do that would be to abandon the use of the process pool and assign jobs to processes manually.
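A minimal sketch of what manual assignment could look like, assuming a fixed round-robin split is acceptable for your workload. The helper names (split_round_robin, run_chunk, worker_main) and the seed scheme 1000 + worker index are made up for illustration; they are not part of your code or of NumPy. Each process gets a fixed share of the tasks and its own RandomState, so the numbers drawn for a task no longer depend on scheduling.

```python
import multiprocessing

import numpy as np


def split_round_robin(tasks, n_workers):
    """Task i always goes to worker i % n_workers, independent of timing."""
    return [tasks[w::n_workers] for w in range(n_workers)]


def run_chunk(worker_seed, chunk):
    """Process one worker's fixed share of tasks with its own generator."""
    rds = np.random.RandomState(worker_seed)
    # rds.choice(n) draws from range(n), like RDS.choice(range(N)) in the question
    return [(index, n, rds.choice(n)) for index, n in chunk]


def worker_main(worker_id, worker_seed, chunk, out_queue):
    """Entry point of each manually started process."""
    out_queue.put((worker_id, run_chunk(worker_seed, chunk)))


if __name__ == "__main__":
    n_workers = 4
    master = np.random.RandomState(0)
    tasks = [(i, master.randint(1, 100)) for i in range(30)]
    chunks = split_round_robin(tasks, n_workers)

    out_queue = multiprocessing.Queue()
    procs = [
        multiprocessing.Process(
            target=worker_main,
            # hypothetical seed scheme: one fixed, distinct seed per worker slot
            args=(w, 1000 + w, chunks[w], out_queue),
        )
        for w in range(n_workers)
    ]
    for p in procs:
        p.start()
    # Drain the queue before joining; results arrive in arbitrary order,
    # so they are keyed by worker id and sorted by task index afterwards.
    results = dict(out_queue.get() for _ in procs)
    for p in procs:
        p.join()

    solutions = sorted(r for w in range(n_workers) for r in results[w])
    print(solutions)
```

The trade-off is that you lose the pool's load balancing: a worker that finishes early sits idle instead of picking up more tasks.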
The other problem is that your workers are all inheriting the same random seed. (They don't share the same RDS object - this isn't threading - but their copies of RDS are initialized identically.) This can lead to them producing identical or extremely correlated output, ruining your results. To fix this, each worker should reseed RDS to a distinct seed on startup.
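One way to do the reseeding, sketched under the assumption that you keep the pool: pass an initializer that runs once in every worker and pulls a seed from a queue. reseed_worker and seed_queue are names invented for this example, and the seeds 1 to 4 are arbitrary. Note that this only removes the correlation between workers; which task lands on which worker still varies, so it does not make the output reproducible by itself.

```python
import multiprocessing

import numpy as np

RDS = np.random.RandomState(0)


def reseed_worker(seed_queue):
    """Pool initializer: runs once in each worker process at startup."""
    RDS.seed(seed_queue.get())


def function(n):
    return RDS.choice(n)


def wrapper(ic):
    return ic, function(ic)


if __name__ == "__main__":
    n_workers = 4
    inputlist = [(RDS.randint(1, 100),) for _ in range(30)]

    # One distinct seed per worker; each worker takes exactly one from the queue.
    seed_queue = multiprocessing.Queue()
    for seed in range(1, n_workers + 1):
        seed_queue.put(seed)

    with multiprocessing.Pool(
        n_workers, initializer=reseed_worker, initargs=(seed_queue,)
    ) as pool:
        solutions_list = pool.starmap(wrapper, inputlist)
    print(solutions_list)
```

The queue holds exactly one seed per worker, so this assumes the pool never replaces a worker (for example, no maxtasksperchild).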
Upvotes: 1