numpy.random Seed in multiprocessing

Question

I run a random process distributed across several worker processes, so I use numpy.random.RandomState to seed the random numbers. The problem is that I have to call another numpy.random function inside my wrapper. I now lose the reproducibility the seed should give me, because I can't control the order of the function calls.

A short version of this problem would be:

import numpy as np
import multiprocessing

def function(N):
    return RDS.choice(range(N))

def wrapper(ic):
    return ic, function(ic)

RDS = np.random.RandomState(0)

inputlist = []
for i in range(30):
    inputlist.append((RDS.randint(1, 100),))

pool = multiprocessing.Pool(4)

solutions_list = pool.starmap(wrapper, inputlist)

pool.close()
pool.join()

print(solutions_list)

I cannot run function(ic) outside of wrapper because in my real code it also depends on intermediate calculation results.

Is there another way to set the seed properly?

user2357112 · Accepted Answer

Setting the seed differently isn't going to solve your reproducibility problem. (It'd solve another problem we'll get to later, but it won't solve the reproducibility problem.) Your reproducibility issue comes from the nondeterministic assignment of tasks to workers, which is not controlled by any random seed.

To solve the reproducibility issue, you need to assign tasks deterministically. One way to do that would be to abandon the use of the process pool and assign jobs to processes manually.
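
As a rough sketch of what manual assignment could look like (not the only way to do it): split the inputs into fixed chunks, start one process per chunk, and give each process its own generator. The names run_chunk, run_all and base_seed, the round-robin split, and the choice of base_seed + worker_id as the seed are all assumptions made for illustration. The inputs here are plain integers rather than the 1-tuples that starmap needs.

```python
import multiprocessing

import numpy as np


def run_chunk(worker_id, tasks, base_seed=0):
    # Hypothetical helper: every worker builds its own generator from a seed
    # that depends only on its id, never on scheduling.
    rds = np.random.RandomState(base_seed + worker_id)
    return [(n, rds.choice(range(n))) for n in tasks]


def _worker(worker_id, tasks, queue):
    # Send the results back tagged with the worker id.
    queue.put((worker_id, run_chunk(worker_id, tasks)))


def run_all(inputs, n_workers=4):
    # Fixed round-robin split: task i always goes to worker i % n_workers.
    chunks = [inputs[w::n_workers] for w in range(n_workers)]
    queue = multiprocessing.Queue()
    procs = [
        multiprocessing.Process(target=_worker, args=(w, chunks[w], queue))
        for w in range(n_workers)
    ]
    for p in procs:
        p.start()
    # Drain the queue before joining; results arrive in arbitrary order,
    # so they are keyed by worker id.
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return [results[w] for w in range(n_workers)]
```

Calling run_all(inputs) from under an `if __name__ == "__main__":` guard should then give the same per-worker results on every run, because both the task split and the seeds are fixed in advance.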

The other problem is that your workers are all inheriting the same random seed. (They don't share the same RDS object - this isn't threading - but their copies of RDS are initialized identically.) This can lead to them producing identical or extremely correlated output, ruining your results. To fix this, each worker should reseed RDS to a distinct seed on startup.
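
One possible way to reseed on startup is a Pool initializer that pulls a distinct seed from a queue. This is only a sketch: init_worker, make_pool and base_seed are made-up names, and the seeds base_seed + w are an arbitrary choice. Which worker ends up with which seed is still not deterministic, so this addresses the correlated-output problem only, not reproducibility.

```python
import multiprocessing

import numpy as np

RDS = np.random.RandomState(0)  # module-level generator, as in the question


def init_worker(seed_queue):
    # Runs once in each worker process: replace the inherited state
    # with a seed that no other worker receives.
    RDS.seed(seed_queue.get())


def wrapper(ic):
    return ic, RDS.choice(range(ic))


def make_pool(n_workers=4, base_seed=1000):
    # Hypothetical helper: queue up one distinct seed per worker.
    seed_queue = multiprocessing.Queue()
    for w in range(n_workers):
        seed_queue.put(base_seed + w)
    return multiprocessing.Pool(
        n_workers, initializer=init_worker, initargs=(seed_queue,)
    )
```

A pool created with make_pool() can be used like the one in the question, except that with this single-argument wrapper the inputs would go through pool.map rather than starmap over 1-tuples.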
