Reputation: 1025
According to this answer, it isn't. But that has not been consistent with what I've observed so far. Consider the following script:
import numpy as np
from multiprocessing.dummy import Pool  # thread pool, not processes
from queue import Queue

SIZE = 1000000

# Multi-threaded draws: 100 worker threads sharing the global RandomState.
np.random.seed(1)
tPool = Pool(100)
q1 = Queue()

def worker_thread(i):
    q1.put(np.random.choice(100, 5))

tPool.map(worker_thread, range(SIZE))

# Single-threaded draws with the same seed.
q2 = Queue()
np.random.seed(1)
for i in range(SIZE):
    q2.put(np.random.choice(100, 5))

# Compare the two sequences element-wise.
n = 0
for i in range(SIZE):
    n += (q1.get() == q2.get())
print(n)
Basically, what I'm testing here is whether SIZE calls generate the same sequence in a multi-threaded environment as in a single-threaded one. For me this outputs n = SIZE. Of course this could just be chance, so I ran it a few times and got consistent results every time. So my question is: are calls to functions of the numpy.random package thread-safe?
Upvotes: 2
Views: 1489
Reputation: 53788
I've run your script several times on my machine and got arrays of 999995 or 999992 nearly as often as 1000000 (python 3.5.2, numpy 1.13.3). So the answer you're referring to is correct: np.random may produce different results in a multi-threaded environment.
You can see it for yourself if you increase the pool size, say to 1000, and the sample size, say to 50. I was able to get 100% inconsistency even with a smaller SIZE=100000.
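For reference, a minimal sketch of that experiment with the larger parameters suggested above (pool of 1000 threads, samples of 50, SIZE=100000); everything else is just your script adapted, and on my machine a run like this reports far fewer matches than SIZE:
import numpy as np
from multiprocessing.dummy import Pool  # thread pool backed by threads, not processes
from queue import Queue

SIZE = 100000  # smaller than in the question, as suggested

# Multi-threaded draws: 1000 threads sharing the global RandomState.
np.random.seed(1)
tPool = Pool(1000)
q1 = Queue()

def worker_thread(i):
    q1.put(np.random.choice(100, 50))  # larger sample per call

tPool.map(worker_thread, range(SIZE))

# Single-threaded draws with the same seed.
np.random.seed(1)
q2 = Queue()
for i in range(SIZE):
    q2.put(np.random.choice(100, 50))

# Count how many positions hold identical 50-element samples in both runs.
matches = sum(np.array_equal(q1.get(), q2.get()) for _ in range(SIZE))
print(matches, "of", SIZE, "samples matched")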
Upvotes: 2