numpy.random.normal() in multithreaded environment

Question

I was trying to parallelize the generation of normally-distributed random arrays with the numpy.random.normal() function but it looks like the calls from the threads are executed sequentially instead.

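import threading
import time

import numpy as np

start = time.time()

threads = []
for i in range(100000):
    t = threading.Thread(target=lambda: np.random.normal(0, 2, 4000))
    t.start()
    threads.append(t)

    # Throttle: wait while 12 workers are still running.
    # active_count() also counts the main thread, hence "> 12".
    while threading.active_count() > 12:
        time.sleep(0.1)

# Wait for the remaining workers before stopping the clock
for t in threads:
    t.join()

end = time.time()
print(str(end - start))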

If I measure the time for a single-thread run, I get the same result as with 12 threads. I know this kind of parallelization suffers from a lot of overhead, but even so the multithreaded version should save some time.

Jérôme Richard · Accepted Answer

np.random.normal uses internal generator state rather than a seed passed on each call. For the legacy np.random.* functions this state belongs to a single global RandomState instance (not to default_rng()), and it is shared between threads, so concurrent calls have to take turns on it and calling it from multiple threads is not a good idea (unsynchronized access would cause race conditions). In fact, the documentation provides examples for such a case (see here and there). Alternatively, you can use multiple processes (you need to configure the seed to get different results in different processes). Another solution is to use custom random number generators (RNG), so that each thread uses a different RNG object.
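A minimal sketch of the one-RNG-per-thread idea, assuming a reasonably recent NumPy that provides np.random.default_rng and np.random.SeedSequence. The names N_THREADS, N_ARRAYS, generate_chunk and generate_all are made up for illustration, the root seed 12345 is arbitrary, and the workload is much smaller than the one in the question. Whether this actually runs faster than a single thread depends on the array size and the machine, so measure it on your own setup.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

N_THREADS = 12      # same thread count as in the question
N_ARRAYS = 1200     # hypothetical workload, reduced for illustration
ARRAY_SIZE = 4000

# Derive one independent child seed per worker from a single root seed,
# then build a separate Generator for each worker.
root = np.random.SeedSequence(12345)
rngs = [np.random.default_rng(child) for child in root.spawn(N_THREADS)]


def generate_chunk(rng, count):
    """Draw `count` normal arrays using this worker's own generator."""
    return [rng.normal(0, 2, ARRAY_SIZE) for _ in range(count)]


def generate_all():
    per_worker = N_ARRAYS // N_THREADS
    with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
        # One task per generator, so no Generator is shared between tasks.
        chunks = list(pool.map(generate_chunk, rngs, [per_worker] * N_THREADS))
    # Flatten the per-worker lists into a single list of arrays
    return [arr for chunk in chunks for arr in chunk]


arrays = generate_all()
print(len(arrays), arrays[0].shape)
```

Each worker gets a batch of arrays instead of one thread per array, which avoids paying the thread start-up cost 100000 times. The same seeding pattern should carry over to multiple processes: pass one spawned child seed to each process so they do not all produce the same numbers.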
