Calculate mean in Monte Carlo Simulation using python multiprocessing

Question

I have been reading about multiprocessing in Python (e.g. I have read this and this and this and this and so on; I have also read/watched different websites/videos such as this and this and this and so many more!), but I am still confused about how to apply multiprocessing to my specific problem. I have written a simple example that calculates the average of randomly generated integers using a Monte Carlo simulation. I store the random integers in a list called integers so I can calculate the mean at the end. I also generate random numpy.ndarrays and store them in a list called arrays, because I need to do some post-processing on those arrays later:

import numpy as np

nMCS = 10 ** 8

integers = []
arrays = []
for i in range(nMCS):
    a = np.random.randint(0,10)
    b = np.random.rand(10,2)

    integers.append(a)
    arrays.append(b)

mean_val = np.average(integers)
# I will do post-processing on 'arrays' later!!
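Before parallelising, it helps to know where the time goes. Most of the loop above is Python overhead from calling NumPy once per sample, and with nMCS = 10 ** 8 the 10×2 float64 arrays alone need about 16 GB (10**8 × 20 values × 8 bytes). The sketch below is a swapped-in vectorised technique, not the asker's method. It assumes drawing everything up front fits in memory, so it uses a smaller nMCS, and it uses NumPy's newer `default_rng` Generator API. As with `randint`, `integers(0, 10)` excludes the upper bound.

```python
import numpy as np

nMCS = 10 ** 5  # assumption: much smaller than 10**8, since all samples are held in memory at once

rng = np.random.default_rng()

# One NumPy call per quantity instead of one call per Monte Carlo sample
integers = rng.integers(0, 10, size=nMCS)  # shape (nMCS,), values 0..9
arrays = rng.random((nMCS, 10, 2))         # arrays[k] plays the role of one np.random.rand(10, 2)

mean_val = integers.mean()
print(mean_val)  # close to 4.5, the mean of the integers 0..9
```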

Now I want to use all 16 cores on my machine, so that the random numbers/arrays are not generated sequentially and the process runs faster. Based on what I have learnt, I understand that I need to store the result of each Monte Carlo run (i.e. the generated random integer and random numpy.ndarray) and then use inter-process communication to collect all of the results in a list. I have written several versions of the code, but unfortunately none of them work. For example, when I write something like this:

import numpy as np
import multiprocessing

nMCS = 10 ** 6

integers = []
arrays = []

def monte_carlo():
    a = np.random.randint(0,10)
    b = np.random.rand(10,2)

if __name__ == '__main__':
    __spec__ = "ModuleSpec(name='builtins', loader=)" # this is because I am using Spyder!

    p1 = multiprocessing.Process(target = monte_carlo)

    p1.start()

    p1.join()

    for i in range(nMCS):

        integers.append(a)
        arrays.append(b)

I get the error "name 'a' is not defined". Could anyone please explain how I can generate as many random integers/arrays as possible concurrently, and then collect them all in a list for further processing?
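The error occurs because `a` and `b` are local variables of `monte_carlo()`. They exist only while the function runs, and here the function runs in a separate child process with its own memory. The `__main__` block therefore never sees them. Appending to the global lists inside the child would not help either, because the parent's lists are a different copy. The results have to be sent back explicitly.

The sketch below keeps the question's `multiprocessing.Process` approach and uses a `multiprocessing.Queue` for the inter-process communication. `worker`, `run`, `n_procs` and `n_per_proc` are illustrative names, not from the question. Each process also generates a whole batch rather than a single sample, which avoids one message per sample.

```python
import multiprocessing as mp
import numpy as np

def worker(n_samples, queue):
    # Runs in a child process: generate a whole batch, then send it back
    rng = np.random.default_rng()  # seeded from OS entropy, so processes don't repeat each other
    integers = rng.integers(0, 10, size=n_samples)  # values 0..9, like np.random.randint(0, 10)
    arrays = rng.random((n_samples, 10, 2))         # arrays[k] is like one np.random.rand(10, 2)
    queue.put((integers, arrays))                   # pickled and sent to the parent

def run(n_procs=4, n_per_proc=1000):
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(n_per_proc, queue)) for _ in range(n_procs)]
    for p in procs:
        p.start()
    # Collect one result per process *before* joining (see note below)
    results = [queue.get() for _ in range(n_procs)]
    for p in procs:
        p.join()
    integers = np.concatenate([ints for ints, _ in results])
    arrays = np.concatenate([arrs for _, arrs in results])
    return integers, arrays

if __name__ == '__main__':
    integers, arrays = run(n_procs=4, n_per_proc=1000)  # e.g. n_procs=16 for 16 cores
    mean_val = np.average(integers)
    print(mean_val, arrays.shape)  # arrays.shape is (4000, 10, 2)
```

The queue is drained before `join()` on purpose. The Python documentation warns that a process which has put items on a queue may not terminate until those items have been consumed, so joining first can deadlock.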

MatrixTXT · Accepted Answer

Because sending many small results back from the worker processes is slow (every result has to be pickled and passed between processes), I would suggest dividing the task into a few parts and reducing each part inside the worker before returning it.
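Reducing each chunk to its mean before returning it is safe for the final average, as long as the chunk means are combined correctly. If chunk k has n_k samples with mean m_k, the overall mean is the sum of n_k * m_k divided by the sum of n_k. This equals a plain average of the m_k only when every chunk has the same size. A small NumPy check of both cases (the sample size and chunk boundaries are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.integers(0, 10, size=1000)

# Equal-sized chunks: the plain mean of the chunk means equals the overall mean
equal_chunks = np.split(samples, 4)  # 4 chunks of 250
mean_of_means = np.mean([c.mean() for c in equal_chunks])

# Unequal chunks: weight each chunk mean by its number of samples
unequal_chunks = np.split(samples, [100, 400])  # sizes 100, 300, 600
weighted = np.average([c.mean() for c in unequal_chunks],
                      weights=[len(c) for c in unequal_chunks])

print(np.isclose(mean_of_means, samples.mean()))  # True
print(np.isclose(weighted, samples.mean()))       # True
```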

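import numpy as np
from multiprocessing import Pool

n = 4              # number of chunks (one task per chunk)
nMCS = 10**4       # total number of Monte Carlo samples
BASE_SEED = 12345  # arbitrary; combined with the chunk index below

def processResult(raw_result):
    #Your method to reduce the result before returning it;
    #as an example this returns [avg(a), reformed_array(b)],
    #where reformed_array(b) stacks all b's into one (N, 10, 2) array
    avg_a = np.mean([a for a, b in raw_result])
    reformed_b = np.stack([b for a, b in raw_result])
    return [avg_a, reformed_b]

def monte_carlo(i):
    #one independent random stream per chunk, so forked workers
    #do not all repeat the same numbers from the global np.random state
    rng = np.random.default_rng([BASE_SEED, i])
    raw_result = []
    for j in range(nMCS // n):  # integer division: range() needs an int
        a = rng.integers(0, 10)
        b = rng.random((10, 2))
        raw_result.append([a, b])
    return processResult(raw_result)

if __name__ == '__main__':
    __spec__ = "ModuleSpec(name='builtins', loader=)" # this is because I am using Spyder!

    #you can control how many processes here, for example multiprocessing.cpu_count()-1 to avoid completely blocking
    with Pool(processes=4) as pool:
        multiple_results = [pool.apply_async(monte_carlo, (i,)) for i in range(n)]
        data = [res.get() for res in multiple_results]
        #OR
        #data = pool.map(monte_carlo, range(n))
    #Both return you a list of [avg(a),reformed_array(b)]

    #all chunks have the same size, so the mean of the chunk means is the overall mean
    mean_val = np.mean([d[0] for d in data])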
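Any version of this code can hit a seeding pitfall. With the `fork` start method (long the default on Linux), every pool worker starts with a copy of the parent's global `np.random` state. Chunks drawn with `np.random.randint`/`np.random.rand` can therefore repeat the same numbers.

The sketch below assumes NumPy 1.17 or newer. It spawns one `Generator` per chunk from a single `SeedSequence`; Generators can be pickled, so they are passed straight to `pool.map`. It then merges the per-chunk results in the parent. `chunk`, `merge`, `N_CHUNKS` and `PER_CHUNK` are illustrative names.

```python
import numpy as np
from multiprocessing import Pool

N_CHUNKS = 4
PER_CHUNK = 2500

def chunk(rng):
    # rng is a numpy.random.Generator created in the parent and pickled to the worker
    integers = rng.integers(0, 10, size=PER_CHUNK)
    arrays = rng.random((PER_CHUNK, 10, 2))
    return integers.mean(), arrays  # reduce the integers to one number before returning

def merge(data):
    # Equal chunk sizes, so the plain mean of the chunk means is the overall mean
    mean_val = np.mean([m for m, _ in data])
    arrays = np.concatenate([b for _, b in data])  # shape (len(data) * PER_CHUNK, 10, 2)
    return mean_val, arrays

if __name__ == '__main__':
    # spawn() yields independent child seeds, one per chunk
    rngs = [np.random.default_rng(s) for s in np.random.SeedSequence().spawn(N_CHUNKS)]
    with Pool(processes=N_CHUNKS) as pool:
        data = pool.map(chunk, rngs)
    mean_val, arrays = merge(data)
    print(mean_val, arrays.shape)  # arrays.shape is (10000, 10, 2)
```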
