Reputation: 548
I have been reading about multiprocessing in Python (e.g. I have read this and this and this and this and so on; I have also read/watched different websites/videos such as this and this and this and so many more!) but I am still confused how I could apply multiprocessing to my specific problem. I have written a simple example code for calculating the avg value of randomly generated integers using Monte Carlo Simulation (I store the random integers in a variable called integers
so I can finally calculate the mean; I am also generating random numpy.ndarrays and store them in a variable called arrays
as I need to do some post-processing on those arrays later too):
import numpy as np
nMCS = 10 ** 8
integers = []
arrays = []
for i in range(nMCS):
a = np.random.randint(0,10)
b = np.random.rand(10,2)
integers.append(a)
arrays.append(b)
mean_val = np.average(integers)
# I will do post-processing on 'arrays' later!!
Now I want to utilize all of the 16 cores on my machine, so the random numbers/arrays are not generated in sequence and I can speed up the process. Based on what I have learnt, I recognize I need to store the results of each Monte Carlo Simulation (i.e. the generated random integer and random numpy.ndarray) and then use Inter-process communication in order to later store all of the results in a list. I have written different codes but unfortunately non of them work. As an example, when I write something like this:
import numpy as np
import multiprocessing
nMCS = 10 ** 6
integers = []
arrays = []
def monte_carlo():
a = np.random.randint(0,10)
b = np.random.rand(10,2)
if __name__ == '__main__':
__spec__ = "ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>)" # this is because I am using Spyder!
p1 = multiprocessing.Process(target = monte_carlo)
p1.start()
p1.join()
for i in range(nMCS):
integers.append(a)
arrays.append(b)
I get the error "name 'a' is not defined". So could anyone please help me with this and tell me how I could generate as many random integers/arrays as possible concurrently, and then add them all to a list for further processing?
Upvotes: 2
Views: 1776
Reputation: 2502
Due to the fact that returning a lot of result causes time for propagation between process, I would suggest to divide the task in few part and process it before returning back.
n = 4
def monte_carlo():
raw_result = []
for j in range(10**4 / n):
a = np.random.randint(0,10)
b = np.random.rand(10,2)
raw_result .append([a,b])
result = processResult(raw_result)
#Your method to reduce the result return,
#let's assume the return value is [avg(a),reformed_array(b)]
return result
if __name__ == '__main__':
__spec__ = "ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>)" # this is because I am using Spyder!
pool = Pool(processes=4)
#you can control how many processes here, for example multiprocessing.cpu_count()-1 to avoid completely blocking
multiple_results = [pool.apply_async(monte_carlo, (i,)) for i in range(n)]
data = [res.get() for res in multiple_results]
#OR
data = pool.map(monte_carlo, [i for i in range(n)])
#Both return you a list of [avg(a),reformed_array(b)]
Upvotes: 2
Reputation: 182
Simple error.
a and b are created in your function They do not exist in your main scope. you will need to return them back from your function
def monte_carlo():
a = np.random.randint(0,10)
b = np.random.rand(10,2)
#create a return statement here. It may help if you put them into an array so you can return 2 value
if __name__ == '__main__':
__spec__ = "ModuleSpec(name='builtins', loader=<class
'_frozen_importlib.BuiltinImporter'>)" # this is because I am using Spyder!
p1 = multiprocessing.Process(target = monte_carlo)
p1.start()
p1.join()
#Call your function here and save the return to something
for i in range(nMCS):
integers.append(a) # paste here
arrays.append(b) # and here
Edit: tested code and found you were never calling your monte_carlo function. a and b are now working correctly but you have a new error to try and solve. Sorry but I wont be able to help with this error as I dont understand it myself, but here is my edit of your code.
import numpy as np
import multiprocessing
nMCS = 10 ** 6
integers = []
arrays = []
def monte_carlo():
a = np.random.randint(0,10)
b = np.random.rand(10,2)
temp = [a,b]
return temp
if __name__ == '__main__':
__spec__ = "ModuleSpec(name='builtins', loader=<class
'_frozen_importlib.BuiltinImporter'>)" # this is because I am using Spyder!
p1 = multiprocessing.Process(target = monte_carlo())#added the extra brackets here
p1.start()
p1.join()
for i in range(nMCS):
array = monte_carlo()
integers.append(array[0])
arrays.append(array[1])
and here is the error I got with this edit. I am still learning multi processing myself so other people may be better suited to help with this
Process Process-6:
Traceback (most recent call last):
File"c:\users\lunar\appdata\local\continuum\anaconda3\lib\multiprocessing\process.py", line 252, in _bootstrap
self.run()
File "c:\users\lunar\appdata\local\continuum\anaconda3\lib\multiprocessing\process.py", line 93, in run
self._target(*self._args, **self._kwargs)
TypeError: 'list' object is not callable
Upvotes: 0