Reputation: 185
I am doing a function optimization using an evolutionary algorithm (CMAES). To run it faster I am using the multiprocessing module. The function I need to optimize takes large matrices as inputs (input_A_Opt, and input_B_Opt)
in the code below.
They are several GBs of size. When I run the function without multiprocessing, it works well. When I use multiprocessing there seems to be a problem with memory. If I run it with small inputs it works well, but when I run with the full input, I get the following error:
File "<ipython-input-2-bdbae5b82d3c>", line 1, in <module>
opt.myFuncOptimization()
File "/home/joe/Desktop/optimization_folder/Python/Optimization.py", line 45, in myFuncOptimization
**f_values = pool.map_async(partial_function_to_optmize, solutions).get()**
File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
File "/usr/lib/python3.5/multiprocessing/pool.py", line 385, in _handle_tasks
put(task)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 206, in send
self._send_bytes(ForkingPickler.dumps(obj))
File "/usr/lib/python3.5/multiprocessing/connection.py", line 393, in _send_bytes
header = struct.pack("!i", n)
error: 'i' format requires -2147483648 <= number <= 2147483647
And here's a simplified version of the code (again, if I run it with the input 10 times smaller, all works fine):
import numpy as np
import cma
import multiprocessing as mp
import functools
import myFuncs
import hdf5storage
def myFuncOptimization ():
temp = hdf5storage.loadmat('/home/joe/Desktop/optimization_folder/matlab_workspace_for_optimization')
input_A_Opt = temp["input_A"]
input_B_Opt = temp["input_B"]
del temp
numCores = 20
# Inputs
#________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
P0 = np.array([ 4.66666667, 2.5, 2.66666667, 4.16666667, 0.96969697, 1.95959596, 0.44088176, 0.04040404, 6.05210421, 0.58585859, 0.46464646, 8.75751503, 0.16161616, 1.24248497, 1.61616162, 1.56312625, 5.85858586, 0.01400841, 1.0, 2.4137931, 0.38076152, 2.5, 1.99679872 ])
LBOpt = np.array([ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ])
UBOpt = np.array([ 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, ])
initialStdsOpt = np.array([2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, ])
minStdsOpt = np.array([ 0.030, 0.40, 0.030, 0.40, 0.020, 0.020, 0.020, 0.020, 0.020, 0.020, 0.020, 0.020, 0.020, 0.020, 0.020, 0.020, 0.020, 0.020, 0.050, 0.050, 0.020, 0.40, 0.020, ])
options = {'bounds':[LBOpt,UBOpt], 'CMA_stds':initialStdsOpt, 'minstd':minStdsOpt, 'popsize':numCores}
es = cma.CMAEvolutionStrategy(P0, 1, options)
pool = mp.Pool(numCores)
partial_function_to_optmize = functools.partial(myFuncs.func1, input_A=input_A_Opt, input_B=input_B_Opt)
while not es.stop():
solutions = es.ask(es.popsize)
f_values = pool.map_async(partial_function_to_optmize, solutions).get()
es.tell(solutions, f_values)
es.disp(1)
es.logger.add()
return es.result_pretty()
Any suggestions on how to solve this issue? am I not coding properly (new to python) or should I use other multiprocessing package like scoop?
Upvotes: 6
Views: 1805
Reputation: 50076
Your objects are too big to pass between processes. You're passing along more than 2147483647 bytes - that's over 2GB! The protocol isn't made for this, and the sheer overhead of serializing and deserializing such large chunks of data can be a serious performance overhead.
Reduce the size of data passed to each process. If you workflow allows it, read in the data in the separate process, and pass along only the results.
Upvotes: 2