Reputation: 195
I'm trying to parallelize a function which solves Pyomo instances with Python 3.7, using the multiprocessing module. The code works, but the startup time is absurd (~25 seconds per process). The weird thing is that I tried the same code on another, far less powerful computer and it went down to ~2 seconds (same code, same number of parallel processes, same versions of everything except Python, which is 3.6 on that PC).
Using cProfile, I found out that the dump method of the pickler was the one consuming all that time, but I can't understand why it would take so long. The data is small, and I checked with sys.getsizeof() whether any of the arguments of the parallelized function were larger than expected, but they were not.
Does anyone know what could be the cause of the slow pickle dump?
The code:
from pyomo.environ import *
from pyomo.opt import SolverFactory, TerminationCondition
from pyomo.opt.parallel import SolverManagerFactory
import cProfile
import pstats
import sys
import multiprocessing

def worker(instance, data, optsolver, queue, shared_incumbent_data):  # instance = init_nodes[i_node][j_node]
    #[pyomo instances solving and constraining]
    return

def foo(model, data, optsolver, processes = multiprocessing.cpu_count()):
    queue = multiprocessing.Queue()
    process_dict = {}
    for i_node in range(len(init_nodes)): #init_nodes is a list containing lists of pyomo instances
        for j_node in range(len(init_nodes[i_node])):
            process_name = str(i_node) + str(j_node)
            print(" - Data size:", sys.getsizeof(data)) #same for all of the args
            process_dict[process_name] = multiprocessing.Process(target=worker, args=(init_nodes[i_node][j_node], data, optsolver, queue, shared_incumbent_data))
            pr = cProfile.Profile()
            pr.enable()
            process_dict[process_name].start()
            pr.disable()
            ps = pstats.Stats(pr)
            ps.sort_stats('time').print_stats(5)
    for n_node in process_dict:
        process_dict[n_node].join(timeout=0)

#imports
#[model definition]
#[data is obtained from 3 .tab files, the biggest one has a 30 x 40 matrix, with 1 to 3 digit integers]
optsolver = SolverFactory("gurobi")

if __name__ == "__main__":
    foo(model, data, optsolver, 4)
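As an aside on the measurements below: sys.getsizeof() only reports the shallow size of the outer container (hence the near-constant 56/72 bytes), not the size of everything that gets serialized when Process.start() spawns the child on Windows. A rough, hypothetical check of the real pickled payload could look like the following sketch (the Queue and the shared incumbent object are left out, since multiprocessing handles those separately and they can't be pickled directly this way):

import pickle
import sys

# Compare the shallow size with the actual serialized size of the
# arguments that have to be pickled when the process is started.
payload = (init_nodes[i_node][j_node], data, optsolver)
print("getsizeof:", sys.getsizeof(payload))        # shallow size only
print("pickled:", len(pickle.dumps(payload, -1)))  # full serialized payload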
Sizes of the arguments obtained with sys.getsizeof() and profile of the .start() call on the first computer:
- Data size: 56
- Init_nodes size: 72
- Queue size: 56
- Shared incumbent data size: 56
7150 function calls (7139 primitive calls) in 25.275 seconds
Ordered by: internal time
List reduced from 184 to 5 due to restriction <5>
ncalls tottime percall cumtime percall filename:lineno(function)
2 25.262 12.631 25.267 12.634 {method 'dump' of '_pickle.Pickler' objects}
1 0.004 0.004 0.004 0.004 {built-in method _winapi.CreateProcess}
1265 0.002 0.000 0.004 0.000 C:\Users\OLab\AppData\Local\Continuum\anaconda3\lib\site-packages\pyomo\core\expr\numeric_expr.py:186(__getstate__)
2 0.001 0.001 0.002 0.001 <frozen importlib._bootstrap_external>:914(get_data)
1338 0.001 0.000 0.002 0.000 C:\Users\OLab\AppData\Local\Continuum\anaconda3\lib\site-packages\pyomo\core\expr\numvalue.py:545(__getstate__)
Sizes of the arguments obtained with sys.getsizeof() and profile of the .start() call on the second computer:
- Data size: 56
- Init_nodes size: 72
- Queue size: 56
- Shared incumbent data size: 56
7257 function calls (7247 primitive calls) in 1.742 seconds
Ordered by: internal time
List reduced from 184 to 5 due to restriction <5>
ncalls tottime percall cumtime percall filename:lineno(function)
2 1.722 0.861 1.730 0.865 {method 'dump' of '_pickle.Pickler' objects}
1 0.009 0.009 0.009 0.009 {built-in method _winapi.CreateProcess}
1265 0.002 0.000 0.005 0.000 C:\Users\Palbo\Anaconda2\envs\py3\lib\site-packages\pyomo\core\expr\numeric_expr.py:186(__getstate__)
1339 0.002 0.000 0.003 0.000 C:\Users\Palbo\Anaconda2\envs\py3\lib\site-packages\pyomo\core\expr\numvalue.py:545(__getstate__)
1523 0.001 0.000 0.001 0.000 {built-in method builtins.hasattr}
Cheers!
The specs of the first computer (the one that should be way faster but isn't) and of the second computer were attached as images.
Upvotes: 4
Views: 1660
Reputation: 195
Finally found a solution: pickle the arguments of the function to a file, pass the name of that file to the worker() function instead, and open the file from within the function in each parallel process.
The dump time went down from ~24 s to ~0.005 s!
import pickle

def worker(pickled_file_name, queue, shared_incumbent):
    # Load the pickled arguments inside the child process
    with open(pickled_file_name, "rb") as f:
        data_tuple = pickle.load(f, encoding='bytes')
    instance, data, optsolver, int_var_list, process_name, relaxed_incumbent = data_tuple
    #[pyomo instance solving and constraining]
    return

def foo():
    [...]
    # Dump the heavy arguments to a file; only the short file name
    # gets pickled when the process is started.
    picklefile = open("pickled_vars" + str(i_node) + str(j_node) + ".p", "wb")
    picklefile.write(pickle.dumps(variables_, -1))
    picklefile.close()
    process_dict[process_name] = multiprocessing.Process(target=worker, args=("pickled_vars" + str(i_node) + str(j_node) + ".p", queue, shared_incumbent_data))
    process_dict[process_name].start()
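For anyone adapting this, here is a small variant of the same idea sketched with tempfile, so each dump gets a unique file name and the worker deletes it after loading; start_one is a hypothetical helper, not part of the original code:

import os
import pickle
import tempfile
import multiprocessing

def worker(pickled_file_name, queue, shared_incumbent):
    # Load the heavy arguments inside the child process, then delete the file.
    with open(pickled_file_name, "rb") as f:
        instance, data, optsolver = pickle.load(f)
    os.remove(pickled_file_name)
    #[solve the instance and report results through the queue]

def start_one(instance, data, optsolver, queue, shared_incumbent):
    # Only the short file name is pickled by Process.start();
    # the heavy objects go to disk once, with the highest pickle protocol.
    fd, path = tempfile.mkstemp(suffix=".p")
    with os.fdopen(fd, "wb") as f:
        pickle.dump((instance, data, optsolver), f, protocol=-1)
    p = multiprocessing.Process(target=worker, args=(path, queue, shared_incumbent))
    p.start()
    return p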
Upvotes: 3