Reputation: 127
I have a very simple application with a nested for-loop, and it can take minutes to hours to run depending on the amount of data.
I got started with the multiprocessing library in Python. I tried implementing it in its most basic form, and even though my code runs, there are no performance gains, leading me to believe I am implementing it incorrectly and/or the design of my code is deeply flawed.
My code is pretty straightforward:
import csv
import multiprocessing

somedata1 = open('data1.csv', 'r')
SD_data = csv.reader(somedata1, delimiter=',')
data1 = []
# ... import lots of CSV data into data1 ... data5 ...

def crunchnumbers():
    for i1, vald1 in enumerate(data1):
        for i2, vald2 in enumerate(data2):
            for i3, vald3 in enumerate(data3):
                for i4, vald4 in enumerate(data4):
                    for i5, vald5 in enumerate(data5):
                        sol = ...  # add values
                        print sol

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    pool.apply(crunchnumbers)
How can I do this with Python's multiprocessing? (Somehow splitting the work into chunks?) Or is this a better job for Jug? Based on suggestions on SO, I spent a few days trying to use Jug, but the number of iterations in my nested for-loops easily gets into the tens of millions (and more) of very fast tasks, so the author recommends against it.
Upvotes: 1
Views: 1100
Reputation: 42758
I suggest using itertools.product together with multiprocessing's map. Note that pool.apply(crunchnumbers) blocks and runs the whole function once in a single worker process, which is why you see no speedup; mapping over the individual combinations lets the pool actually distribute the work:
import multiprocessing
from itertools import product

def crunchnumber(values):
    # values is one tuple holding a row from each dataset
    if some_criteria(values):  # your filter here
        sol = values[0][2] + values[1][2] + values[2][2]  # ... and so on
        return sol
    return None  # combinations failing the criteria are filtered out below

def process(datas):
    "takes data1, ..., datan as a list"
    pool = multiprocessing.Pool(processes=4)
    # map_async returns an AsyncResult; .get() waits for and collects the results
    result = pool.map_async(crunchnumber, product(*datas))
    print [a for a in result.get() if a is not None]
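With tens of millions of tiny tasks, map_async builds the entire result list in memory and the per-task overhead can dominate. Here is a minimal runnable sketch of the same pattern using imap_unordered with a large chunksize; the toy data and the positive-sum criterion are assumptions purely for illustration, only the product/pool pattern is the point:

import multiprocessing
from itertools import product

def crunchnumber(values):
    # hypothetical criterion: keep combinations whose third-column sum is positive
    sol = sum(row[2] for row in values)
    if sol > 0:
        return sol
    return None

if __name__ == '__main__':
    # toy stand-ins for the CSV rows; the third field mirrors values[i][2] above
    data1 = [(0, 0, 1.0), (0, 0, 2.0)]
    data2 = [(0, 0, 3.0)]
    data3 = [(0, 0, -4.0), (0, 0, 5.0)]
    pool = multiprocessing.Pool(processes=4)
    # imap_unordered streams results lazily instead of materializing a list;
    # a big chunksize amortizes inter-process overhead across many tiny tasks
    results = pool.imap_unordered(crunchnumber,
                                  product(data1, data2, data3),
                                  chunksize=10000)
    print([a for a in results if a is not None])
    pool.close()
    pool.join()

The chunksize of 10000 is just a starting point; the faster each individual task is, the larger the chunks should be.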
Upvotes: 3