nodoze

Reputation: 127

python multiprocessing example itertools multiple lists

I have a very simple application with a nested for-loop, and it can take minutes to hours to run depending on the amount of data.

I got started with the multiprocessing lib in Python. I tried implementing it in its most basic form, and even though my code runs, there are no performance gains, leading me to believe I am implementing it incorrectly and/or the design of my code is extremely flawed.

My code is pretty straightforward:

import csv
import multiprocessing

somedata1 = open('data1.csv', 'r')
SD_data = csv.reader(somedata1, delimiter=',')
data1 = []
# *** import lots of CSV data ***

def crunchnumbers():
    for i, vald1 in enumerate(data1):
        for j, vald2 in enumerate(data2):
            for k, vald3 in enumerate(data3):
                for l, vald4 in enumerate(data4):
                    for m, vald5 in enumerate(data5):
                        sol = 0  # add values here
    print d_solution

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    pool.apply(crunchnumbers)

How can I do this with Python's multiprocessing? (Somehow splitting the work into chunks?) Or is this a better job for Jug? Based on suggestions on SO, I spent a few days trying to use Jug, but the number of iterations in my nested for-loops easily gets into the tens of millions (and more) of very fast transactions, so the author recommends against it.

Upvotes: 1

Views: 1100

Answers (1)

Daniel

Reputation: 42758

I suggest using itertools.product with multiprocessing's Pool.map:

import csv
import multiprocessing
from itertools import product

def crunchnumber(values):
    # values is one tuple (vald1, ..., valdn) from the cartesian product
    if some_criteria(values):  # placeholder for your filter condition
        sol = values[0][2] + values[1][2] + values[2][2]  # ... and so on
        return sol

def process(datas):
    "takes data1, ..., datan as a list"
    pool = multiprocessing.Pool(processes=4)
    result = pool.map_async(crunchnumber, product(*datas))
    # map_async returns an AsyncResult; call .get() for the actual list
    print [a for a in result.get() if a is not None]
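To make the pattern concrete, here is a minimal runnable sketch. The filter criterion (keep combinations with an even sum) and the three small input lists are made-up stand-ins for your CSV data; the `chunksize` argument batches many small tasks per worker message, which matters when the product has millions of very fast iterations:

```python
import multiprocessing
from itertools import product

def crunchnumber(values):
    # toy stand-in for your real computation: sum one combination
    sol = sum(values)
    # toy stand-in for "some criteria": keep only even sums
    return sol if sol % 2 == 0 else None

def process(datas):
    """Map crunchnumber over the cartesian product of the input lists."""
    pool = multiprocessing.Pool(processes=2)
    try:
        # chunksize batches tasks so workers aren't fed one tiny item at a time
        results = pool.map(crunchnumber, product(*datas), chunksize=1000)
    finally:
        pool.close()
        pool.join()
    return [r for r in results if r is not None]

if __name__ == '__main__':
    print(process([[1, 2], [3, 4], [5, 6]]))
```

Pool.map preserves the order of the input iterable, so the surviving sums come back in product order.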

Upvotes: 3
