ninjacowgirl
ninjacowgirl

Reputation: 83

Double parallel loop with Python Joblib

Good day

I am trying to speed up a computation that involves many independent integrations. To do this I am using pythons Joblib and multiprocessing. So far I have succeeded with parallelizing the inner loop of my computation, but I would like to do the same with the outer loop. Since parallel programming messes with my mind, I am wondering if someone could help me. So far I have:

 from joblib import Parallel, delayed
import multiprocessing

N = 10 # Some number
inputs = range(1,N,2)
num_cores = multiprocessing.cpu_count()

def processInput(n):
    u_1 = lambda x,y: f(x,y)g(n,m) # Some function
    Cn = scintegrate.nquad(u_1, [[A,B],[C,D]]) # A number
    return Cn*F(x,y)*G(n,m)

resultsN = []

for m in range(1,N,2):  # How can this be parallelized? 
    add = Parallel(n_jobs=num_cores)(delayed(processInput)(n) for n in inputs)
    resultsN = add + resultsN

resultsN = sum(resultsN)

This have so far produced the correct results. Now I would like to do the same with outer loop. Does anyone have an idea how I can do this?

I am also wondering if the u_1 declaration can be done outside the processInput, and any other suggestions for improvement will be appreciated.

Thanks for any replies.

Upvotes: 3

Views: 4928

Answers (1)

Pietro Marchesi
Pietro Marchesi

Reputation: 882

If I understand correctly, you run your function processInput(n) for a range of n values, and you need to do that m times and add everything together. Here, the index m only keeps count of how many times you want to run your processing function and add the results together, but nothing else. This allows you to do everything with just one layer of parallelism, namely creating a list of inputs which already contains repeated values, and dividing that workload amongst your cores. The quick intuition is that instead of processing inputs [1,2,3,4] in parallel and then doing that a bunch of times, you run in parallel inputs [1,1,1,2,2,2,3,3,3,4,4,4]. Here is what it could look like (I've changed your functions into a simpler function that I can run).

import numpy as np
from joblib import Parallel, delayed
import multiprocessing
from math import ceil

N = 10 # Some number
inputs = range(1,N,2)
num_cores = multiprocessing.cpu_count()

def processInput(n): # toy function
    return n

resultsN = []
# your original solution with an additional loop that needs
# to be parallelized
for m in range(1,N,2):  
    add = Parallel(n_jobs=num_cores)(delayed(processInput)(n) for n in inputs)
    resultsN = add + resultsN
resultsN = sum(resultsN)
print resultsN

# solution with only one layer of parallelization
ext_inputs = np.repeat(inputs,ceil(m/2.0)).tolist()
add = Parallel(n_jobs=num_cores)(delayed(processInput)(n) for n in ext_inputs)
resultsN = sum(add)
print resultsN 

The ceil is required because in your original loop m skips every second value.

Upvotes: 3

Related Questions