Coolio2654

Reputation: 1739

Keeping track of index of parallelized for loop in Joblib

In Python, I have a list of objects over which I need to iterate in a loop, and output a result for each iteration, while also keeping track of the index of the object being iterated over.

Normally, this isn't a problem, since I can just use enumerate and do

results = []

for index, value in enumerate(list_of_objects):
    result_of_calculations = ...  # calculations on value go here

    results.append([index, result_of_calculations])

Lately, however, my computations were taking too long, so I started using joblib to parallelize my loops. Now I am stumped, because I don't see how to keep track of each iteration's index with enumerate when the individual pieces of the loop can start and finish at irregular times.

How could I get code like the following to work, where each first value of the sub-array refers to the index of the object that was used for that particular iteration?

from joblib import Parallel, delayed

def single_loop_function(x):
    single_output = ...  # some calculations based on x
    return single_output

all_output = Parallel(n_jobs=-1, verbose=3, backend="loky")(
    map(delayed(single_loop_function), list_of_objects))

print(all_output)
# desired output: [[0, result], [1, result], ..., [5, result], [3, result]]

Upvotes: 1

Views: 1277

Answers (1)

Coolio2654

Reputation: 1739

joblib may not explicitly support this, but I found a simpler, more Pythonic way to do it (from wwii's comment on this question): convert list_of_objects into a list of [index, value] sub-lists like this,

new_list = [[i, value] for i, value in enumerate(list_of_objects)]

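and feed new_list into the joblib call instead of list_of_objects, so that each object's index is explicitly attached to it. The function being parallelized then receives an [index, value] pair and can return the index alongside its result.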
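Here is a minimal sketch of the idea end to end. The sample data and the squaring step are placeholders I made up for illustration, and n_jobs=2 is an arbitrary choice; swap in your own objects and calculations.

```python
from joblib import Parallel, delayed


def single_loop_function(pair):
    # Each task receives an [index, value] pair instead of the bare value.
    index, x = pair
    # Placeholder calculation (an assumption for illustration only).
    single_output = x * x
    # Return the index with the result so the two always stay together.
    return [index, single_output]


# Hypothetical sample data standing in for list_of_objects.
list_of_objects = [3, 1, 4, 1, 5, 9]

# Attach each object's index before handing the work to joblib.
new_list = [[i, value] for i, value in enumerate(list_of_objects)]

all_output = Parallel(n_jobs=2)(
    map(delayed(single_loop_function), new_list))

print(all_output)
# [[0, 9], [1, 1], [2, 16], [3, 1], [4, 25], [5, 81]]
```

As far as I can tell from the joblib documentation, Parallel returns its results as a list in the same order the tasks were submitted, even if the workers finish at different times. So the explicit index is mostly useful when you later filter, reorder, or store the results separately.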

Upvotes: 1
