Reputation: 81
I am trying to figure out the best way to make use of Python's multiprocessing Pool.
I have an n^2 nested for loop that makes a comparison for each pair of buckets from a list of buckets.
The work that I would like to parallelize using the multiprocessing Pool is the compare() function call.
There are absolutely no shared resources in the compare function. If buckets A and B were being compared while buckets A and C were being compared in another process, it would not matter.
I am new to parallel processing, but I do understand the basic nature of the multiprocessing Pool. However, I am finding it difficult to implement anything that does what I would like it to do. The reliance on specific bucket pairs being passed to the function, along with their associated reader_list, seems to be blocking me when I look at examples of Pool. I am not simply applying the function to each index of a single list.
for i in range(0, len(bucket_names) - 1):
    bucket1 = bucket_names[i]
    for k in range(i + 1, len(bucket_names)):
        bucket2 = bucket_names[k]
        reader_list1 = get_reader_list(bucket1)
        reader_list2 = get_reader_list(bucket2)
        compare(bucket1, bucket2, reader_list1, reader_list2)
Upvotes: 0
Views: 56
Reputation: 56
Do you mean you need an example of how to use Pool to parallelize your function? Here is one.
import multiprocessing as mp

# Generate your arguments as a list of tuples, using some method
# that fits your requirements. Here is a hard-coded example.
arguments = [(bucket1, bucket2), (bucket2, bucket3), (bucket1, bucket3)]

# Create a pool with one worker per logical CPU; the "with" block
# closes the pool and its worker processes when the work is done
with mp.Pool(mp.cpu_count()) as pool:
    # starmap unpacks each tuple as the arguments to compare();
    # results is a list of compare()'s return values, in order
    results = pool.starmap(compare, arguments)
Upvotes: 1