Reputation: 81
I am trying to figure out the best way to make use of Python's multiprocessing Pool.
I have an n^2 nested for loop that makes a comparison for each pair of buckets from a list of buckets.
The work that I would like to parallelize using the multiprocessing Pool is the compare() function call.
There are absolutely no shared resources in the compare function. If buckets A and B were being compared while buckets A and C were being compared in another process, it would not matter.
I am new to parallel processing, but I do understand the basic nature of the multiprocessing Pool. However, I am finding it difficult to implement anything that does what I would like it to do. The reliance on specific bucket pairs being passed to the function, along with their associated reader_list, seems to be blocking me when I look at examples of Pool. I am not simply applying the function to each index of a single list.
for i in range(0, len(bucket_names) - 1):
    bucket1 = bucket_names[i]
    for k in range(i + 1, len(bucket_names)):
        bucket2 = bucket_names[k]
        reader_list1 = get_reader_list(bucket1)
        reader_list2 = get_reader_list(bucket2)
        compare(bucket1, bucket2, reader_list1, reader_list2)
Upvotes: 0
Views: 56
Reputation: 56
Do you mean you need an example of how to use Pool to parallelize your function? Here is one.
import multiprocessing as mp

# Generate your arguments as a list of tuples, using some method
# that fits your requirements. Here is a hard-coded example.
arguments = [(bucket1, bucket2), (bucket2, bucket3), (bucket1, bucket3)]

# Create a pool with one worker per logical CPU; the "with" block
# closes the pool and its worker processes when the work is done
with mp.Pool(mp.cpu_count()) as pool:
    # starmap unpacks each tuple as the arguments to compare();
    # results is a list of compare()'s return values, in order
    results = pool.starmap(compare, arguments)
Upvotes: 1