Reputation: 171
I have the matching() function below, with a for loop that iterates over a big generator (unique_combinations). It takes days to process, so I wanted to use multiprocessing on the elements of the loop to speed things up, but I just can't figure out how to do it. I find the logic behind concurrent.futures hard to follow in general.
from concurrent.futures import ProcessPoolExecutor
from fuzzywuzzy import fuzz  # fuzz.ratio; newer releases ship as "thefuzz"

results = []
match_score = []

def matching():
    for pair in unique_combinations:
        if fuzz.ratio(pair[0], pair[1]) > 90:
            results.append(pair)
            match_score.append(fuzz.ratio(pair[0], pair[1]))

def main():
    executor = ProcessPoolExecutor(max_workers=3)
    task1 = executor.submit(matching)
    task2 = executor.submit(matching)
    task3 = executor.submit(matching)

if __name__ == '__main__':
    main()
    print(results)
    print(match_score)
I am assuming this should speed up the execution.
Upvotes: 1
Views: 140
Reputation: 3721
If you're already using concurrent.futures, the nicest way, IMO, is to use map:
import concurrent.futures

from fuzzywuzzy import fuzz  # or "thefuzz"

def matching(pair):
    fuzz_ratio = fuzz.ratio(pair[0], pair[1])  # only calculate this once
    if fuzz_ratio > 90:
        return pair, fuzz_ratio
    else:
        return None

def main():
    unique_combinations = [('a', 'b'), ('b', 'c'), ('c', 'd')]  # your real string pairs here
    results = []
    match_score = []
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        for result in executor.map(matching, unique_combinations, chunksize=100):
            if result:
                # handle the results somehow
                results.append(result[0])
                match_score.append(result[1])

if __name__ == '__main__':
    main()
There are lots of ways to handle the results, but the gist is that you return a value from matching and then retrieve it in the executor.map for loop in main. Docs here.
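For a fully self-contained version of the approach above, here is a sketch that swaps fuzz.ratio for a stand-in built on difflib.SequenceMatcher from the standard library (so it runs without fuzzywuzzy installed; the structure is otherwise the same):

```python
import concurrent.futures
from difflib import SequenceMatcher

def ratio(a, b):
    # Stand-in for fuzz.ratio: a 0-100 similarity score between two strings.
    return int(round(SequenceMatcher(None, a, b).ratio() * 100))

def matching(pair):
    score = ratio(pair[0], pair[1])  # compute the score once
    if score > 90:
        return pair, score
    return None

def main():
    # Toy data; in practice this is the big generator of string pairs.
    unique_combinations = [
        ('apple', 'apple'),    # identical -> 100
        ('apple', 'apples'),   # near match -> 91
        ('apple', 'orange'),   # poor match -> filtered out
    ]
    results, match_score = [], []
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        for result in executor.map(matching, unique_combinations, chunksize=2):
            if result:
                results.append(result[0])
                match_score.append(result[1])
    return results, match_score

if __name__ == '__main__':
    results, match_score = main()
    print(results)      # [('apple', 'apple'), ('apple', 'apples')]
    print(match_score)  # [100, 91]
```

With a truly big generator, a generous chunksize matters: it controls how many pairs are shipped to a worker per inter-process round trip, and a value that is too small makes the IPC overhead swamp the actual work.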
Upvotes: 1