Using multiprocessing in Python to return values

Question

Background

I have some code that looks like this right now.

from typing import Set

failed_player_ids: Set[str] = set()
for player_id in player_ids:
    # Sequential: each call blocks until the API responds (about 5 s)
    success = player_api.send_results(
        player_id, user=user, send_health_results=True
    )
    if not success:
        failed_player_ids.add(player_id)

This code works, but each call takes about 5 seconds. The API has a rate limit of 2000 calls per minute, so I am far below the maximum capacity. I want to parallelize this to speed things up. This is my first time using the multiprocessing library in Python, so I am a little confused about how to proceed. I can describe what I want to do in words.
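To put a number on "far below the maximum capacity": if every call takes a constant 5 seconds, one worker makes 12 calls per minute, so roughly 2000 / 12 ≈ 166 concurrent workers would still stay within the limit. A small sketch of that arithmetic, assuming a constant call duration and a limit counted across all workers combined (the helper name is hypothetical):

```python
import math


def max_workers_for_rate_limit(calls_per_minute_limit: int, seconds_per_call: float) -> int:
    """Largest number of concurrent workers that stays at or under the limit.

    Assumes each worker issues calls back to back and every call takes
    exactly `seconds_per_call` seconds.
    """
    calls_per_worker_per_minute = 60 / seconds_per_call
    return math.floor(calls_per_minute_limit / calls_per_worker_per_minute)


workers = max_workers_for_rate_limit(2000, 5)
print(workers)  # 166  (166 workers * 12 calls/min = 1992 calls/min)
```

Real calls vary in duration, so treating this as an upper bound and starting lower is the safer choice.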

In my current code I loop through the list of player IDs: if the API response is a success I do nothing, and if it fails I record that player ID.

I am not sure how to implement a parallel version of this code. I have some ideas, but I am a little confused.

This is what I have come up with so far:

from multiprocessing import Pool
from typing import List, Optional

num_processors_to_use = 5  # This number can be increased to get more speed

def send_player_result(player_id_list: List[str]) -> Optional[str]:
    for player_id in player_id_list:
        success = player_api.send_results(player_id, user=user, send_health_results=True)
        if not success:
            return player_id

# Caller
with Pool(processes=num_processors_to_use) as pool:
    responses = pool.map(
        func=send_player_result,
        iterable=player_id_list,
    )
    failed_player_ids = set(responses)

Any comments and suggestions would help.

Booboo · Accepted Answer

If you are using the map method, each item of the iterable player_id_list is passed as a separate task to send_player_result. Consequently, this function should no longer expect a list of player IDs, but rather a single player ID. And, as you know by now, if your tasks are largely I/O-bound, multithreading is a better model than multiprocessing. You can use either:
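To see why the signature matters, here is a small sketch (hypothetical function names and IDs, using a thread pool so it runs anywhere) that maps the same list over a function written for a list and over one written for a single item:

```python
from multiprocessing.pool import ThreadPool
from typing import List


def iterates_as_if_list(player_id_list):
    # Mirrors the question's signature: it expects a list, but map() passes
    # it a single string, so this loop walks over that string's characters.
    return [item for item in player_id_list]


def takes_one_id(player_id: str) -> str:
    # Correct shape for map(): exactly one element of the iterable per call.
    return player_id.upper()


player_id_list: List[str] = ["p1", "p2"]  # hypothetical IDs

with ThreadPool(2) as pool:
    wrong = pool.map(iterates_as_if_list, player_id_list)
    right = pool.map(takes_one_id, player_id_list)

print(wrong)  # [['p', '1'], ['p', '2']]
print(right)  # ['P1', 'P2']
```

With the list-style signature, the original send_player_result would end up calling the API with single characters rather than player IDs.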

from multiprocessing.dummy import Pool
# or
from multiprocessing.pool import ThreadPool
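Both imports give you the same thread-backed pool: multiprocessing.dummy.Pool is a small factory function that returns a multiprocessing.pool.ThreadPool. As a swapped-in alternative not mentioned above, the standard library's concurrent.futures.ThreadPoolExecutor does the same job. A minimal sketch comparing them on a toy function:

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing.dummy import Pool as DummyPool
from multiprocessing.pool import ThreadPool


def square(x: int) -> int:
    return x * x


# multiprocessing.dummy.Pool returns a ThreadPool instance.
with DummyPool(4) as pool:
    same_class = isinstance(pool, ThreadPool)
    via_dummy = pool.map(square, range(5))

# ThreadPoolExecutor.map also yields results in input order.
with ThreadPoolExecutor(max_workers=4) as executor:
    via_executor = list(executor.map(square, range(5)))

print(same_class, via_dummy, via_executor)
# True [0, 1, 4, 9, 16] [0, 1, 4, 9, 16]
```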

You will probably want to greatly increase the number of threads (but to no more than the length of player_id_list):

#from multiprocessing import Pool
from multiprocessing.dummy import Pool
from typing import Set

# player_api, user and player_id_list are assumed to be defined as in the question.

def send_player_result(player_id):
    # Receives a single player ID per task; returns True on success.
    return player_api.send_results(player_id, user=user, send_health_results=True)

# Required if you are doing multiprocessing on platforms that use the "spawn"
# start method (Windows, and macOS by default since Python 3.8):
if __name__ == '__main__':

    pool_size = 5  # This number can be increased to get more concurrency

    # Caller
    failed_player_ids: Set[str] = set()
    with Pool(pool_size) as pool:
        results = pool.map(func=send_player_result, iterable=player_id_list)
        # map returns results in the same order as player_id_list:
        for player_id, success in zip(player_id_list, results):
            if not success:
                failed_player_ids.add(player_id)
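For a version you can run without the real service, here is a minimal end-to-end sketch. FakePlayerAPI, user, and the failure rule are hypothetical stand-ins for the question's player_api; the pool is capped at the number of IDs, and zip pairs each ID with its result. Because only threads are involved, no __main__ guard is needed.

```python
import time
from multiprocessing.pool import ThreadPool
from typing import List, Set


class FakePlayerAPI:
    """Hypothetical stand-in for player_api: sleeps to simulate network
    latency and reports failure for IDs ending in '3'."""

    def send_results(self, player_id: str, user: str, send_health_results: bool) -> bool:
        time.sleep(0.1)  # simulated I/O wait (the real call reportedly takes ~5 s)
        return not player_id.endswith("3")


player_api = FakePlayerAPI()
user = "demo-user"  # hypothetical


def send_player_result(player_id: str) -> bool:
    return player_api.send_results(player_id, user=user, send_health_results=True)


def find_failed_ids(player_id_list: List[str], max_threads: int = 166) -> Set[str]:
    # Default of 166: 2000 calls/min divided by 12 calls/min per 5-second worker.
    # Never start more threads than there are tasks (and at least one).
    pool_size = max(1, min(max_threads, len(player_id_list)))
    with ThreadPool(pool_size) as pool:
        results = pool.map(send_player_result, player_id_list)
    # map() preserves input order, so zip pairs each ID with its own result.
    return {pid for pid, ok in zip(player_id_list, results) if not ok}


ids = [f"player-{n}" for n in range(20)]
start = time.perf_counter()
failed = find_failed_ids(ids)
elapsed = time.perf_counter() - start
print(sorted(failed))  # ['player-13', 'player-3']
print(f"{len(ids)} calls in {elapsed:.2f} s")
```

With 20 IDs and a 0.1 s simulated call, the run should take roughly the time of one call rather than twenty, since all the calls wait concurrently.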
