Reputation: 327
Background
I have some code that looks like this right now.
failed_player_ids: Set[str] = set()
for player_id in player_ids:
    success = player_api.send_results(
        player_id, user=user, send_health_results=True
    )
    if not success:
        failed_player_ids.add(player_id)
This code works, but each call takes about 5 seconds. There is a rate limit of 2000 calls per minute, so I am well under the maximum capacity. I want to parallelize this to speed things up. This is my first time using the multiprocessing library in Python, so I am a little confused about how to proceed. I can describe what I want to do in words.
In my current code I loop through a list of player_id values. If the API response is a success I do nothing, and if it failed I make a note of that player ID.
I am not sure how to implement a parallel version of this code. I have some ideas, but I am a little confused.
This is what I have thought of so far:
from multiprocessing import Pool

num_processors_to_use = 5 # This is a number can be increased to get more speed

def send_player_result(player_id_list: List[str]) -> Optional[str]:
    for player_id in player_id_list:
        success = player_api.send_results(player_id, user=user, send_health_results=True)
        if not success:
            return player_id

# Caller
with Pool(processes=num_processors_to_use) as pool:
    responses = pool.map(
        func=send_player_result,
        iterable=player_id_list,
    )
    failed_player_ids = Set(responses)
Any comments and suggestions would help.
Upvotes: 2
Views: 425
Reputation: 44013
If you are using function map, then each item of the iterable player_id_list will be passed as a separate task to function send_player_result. Consequently, this function should no longer expect to be passed a list of player ids, but rather a single player id. And, as you know by now, if your tasks are largely I/O bound, then multithreading is a better model. You can use either of these imports:
from multiprocessing.dummy import Pool
# or
from multiprocessing.pool import ThreadPool
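To make the point about map concrete, here is a minimal sketch showing that each element of the iterable reaches the worker function on its own and that results come back in input order. The function describe_argument and the sample ids are hypothetical stand-ins, not part of the original code.

```python
from multiprocessing.pool import ThreadPool
from typing import List


def describe_argument(item: str) -> str:
    # Hypothetical stand-in for send_player_result: it only reports what it received.
    return f"{type(item).__name__}:{item}"


player_id_list: List[str] = ["p1", "p2", "p3"]  # made-up sample ids

with ThreadPool(3) as pool:
    # map calls describe_argument once per element and returns results in input order.
    received = pool.map(describe_argument, player_id_list)

print(received)  # ['str:p1', 'str:p2', 'str:p3']
```

Each call receives a single str, not the whole list, which is why the worker function should take one player id.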
You will probably want to greatly increase the number of threads (but not greater than the size of player_id_list):
#from multiprocessing import Pool
from multiprocessing.dummy import Pool
from typing import Set

def send_player_result(player_id):
    success = player_api.send_results(player_id, user=user, send_health_results=True)
    return success

# Only required for Windows if you are doing multiprocessing:
if __name__ == '__main__':

    pool_size = 5 # This number can be increased to get more concurrency

    # Caller
    failed_player_ids: Set[str] = set()
    with Pool(pool_size) as pool:
        results = pool.map(func=send_player_result, iterable=player_id_list)
    for idx, success in enumerate(results):
        if not success:
            # failed for argument player_id_list[idx]:
            failed_player_ids.add(player_id_list[idx])
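For anyone who wants to try the pattern without access to player_api, the following is a self-contained sketch under stated assumptions: fake_send_results is a hypothetical stand-in that sleeps briefly and fails for ids ending in "x", the ids are made up, and the upper bound of 50 threads is an arbitrary choice.

```python
import time
from multiprocessing.pool import ThreadPool
from typing import List, Set


def fake_send_results(player_id: str) -> bool:
    # Hypothetical stand-in for player_api.send_results: sleeps to mimic a slow
    # network call and reports failure for ids ending in "x".
    time.sleep(0.05)
    return not player_id.endswith("x")


player_id_list: List[str] = ["a1", "b2x", "c3", "d4x", "e5", "f6"]  # made-up ids

# Never start more threads than there are tasks; 50 is an arbitrary upper bound.
pool_size = max(1, min(50, len(player_id_list)))

start = time.perf_counter()
with ThreadPool(pool_size) as pool:
    results = pool.map(fake_send_results, player_id_list)
elapsed = time.perf_counter() - start

# map keeps input order, so zip pairs each id with its own result.
failed_player_ids: Set[str] = {
    player_id for player_id, success in zip(player_id_list, results) if not success
}

print(sorted(failed_player_ids))  # ['b2x', 'd4x']
print(f"elapsed: {elapsed:.2f}s")
```

As a rough estimate from the figures in the question (about 5 seconds per call and a limit of 2000 calls per minute), around 166 calls in flight at once would sit at the rate limit, so a pool size well below that should stay within it.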
Upvotes: 2