Coldchain9

Reputation: 1745

Rate Limiting API Requests in Python with Multiprocessing

I am using multiprocessing in Python to make parallel API requests. I have 8 cores on my machine (mp.cpu_count() == 8).

I am limited to roughly 6 requests per second. What would be the optimal way to make my API calls and process them?

Example code idea below, but it doesn't work as intended. I get a rapid-fire burst of 429s, then it backs off for 10 seconds, but continues to get 429s in rapid succession. My fear is that all 8 cores are firing requests so quickly that they overwhelm the service and never leave room for any successful calls to come back.

import multiprocessing as mp
import time

import requests

def api_call(iter):
    # url is assumed to be defined at module level along with the rest of the API details.
    query = {'api_key': iter[0], 'user_id': iter[1]}
    resp = requests.get(url, params=query)
    if resp.status_code == 200:
        data = resp.json()
        print(data)
        return data
    else:
        # Handle too many requests: back off and retry until we stop getting 429s.
        while resp.status_code == 429:
            time.sleep(10)  # Back off 10 seconds.
            resp = requests.get(url, params=query)
        if resp.status_code == 200:
            data = resp.json()
            return data

   
if __name__ == "__main__":

    # Assume an iterable with api_key and other data to make request to API and populate query string
    iterable: list = [(api_key, other_data1), (api_key, other_data2)]

    with mp.Pool(mp.cpu_count()) as p:
        try:
            res: list = p.map(api_call, iterable)
        except KeyboardInterrupt:
            print("Terminating Multiprocess due to Keyboard Interrupt")
            p.terminate()
        else:
            p.close()
            p.join()

Upvotes: 2

Views: 1535

Answers (1)

larsks

Reputation: 311288

It sounds like you may have already solved your problem, but one solution worth considering is using a Semaphore to limit the number of active processes. This has the advantage that you can start as many tasks in parallel as you want, and then limit only the critical section that makes the web requests.

For example:

import multiprocessing
import requests

# A Manager-backed semaphore can be shared with pool workers; four permits
# means at most four requests are in flight at any one time.
mgr = multiprocessing.Manager()
sem = mgr.Semaphore(4)


def task(id):
    print(f"start task {id}")
    with sem:
        # Only the request itself sits inside the limited critical section.
        res = requests.get("http://google.com")
        date_from_header = res.headers["date"]
    print(f"stop task {id}")
    return date_from_header


with multiprocessing.Pool(processes=10) as pool:
    res = pool.map(task, range(1, 20))

print(res)

Regardless of the size of your pool, this will never have more than four concurrent calls to requests.get at any given time. Once a request is complete, your tasks can execute other code in parallel.
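If the limit you care about is requests per second rather than simultaneous requests (the question mentions roughly 6 per second), one way to adapt the same pattern is to hold each permit for at least a second, so the number of permits also acts as an approximate per-second cap. The sketch below is an illustration, not part of the answer above; the RATE_LIMIT and MIN_INTERVAL names and the one-second hold are assumptions.

import multiprocessing
import time

import requests

RATE_LIMIT = 6          # permits: at most 6 requests in flight at once (assumed from the question)
MIN_INTERVAL = 1.0      # hold each permit for at least one second (assumption)

mgr = multiprocessing.Manager()
sem = mgr.Semaphore(RATE_LIMIT)


def task(id):
    with sem:
        start = time.monotonic()
        res = requests.get("http://google.com")
        date_from_header = res.headers["date"]
        # Keep the permit until a full second has passed, so six permits
        # translate into roughly six requests per second overall.
        elapsed = time.monotonic() - start
        if elapsed < MIN_INTERVAL:
            time.sleep(MIN_INTERVAL - elapsed)
    return date_from_header


with multiprocessing.Pool(processes=10) as pool:
    print(pool.map(task, range(1, 20)))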

Upvotes: 4
