I have been using Python `requests` to get data from an API, but I want to speed it up by running the requests asynchronously with `requests_futures`. I am only allowed 200 API requests per minute, so I have to check for this and wait a specified number of seconds before retrying; that number is returned in the `Retry-After` header. Here is the original working code:
```python
import time

import requests

comments = []
session = requests.Session()
for ticket_id in ticketIds:
    url = f'https://colorfront.zendesk.com/api/v2/tickets/{ticket_id}/comments.json'
    req = session.get(url, auth=zd_secret)
    if req.status_code == 429:
        # rate limited: wait as long as the server asks, then retry once
        time.sleep(int(req.headers['Retry-After']))
        req = session.get(url, auth=zd_secret)
    comments += req.json()['comments']
```
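The code above converts `Retry-After` with `int()`, which works when the server sends a delay in seconds. HTTP also allows the header to carry an HTTP-date instead, so a more defensive parser might accept both forms. This is a sketch; `parse_retry_after` is a hypothetical helper name, not part of `requests`:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Return how many seconds to wait for a Retry-After header value.

    Accepts a delay in seconds ("120") or an HTTP-date
    ("Wed, 21 Oct 2015 07:28:00 GMT"). Dates in the past give 0.
    """
    value = value.strip()
    if value.isdigit():
        return int(value)
    # hypothetical fallback: treat anything else as an HTTP-date
    retry_at = parsedate_to_datetime(value)
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0.0, (retry_at - now).total_seconds())
```

With it, the sleep becomes `time.sleep(parse_retry_after(req.headers['Retry-After']))`.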
The following asynchronous code works until it hits the rate limit; after that, every remaining request fails.
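```python
from requests_futures.sessions import FuturesSession

comments = []
session = FuturesSession()
futures = {}
# submit every request up front; each call returns a Future immediately
for ticket_id in ticketIds:
    url = f'https://colorfront.zendesk.com/api/v2/tickets/{ticket_id}/comments.json'
    futures[ticket_id] = session.get(url, auth=zd_secret)
# collect the responses in the original order
for ticket_id in ticketIds:
    comments += futures[ticket_id].result().json()['comments']
```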
When I hit the rate limit, I need a way to retry only the requests which failed. Does requests_futures have some built-in way to handle this?
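Because `FuturesSession.get` returns a `concurrent.futures.Future` that resolves to an ordinary response, one way to find the requests that failed is to inspect each status code after the future completes. A minimal sketch, assuming the server signals rate limiting with HTTP 429 and a `Retry-After` value in seconds; `split_results` is a hypothetical helper, not part of `requests_futures`:

```python
from concurrent.futures import Future
from typing import Dict


def split_results(futures: Dict[int, Future]):
    """Split {ticket_id: Future} into finished responses and IDs to retry.

    A 429 response marks the ID for retry; the largest Retry-After value
    seen in the batch is returned as the delay before the next attempt.
    """
    done, retry_ids, delay = {}, [], 0
    for ticket_id, future in futures.items():
        response = future.result()  # blocks until this request completes
        if response.status_code == 429:
            retry_ids.append(ticket_id)
            delay = max(delay, int(response.headers.get('Retry-After', 0)))
        else:
            done[ticket_id] = response
    return done, retry_ids, delay
```

The caller would sleep for `delay` seconds and resubmit only `retry_ids`.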
Update: The requests_futures library does not have anything built-in for this. I found this related open issue: https://github.com/ross/requests-futures/issues/26. I'll try to pace the requests up front since I know the API limit, but that won't help if another user from my organization is simultaneously hitting the same API.
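Pacing up front could look something like the following sliding-window limiter, which blocks until fewer than `limit` calls have been made in the last `period` seconds. This is a sketch with hypothetical names (`RateLimiter`, `wait`); the clock and sleep functions are injectable so it can be exercised without actually waiting.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `limit` calls to wait() in any `period`-second window."""

    def __init__(self, limit=200, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of calls still inside the window

    def wait(self):
        while True:
            now = self.clock()
            # forget calls that have left the sliding window
            while self.sent and now - self.sent[0] >= self.period:
                self.sent.popleft()
            if len(self.sent) < self.limit:
                self.sent.append(now)
                return
            # sleep until the oldest call in the window expires, then re-check
            self.sleep(self.period - (now - self.sent[0]))
```

Calling `limiter.wait()` before each `session.get(...)` keeps a single client under 200 requests per minute, but it cannot see requests made by other users sharing the same quota, so the 429 handling is still needed.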
I think I have found a solution. I don't know if it's the best way, but it avoids another dependency. I can tune `max_workers` and `x`, the number of requests submitted per batch, to optimize efficiency depending on the internet speed at this coffee shop.
```python
import time

from requests_futures.sessions import FuturesSession

session = FuturesSession(max_workers=2)
comments = []
delay = 0
x = 200  # number of requests submitted per batch

while ticketIds:
    time.sleep(delay)
    delay = 0
    # submit the next batch; slicing also handles a final batch shorter than x
    futures = {}
    for ticket_id in ticketIds[:x]:
        url = f'https://colorfront.zendesk.com/api/v2/tickets/{ticket_id}/comments.json'
        futures[ticket_id] = session.get(url, auth=zd_secret)
    for ticket_id, future in futures.items():
        res = future.result()
        if res.status_code == 429:
            # rate limited: wait as long as the server asks before the next batch
            delay = max(delay, int(res.headers['Retry-After']))
            continue
        # stop on other HTTP errors instead of retrying them forever
        res.raise_for_status()
        # remove successful IDs so only rate-limited ones are retried
        ticketIds.remove(ticket_id)
        comments += res.json()['comments']
```
You should be able to use the `Retry` class from `urllib3.util.retry` to achieve this:
```python
from requests.adapters import HTTPAdapter
from requests_futures.sessions import FuturesSession
from urllib3.util.retry import Retry

session = FuturesSession()
retries = 5
status_forcelist = [429]
retry = Retry(
    total=retries,
    read=retries,
    connect=retries,
    respect_retry_after_header=True,  # sleep for the server's Retry-After value
    status_forcelist=status_forcelist,  # retry responses with these status codes
)
# FuturesSession is a requests.Session, so transport adapters can be mounted on it
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)

comments = []
futures = {}
for ticket_id in ticketIds:
    url = f'https://colorfront.zendesk.com/api/v2/tickets/{ticket_id}/comments.json'
    futures[ticket_id] = session.get(url, auth=zd_secret)
for ticket_id in ticketIds:
    comments += futures[ticket_id].result().json()['comments']
```