Reputation: 55
This is related to a question I posted previously: How do I get all data from an API when I don't know the max number of pages.
I am pulling data from an API in Python. I can filter the request so that it only returns records whose receivedDate falls between a specified start and end date, but I don't know how many pages meet this condition. I wrote the code below to address this. Pieces of the code work, but the loop never completes, so I have to stop it manually. I think this is because it is looping through every page. Can I modify it to loop only through pages that contain a receivedDate between the start date and the end date, rather than through every page?
import requests
import datetime
import json

start_date = datetime.datetime(2016, 10, 1).isoformat()  # convert to ISO-8601 to match the date format of the response
end_date = datetime.datetime(2017, 9, 30).isoformat()
page_size = 2000
page_number = 1
all_data = []

while True:
    data = f'{{"receivedDateFrom":"{start_date}", "receivedDateTo":"{end_date}", "pageSize":{page_size}, "pageNumber":{page_number}}}'  # I have confirmed this formatting works
    response = requests.post(url, headers=headers, verify=True, data=data)  # leaving out url and headers because of data privacy concerns, but I have confirmed this works
    response_data = response.json()
    if not response_data:
        break
    all_data.extend(response_data)
    page_number += 1
Upvotes: 0
Views: 42
Reputation: 55
This is how I resolved it, with help from the discussion above:
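import datetime
import requests

# url and headers are defined elsewhere (left out for data privacy reasons)
start_date = datetime.datetime(2016, 10, 1).isoformat()  # convert to ISO-8601 to match the date format of the response
end_date = datetime.datetime(2017, 9, 30).isoformat()
page_size = 2000
page_number = 1
all_data = []  # accumulates the records from every page

while True:
    # The date filter is part of the request body, so the API only pages through matching records
    data = f'{{"receivedDateFrom":"{start_date}", "receivedDateTo":"{end_date}", "pageSize":{page_size}, "pageNumber":{page_number}}}'
    response = requests.post(url, headers=headers, verify=True, data=data)
    response_data = response.json()
    if not response_data:  # an empty page means there is nothing left to fetch
        break
    all_data.extend(response_data)
    page_number += 1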
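For anyone who hits the same "never finishes" symptom, here is a minimal sketch of the same loop with two safety stops added. It is not the API's documented behaviour, just a defensive variant under stated assumptions: the names `build_payload`, `fetch_all_pages`, `post_page` and `max_pages` are hypothetical, and I am assuming each page comes back as a JSON list. The HTTP call is passed in as a function so the loop can be tried without a network connection.

```python
import json
from typing import Any, Callable, List


def build_payload(start_date: str, end_date: str, page_size: int, page_number: int) -> str:
    """Serialize the date filter and paging fields into a JSON request body."""
    return json.dumps({
        "receivedDateFrom": start_date,
        "receivedDateTo": end_date,
        "pageSize": page_size,
        "pageNumber": page_number,
    })


def fetch_all_pages(
    post_page: Callable[[str], List[Any]],  # hypothetical: sends one body, returns one page as a list
    start_date: str,
    end_date: str,
    page_size: int = 2000,
    max_pages: int = 10_000,  # hypothetical upper bound so the loop cannot run forever
) -> List[Any]:
    """Collect pages until an empty one is returned."""
    all_data: List[Any] = []
    previous_page = None
    for page_number in range(1, max_pages + 1):
        page = post_page(build_payload(start_date, end_date, page_size, page_number))
        if not page:
            break  # an empty page means there is nothing left to fetch
        if page == previous_page:
            # Assumption: an identical page twice in a row means pageNumber is being ignored
            raise RuntimeError(f"page {page_number} repeats the previous page")
        all_data.extend(page)
        previous_page = page
    else:
        # The for loop ran out without ever seeing an empty page
        raise RuntimeError(f"no empty page after {max_pages} pages")
    return all_data


# Wiring it to requests could look like this (url and headers as in the question):
#
#     def post_page(body):
#         response = requests.post(url, headers=headers, data=body, timeout=30)
#         response.raise_for_status()
#         return response.json()
#
#     all_data = fetch_all_pages(post_page, start_date, end_date)
```

The `timeout` and `raise_for_status()` calls in the commented wiring are there so that a hung connection or an error response surfaces as an exception instead of looking like a loop that never ends.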
Upvotes: 0