swilson

Reputation: 55

How do I get a subset of data from an API when I don't know the max number of pages?

This is related to a question I posted previously: How do I get all data from an API when I don't know the max number of pages.

I am pulling data from an API in Python. I can filter the request so that it only returns records whose receivedDate falls between a specified start and end date, but I don't know how many pages meet this condition. I wrote the code below to address this. Pieces of the code work, but the script never finishes, so I have to stop it manually. I think this is because it is looping through every page. Can I modify it to loop only through pages that contain a receivedDate between the start date and the end date, without looping through every page?

import requests
import datetime
import json

start_date = datetime.datetime(2016,10,1).isoformat() #convert to ISO-8601 to match the date format of the response
end_date = datetime.datetime(2017,9,30).isoformat()
page_size = 2000
page_number = 1

while True:

    data = f'{{"receivedDateFrom":"{start_date}", "receivedDateTo":"{end_date}", "pageSize":{page_size}, "pageNumber":{page_number}}}' # I have confirmed this formatting works
    response = requests.post(url, headers=headers, verify=True, data=data) # leaving out url and headers because of data privacy concerns, but I have confirmed this works
    response_data = response.json()
    
    if not response_data:
        break
        
    all_data.extend(response_data)

    page_number +=1

Upvotes: 0

Views: 42

Answers (1)

swilson

Reputation: 55

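This is how I resolved it, with help from the discussion above. The date filter is applied by the API, so the loop only requests pages inside the date range and stops at the first empty page: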

import datetime

import requests

# Convert to ISO-8601 to match the date format of the response
start_date = datetime.datetime(2016, 10, 1).isoformat()
end_date = datetime.datetime(2017, 9, 30).isoformat()
page_size = 2000
page_number = 1

all_data = []  # accumulates the records from every page

while True:
    # url and headers are defined elsewhere (left out because of data privacy concerns)
    data = f'{{"receivedDateFrom":"{start_date}", "receivedDateTo":"{end_date}", "pageSize":{page_size}, "pageNumber":{page_number}}}'
    response = requests.post(url, headers=headers, verify=True, data=data)
    response_data = response.json()  # parse the body once and reuse it

    # An empty page means there are no more results in the date range
    if not response_data:
        break

    all_data.extend(response_data)
    page_number += 1
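
For anyone with a similar problem, here is a minimal sketch of the same "keep requesting until a page comes back empty" idea, written as a generator so it can be tried without the real API. The names iter_pages, make_fake_fetch and max_pages are hypothetical and not part of the original code. It also assumes that a page with fewer than page_size records is the last one, which is worth checking against your API before relying on it. The max_pages cap is only a guard so that the loop cannot run forever if the API never returns an empty page.

```python
def iter_pages(fetch_page, page_size, max_pages=10_000):
    """Yield records page by page until the source has no more.

    fetch_page(page_number, page_size) must return a list of records.
    """
    for page_number in range(1, max_pages + 1):
        records = fetch_page(page_number, page_size)
        if not records:
            return  # empty page: nothing left in the filtered range
        yield from records
        if len(records) < page_size:
            return  # short page: assumed to be the last one
    raise RuntimeError(f"No empty page after {max_pages} pages; stopping.")


def make_fake_fetch(total_records):
    """Build an in-memory stand-in for the API, plus a log of requested pages."""
    calls = []

    def fetch_page(page_number, page_size):
        calls.append(page_number)
        start = (page_number - 1) * page_size
        stop = min(start + page_size, total_records)
        return list(range(start, stop))  # empty once start >= total_records

    return fetch_page, calls


fetch_page, calls = make_fake_fetch(4500)
all_data = list(iter_pages(fetch_page, page_size=2000))
print(len(all_data), calls)  # 4500 [1, 2, 3]
```

With the real API, fetch_page would wrap the requests.post call shown above and return response.json(). When the last page is short, this saves the final request that would only confirm an empty page.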

Upvotes: 0
