Reputation: 420
Sample JSON response:
{
    "Data": {
        "City": [
            {
                "loc": "Sector XYZ",
                "Country": "AUS"
            },
            {
                ...
            }
        ]
    },
    "Meta": {},
    "ResourceType": 40,
    "StatusCode": 200,
    "Message": null,
    "Cursor": "apicursor-ad39609e-5fb2-4a66-9402-6def95e75655"
}
The cursor is dynamic and changes with each paginated response; the next one might be "apicursor-53ee8993-022c-41df-8be7-9bdedfd91e52", and so on.
The next request URL has the following format:
https://myurl123.com/api/V2/data/{}?size=10&cursor=apicursor-53ee8993-022c-41df-8be7-9bdedfd91e52
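For illustration, here is a minimal sketch of how each new URL could be assembled from the cursor returned in the previous response ("cities" below is a hypothetical resource name, not something from the actual API):

def next_page_url(name, cursor, size=10):
    # path and query parameters follow the format shown above
    return "https://myurl123.com/api/V2/data/{}?size={}&cursor={}".format(name, size, cursor)

print(next_page_url("cities", "apicursor-53ee8993-022c-41df-8be7-9bdedfd91e52"))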
I cannot work out how to paginate through the responses and append them to a DataFrame for very large datasets. Here is what I tried, but it does not include pagination:
import json
from io import StringIO

import pandas as pd
import requests

def foo(name):
    url = "https://myurl123.com/api/V2/data/{}?size=10".format(name)
    print(url)
    headers = {
        'Authorization': 'ApiKey xyz123',
        'Content-Type': 'application/json'
    }
    # a GET request needs no body
    response = requests.get(url, headers=headers)
    try:
        x = response.json()
        # "Data" holds a single key ("City"); grab its list of records
        xs = next(iter(x['Data'].values()))
        df = pd.read_json(StringIO(json.dumps(xs)), orient='records')
        df.reset_index(drop=True, inplace=True)
        return df
    except (ValueError, KeyError):
        print('fetch failed')
I just want to paginate through the API, collect all the data into a DataFrame, and return that from the function above.
I could not understand some of the other answers available here, so I apologize for any duplication. Thanks for your help and suggestions.
Upvotes: 0
Views: 3604
Reputation: 12548
Did I understand correctly that you need to call the API again and again until no more data comes back? You could do it like this: the function get_data() yields all the rows of all the requests as one iterator, so from the calling function it looks like one long list.
But that will take a long time for 100,000 rows, because it reads 10 rows per request; that is 10,000 requests, one after the other.
import json
from io import StringIO

import pandas as pd
import requests

def get_data(name):
    csr = ""
    baseurl = "https://myurl123.com/api/V2/data/{}".format(name)
    headers = {
        'Authorization': 'ApiKey xyz123',
        'Content-Type': 'application/json'
    }
    while True:
        url = "{}?size=10&cursor={}".format(baseurl, csr)
        res = requests.get(url, headers=headers)
        res.raise_for_status()
        data = res.json()
        if not data["Data"]:
            break
        # pick up the cursor for the next page
        csr = data["Cursor"]
        for row in data["Data"]["City"]:
            yield row

def get_df(name):
    data = get_data(name)
    df = pd.read_json(StringIO(json.dumps(list(data))), orient='records')
    df.reset_index(drop=True, inplace=True)
    return df
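Calling it could look like this ("cities" is again a hypothetical resource name). As a sketch, you could also skip the json.dumps round trip entirely, since pd.DataFrame accepts any iterable of dict records:

df = get_df("cities")   # collects every page into one DataFrame
print(df.shape)

# alternative sketch: feed the generator straight to the constructor
df = pd.DataFrame(get_data("cities"))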
Upvotes: 4