Reputation: 1088
I have made an API request as follows:
response = requests.get('{}/customers'.format(API_URL), headers=headers, params = {"sort_by":'id', "min_id": 0})
data1 = response.json()
Each request returns 1000 entries as a list, so data1 is a list of 1000 elements. The total data I am trying to pull is 100,000 rows, which means I have to keep adjusting min_id
to pull all of them. The way I was doing this is as follows:
Step one is:
response = requests.get('{}/customers'.format(API_URL), headers=headers, params = {"sort_by":'id', "min_id": 0})
data1 = response.json()
This gives:
[{'id': 6,
'a': 'x',
'b': 'y',
'c': 'z'},...,
{'id': 9994,
'a': 'm',
'b': 'n',
'c': 'o'}]
In the above output you can see the first and the last elements. The last id is 9994, so I make the second request as:
response = requests.get('{}/customers'.format(API_URL), headers=headers, params = {"sort_by":'id', "min_id": 9995})
data2 = response.json()
where min_id now starts from 9995. I could make hundreds of these requests and extract all the data, but that is certainly not efficient. Can I write an iterative function to do this, such that on each iteration min_id is replaced by the id
value of the last pull plus 1?
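Since the results are sorted by id, the last id I read by eye should correspond to the last element of the list, something like (next_min_id is just an illustrative name):
next_min_id = data1[-1]['id'] + 1  # id of the last pulled entry, plus 1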
Upvotes: 2
Views: 246
Reputation: 195573
You can use the max() function to find the maximum id in each response and then update the parameter accordingly. For example:
min_id = 0
all_data = []

for r in range(100):  # 100,000 rows / 1,000 rows per request = 100 requests
    print('Request no. {}..'.format(r))
    response = requests.get(
        "{}/customers".format(API_URL),
        headers=headers,
        params={"sort_by": "id", "min_id": min_id},  # <-- use min_id here
    )
    data1 = response.json()
    # find the maximum id in data1 and advance min_id past it
    min_id = max(data1, key=lambda k: k['id'])['id'] + 1
    # store data1 in all_data
    all_data.extend(data1)
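If the total number of rows is not known exactly in advance, a variation is to loop until the API returns an empty page instead of making a fixed 100 requests. A sketch under the same assumptions about API_URL and headers, relying on the results being sorted by id:
min_id = 0
all_data = []

while True:
    response = requests.get(
        "{}/customers".format(API_URL),
        headers=headers,
        params={"sort_by": "id", "min_id": min_id},
    )
    data1 = response.json()
    if not data1:
        # no entries left, stop paginating
        break
    all_data.extend(data1)
    # results are sorted by id, so the last element holds the page maximum
    min_id = data1[-1]["id"] + 1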
Upvotes: 2