Tim

Reputation: 3

Python - Loop through each page to get all records

I would like to retrieve all records (total 50,000) from an API endpoint. The endpoint only returns a maximum of 1000 records per page. Here's the function to get the records.

import json

import pandas as pd
import requests

def get_products(token, page_number):
  url = "https://example.com/manager/nexus?page={}&limit={}".format(page_number,1000)
  header = {
    "Authorization": "Bearer {}".format(token)
  }
  response = requests.get(url, headers=header)
  product_results = response.json()

  total_list = []

  for result in product_results['Data']:
    date = result['date']
    price = result['price']
    name = result['name']
    total_list.append((date,price,name))
  columns = ['date', 'price', 'name']
  df = pd.DataFrame(total_list, columns=columns)
  results = json.dumps(total_list)
  return df, results

How can I loop through each page until the final record without hardcoding the page numbers? Currently, I'm hardcoding the page numbers as below for the first 2 pages to get 2000 records as a test.

for page_number in np.arange(1,3):
  token = get_token()

  product_df,product_json = get_products(token,page_number)
  if page_number==1:
    product_all=product_df
  else:
    product_all=pd.concat([product_all,product_df])

print(product_all)

Thank you.

Upvotes: 0

Views: 1056

Answers (2)

chanrlc

Reputation: 182

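It depends on what your backend's GET endpoint returns. The page and limit parameters are required. If you control the backend, you could rewrite it to return all the data instead of only 1000 records at a time. Otherwise, compute the number of pages from the total and loop over them: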

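num = 50000 // 1000  # 50 pages of 1000 records each

# range() excludes its end value, so use num + 1 to include the last page
for i in range(1, num + 1):
  token = get_token()
  product_df, product_json = get_products(token, i)
  if i == 1:
    product_all = product_df
  else:
    product_all = pd.concat([product_all, product_df])

print(product_all)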
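
If the total is not an exact multiple of the page size, integer division drops the last partial page. A minimal sketch of the page-count arithmetic, assuming the total record count is known up front (the question states 50,000); page_numbers is a hypothetical helper, not part of the original code:

```python
import math


def page_numbers(total_records, page_size):
    """Return the 1-based page numbers needed to cover total_records."""
    # Round up so a final, partially filled page is still requested.
    num_pages = math.ceil(total_records / page_size)
    # range() excludes its end value, hence the + 1.
    return range(1, num_pages + 1)


pages_exact = list(page_numbers(50000, 1000))    # pages 1 through 50
pages_partial = list(page_numbers(50001, 1000))  # pages 1 through 51
```

Each page number can then be passed to get_products as in the loop above.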

Upvotes: 0

frogcoder

Reputation: 1003

I don't know how the endpoint behaves. Assuming that requesting a page number past the last page returns an empty list, you could simply check whether the result is empty.

page_number = 1
token = get_token()

# Fetch the first page to seed the combined DataFrame
product_df, product_json = get_products(token, page_number)
product_all = product_df

# Keep requesting the next page until one comes back empty
while not product_df.empty:
  page_number = page_number + 1
  token = get_token()

  product_df, product_json = get_products(token, page_number)
  product_all = pd.concat([product_all, product_df])

print(product_all)

If you are sure there are 1000 records max per page, you could check if the result count is less than 1000 and stop the loop.
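
A rough sketch of that stop condition, assuming a short page always means the last page. iter_pages and fake_fetch are hypothetical names used only for illustration; fake_fetch stands in for the real API call so the example runs on its own:

```python
def iter_pages(fetch_page, page_size=1000, first_page=1):
    """Yield pages until one comes back shorter than page_size."""
    page_number = first_page
    while True:
        page = fetch_page(page_number)
        if len(page) > 0:
            yield page
        # A short (or empty) page is assumed to be the last one.
        if len(page) < page_size:
            break
        page_number += 1


# Stand-in for the real endpoint: 2500 fake records, 1000 per page.
records = list(range(2500))


def fake_fetch(page_number, limit=1000):
    start = (page_number - 1) * limit
    return records[start:start + limit]


pages = list(iter_pages(fake_fetch, page_size=1000))
page_lengths = [len(p) for p in pages]
```

Since len() also works on a DataFrame, fetch_page could be something like lambda page: get_products(get_token(), page)[0], with the collected pages passed to pd.concat(list(...)) once at the end. When the total is an exact multiple of the page size, this still makes one extra request that returns an empty page.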

Upvotes: 1
