Jon Do

Reputation: 23

How to read the next page on API using python iterator?

There is an API that only produces one hundred results per page. I am trying to make a while loop so that it goes through all pages and takes results from all pages, but it does not work. I would be grateful if you could help me figure it out.

params = dict(
    order_by='salary_desc',
    text=keyword,
    area=area,
    period=30,  # days
    per_page=100,
    page=0,
    no_magic='false',  # disable magic
    search_field='name'  # available: name, description, company_name
)
response = requests.get(
    BASE_URL + '/vacancies',
    headers={'User-Agent': generate_user_agent()},
    params=params,
)
response

items = response.json()['items']
vacancies = []
for item in items:
    vacancies.append(dict(
        id=item['id'],
        name=item['name'],
        salary_from=item['salary']['from'] if item['salary'] else None,
        salary_to=item['salary']['to'] if item['salary'] else None,
        currency = item['salary']['currency'] if item['salary'] else None,
        created=item['published_at'],
        company=item['employer']['name'],
        area = item['area']['name'],
        url=item['alternate_url']
    ))

I then try to loop: while there are results in vacancies, I add 1 to the page parameter, using it as a counter:

while vacancies == True:
  params['page'] += 1

As a result, params['page'] stays at zero (pages in the API start at zero).

When calling params after starting the loop, the result is:

{'area': 1,
'no_magic': 'false',
'order_by': 'salary_desc',
'page': 0,
'per_page': 100,
'period': 30,
'search_field': 'name',
'text': '"python"'}

Perhaps my loop is wrong. My reasoning was that the loop should keep running while there are results in vacancies.

Upvotes: 1

Views: 834

Answers (2)

TheLazyScripter

Reputation: 2665

while vacancies == True:
  params['page'] += 1

The condition vacancies == True will never hold, regardless of the list's contents. A non-empty Python list is truthy, but it is not equal to True. You need to make the condition less strict.

if vacancies: # truthy if its len > 0, falsy otherwise
    # Do something

Or you can explicitly check that it has content

if len(vacancies) > 0:
    # Do something

This solves the problem of how to evaluate based on an object but doesn't solve the overall logic problem.

for _ in vacancies:
    params["page"] += 1
    # Does something for every item in vacancies

What you do each loop will depend on the problem and will require another question!

A fixed version is below:

params = dict(
    order_by='salary_desc',
    text=keyword,
    area=area,
    period=30, # days
    per_page=100,
    page = 0,
    no_magic='false',  # disable magic
    search_field='name'  # available: name, description, company_name
)
pages = []
while True:
  response = requests.get(BASE_URL + '/vacancies', headers={'User-Agent': generate_user_agent()}, params=params)
  items = response.json()['items']
  if not items:  # an empty page means there are no more results
    break
  pages.append(items)  # keep the items of each page
  params["page"] += 1  # increment after the request so page 0 is not skipped

Then build the vacancies for each page:

results = []
for page in pages:
  vacancies = []
  for item in page:
      vacancies.append(dict(
          id=item['id'],
          name=item['name'],
          salary_from=item['salary']['from'] if item['salary'] else None,
          salary_to=item['salary']['to'] if item['salary'] else None,
          currency = item['salary']['currency'] if item['salary'] else None,
          created=item['published_at'],
          company=item['employer']['name'],
          area = item['area']['name'],
          url=item['alternate_url']
      ))
  results.append(vacancies)

results will be the final list, holding one list of vacancies per page.
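As a minimal sketch of the same idea in one piece, the loop below requests pages starting at 0 and stops at the first empty page, collecting everything into a single flat list. The names fetch_all_items and fetch_page and the max_pages safety cap are assumptions for illustration and are not part of the API; fetch_page stands in for the requests.get(...) call above so the loop can be tried without network access.

```python
from typing import Callable, Dict, List


def fetch_all_items(fetch_page: Callable[[int], List[Dict]], max_pages: int = 1000) -> List[Dict]:
    """Collect items from consecutive pages until an empty page is returned.

    fetch_page is a hypothetical callable: given a zero-based page number,
    it returns that page's list of items (e.g. response.json()['items']).
    """
    all_items: List[Dict] = []
    page = 0  # pages start at zero, as in the question
    while page < max_pages:  # safety cap (an assumption) to avoid an endless loop
        items = fetch_page(page)
        if not items:  # an empty list is falsy: no more results
            break
        all_items.extend(items)  # flatten: one list for all pages
        page += 1  # advance only after the current page was processed
    return all_items


# Stand-in for the real API: 250 fake items served 100 per page.
FAKE_DATA = [{"id": i} for i in range(250)]


def fake_fetch_page(page: int, per_page: int = 100) -> List[Dict]:
    start = page * per_page
    return FAKE_DATA[start:start + per_page]


items = fetch_all_items(fake_fetch_page)
print(len(items))  # 250
```

With the real API, fetch_page could be a small function that sets params['page'] to the requested page, calls requests.get(...), and returns response.json()['items'].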

Upvotes: 2

Synkied

Reputation: 171

vacancies is never equal to True. If you want to test the boolean value of vacancies, you could use bool(vacancies), but in Python you can simply write:

while vacancies:
  # some code logic

This way, Python evaluates the truthiness of your list for you. If your list has something in it (len(your_list) > 0), bool(your_list) evaluates to True; otherwise it is False.
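A short demonstration of the difference between comparing with == True and relying on truthiness may help. The variable names here (empty, filled, queue, seen) are made up for illustration:

```python
empty = []
filled = [{"id": 1}]

# Comparing a list to True with == is always False, empty or not.
print(filled == True)   # False
print(empty == True)    # False

# bool() (and an if/while condition) looks at truthiness instead.
print(bool(filled))     # True
print(bool(empty))      # False

# Typical use: consume a list until it is empty.
queue = [1, 2, 3]
seen = []
while queue:            # stops when the list becomes empty
    seen.append(queue.pop(0))
print(seen)             # [1, 2, 3]
```

Note that a while loop on a list only ends if the body eventually empties the list or breaks out of the loop.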

Also, instead of using dict(), you could write your dict this way:

params = {
    'order_by': 'salary_desc',
    'text': keyword,
    'area': area,
    'period': 30, # days
    'per_page': 100,
    'page': 0,
    'no_magic': 'false',  # disable magic
    'search_field': 'name'  # available: name, description, company_name
}

which is arguably more Pythonic.
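Both spellings build the same dictionary, so the choice is mostly a matter of style. A quick check, using sample values for keyword and area taken from the output shown in the question (the 'per-page' key is invented purely to show a case only the literal form can express):

```python
keyword = '"python"'  # sample values, taken from the output shown in the question
area = 1

via_call = dict(text=keyword, area=area, per_page=100, page=0)
via_literal = {'text': keyword, 'area': area, 'per_page': 100, 'page': 0}

print(via_call == via_literal)  # True

# A literal also accepts keys that are not valid Python identifiers,
# which dict(key=value) cannot express. 'per-page' is a made-up key.
with_dash = {'per-page': 100}
print(with_dash['per-page'])  # 100
```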

Upvotes: 0
