Reputation: 4623
I want to parse vacancies, specifically the vacancies of just one company.
import requests
from tqdm import tqdm_notebook
import pandas as pd
r = requests.get('https://api.hh.ru/vacancies?employer_id=80').json()
r
If I do so, I get only 20 vacancies by default (page 0), even though there are 488:
'found': 488
and
'page': 0,
'pages': 25,
'per_page': 20
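These fields are consistent with each other. The number of pages is the ceiling of found / per_page: 488 / 20 = 24.4, which rounds up to 25. Here is a minimal sketch of that calculation. The helper name pages_needed is hypothetical and not part of the API:

```python
def pages_needed(found: int, per_page: int) -> int:
    """Pages required to cover `found` results at `per_page` results per page."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    # Ceiling division using integers only: -(-a // b) == ceil(a / b) for b > 0
    return -(-found // per_page)


print(pages_needed(488, 20))  # → 25
```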
I can loop over the pages:
vac = []
for i in tqdm_notebook(range(0, 25)):
    vac.append(requests.get("https://api.hh.ru/vacancies?employer_id=80", params={'page': i}).json())
But this gives me just 25 vacancies (one per page). Alternatively, I can do:
vac = []
for j in tqdm_notebook(range(0, 20)):
    for i in tqdm_notebook(range(0, 500)):
        vac.append(requests.get("https://api.hh.ru/vacancies?employer_id=80", params={'page': i, 'per_page': j}).json())
But this is very expensive: it sends 20 × 500 = 10,000 requests and repeats a lot of work. How can I fix it?
Upvotes: 1
Views: 1249
Reputation: 5165
You will need to set the page and per_page parameters manually, as described in the API's documentation. However, you don't need a loop for the per_page parameter. It should be a fixed number (20):
vac = []
for i in tqdm_notebook(range(0, 25)):
    vac.append(requests.get("https://api.hh.ru/vacancies?employer_id=80", params={'page': i, 'per_page': 20}).json())
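Each element the loop above appends to vac is a whole page response, not a single vacancy. That may be why it looks as if only 25 vacancies (one per page) come back. The sketch below assumes each page lists its vacancies under an "items" key, which the question does not show, so treat it as an assumption. Under that assumption, the pages can be merged into one flat list. The helper name flatten_pages and the toy data are hypothetical:

```python
from itertools import chain


def flatten_pages(pages):
    """Merge the vacancy lists of several page responses into one flat list.

    Assumes (hypothetically) that each page dict keeps its vacancies under "items".
    """
    return list(chain.from_iterable(page.get("items", []) for page in pages))


# Toy stand-in for `vac`: two page responses holding two and one vacancies.
vac = [
    {"page": 0, "items": [{"id": "1"}, {"id": "2"}]},
    {"page": 1, "items": [{"id": "3"}]},
]
vacancies = flatten_pages(vac)
print(len(vacancies))  # → 3
```

The flat list can then be passed to pd.DataFrame(vacancies) to get one row per vacancy.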
Also, consider making the number of pages to iterate over dynamic. Read it from the 'pages' field of the first page of results instead of hard-coding 25.
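Here is a minimal sketch of that idea. It requests page 0 first, reads its "pages" field, and then requests only the remaining pages. It assumes each response lists its vacancies under an "items" key, which is not shown in the question. fetch_page and fetch_all_vacancies are hypothetical helper names. The fetch parameter exists only so the paging logic can be exercised without network access:

```python
def fetch_page(employer_id, page, per_page=20):
    """Fetch one page of vacancies for an employer from the hh.ru API."""
    import requests  # imported here so the paging logic below can run without it

    resp = requests.get(
        "https://api.hh.ru/vacancies",
        params={"employer_id": employer_id, "page": page, "per_page": per_page},
        timeout=10,
    )
    resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
    return resp.json()


def fetch_all_vacancies(employer_id, per_page=20, fetch=fetch_page):
    """Request page 0, read its "pages" field, then request only the remaining pages."""
    first = fetch(employer_id, 0, per_page)
    items = list(first["items"])  # assumption: vacancies are listed under "items"
    for page in range(1, first["pages"]):
        items.extend(fetch(employer_id, page, per_page)["items"])
    return items
```

With network access, vacancies = fetch_all_vacancies(80) would send one request per page (25 for the numbers in the question) instead of thousands.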
Upvotes: 1