Reputation: 157
I'm attempting to scrape all the links contained in the boxes of this website. However, my code doesn't return anything. What am I doing wrong? If I just search for every 'a' with href=True, I don't get the links I'm looking for.
import requests
from bs4 import BeautifulSoup
url = 'https://www.nationalevacaturebank.nl/vacature/zoeken?query=&location=&distance=city&page=1&limit=100&sort=relevance&filters%5BcareerLevel%5D%5B%5D=Starter&filters%5BeducationLevel%5D%5B%5D=MBO'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'lxml')
ahrefs = soup.find_all('a', {'class': "article-link" , 'href': True})
for a in ahrefs:
    print(a.text)
Upvotes: 1
Views: 280
Reputation: 45382
This is an Angular website, which loads its content dynamically from an external JSON API, so the links are not present in the HTML that requests downloads. The API is located at https://www.nationalevacaturebank.nl/vacature/zoeken.json and needs a cookie to be set. The following builds the links you wanted to extract:
import requests
# Query the JSON endpoint directly instead of parsing the rendered page
r = requests.get(
    'https://www.nationalevacaturebank.nl/vacature/zoeken.json',
    params={
        'query': '',
        'location': '',
        'distance': 'city',
        'page': '1,110',
        'limit': 100,
        'sort': 'date',
        'filters[careerLevel][]': 'Starter',
        'filters[educationLevel][]': 'MBO'
    },
    # The API only answers once the cookie policy is marked as accepted
    headers={
        'Cookie': 'policy=accepted'
    }
)
# Build one relative link per job entry from its id
links = [
    "/vacature/{}/reisspecialist".format(t["id"])
    for t in r.json()['result']['jobs']
]
print(links)
The JSON result also gives you all the card metadata that is displayed on this page.
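To see which metadata fields are available, and to build the links without failing on an unexpected response, something along these lines may help. It is only a sketch: the helper names job_links and metadata_keys and the sample payload are made up for illustration, and the only structure assumed is the result -> jobs -> id nesting used above. The real entries will contain more fields than the sample shows.

```python
def job_links(payload, slug="reisspecialist"):
    """Build relative vacancy links from a parsed zoeken.json response.

    `slug` mirrors the fixed last path segment used in the snippet above.
    """
    jobs = payload.get("result", {}).get("jobs", [])
    # Skip entries without an id instead of raising KeyError
    return ["/vacature/{}/{}".format(job["id"], slug) for job in jobs if "id" in job]


def metadata_keys(payload):
    """Collect the field names that appear on any job entry."""
    jobs = payload.get("result", {}).get("jobs", [])
    return sorted({key for job in jobs for key in job})


# Hypothetical sample shaped like the response; not real data from the site
sample = {"result": {"jobs": [{"id": "abc123", "title": "Example"}, {"id": "def456"}]}}

print(job_links(sample))
print(metadata_keys(sample))
```

With the live response you would pass r.json() instead of sample.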
Upvotes: 2