Reputation: 111
I am trying to web-scrape using an h2 tag, but BeautifulSoup returns an empty list.
<h2 class="iCIMS_InfoMsg iCIMS_InfoField_Job">
html=urlopen("https://careersus-endologix.icims.com/jobs/2034/associate-supplier-quality-engineer/job")
bs0bj=BeautifulSoup(html,"lxml")
nameList=bs0bj.findAll("h2",{"class":"iCIMS_InfoMsg iCIMS_InfoField_Job"})
print(nameList)
Upvotes: 2
Views: 1400
Reputation: 84465
The content is inside an iframe and updated via js (so not present in initial request). You can use the same link the page is using to obtain iframe content (the iframe src
). Then extract the string from the script tag that has the info and load with json
, extract the description
(which is html) and pass back to bs to then select the h2
tags. You now have the rest of the info stored in the second soup object as well if required.
import requests
from bs4 import BeautifulSoup as bs
import json
r = requests.get('https://careersus-endologix.icims.com/jobs/2034/associate-supplier-quality-engineer/job?mobile=false&width=1140&height=500&bga=true&needsRedirect=false&jan1offset=0&jun1offset=60&in_iframe=1')
soup = bs(r.content, 'lxml')
script = soup.select_one('[type="application/ld+json"]').text
data = json.loads(script)
soup = bs(data['description'], 'lxml')
headers = [item.text for item in soup.select('h2')]
print(headers)
Upvotes: 1
Reputation: 304
The answer lays hidden in two elements:
As you can imagine the h2 class="ICISM class here" DOESN'T exist WHEN you call the bs4 methods.
The solution? IMHO the best way to achieve what you want is to use selenium, to get a full rendered web page.
check this also Web-scraping JavaScript page with Python
Upvotes: 0