Reputation: 111

Why is BeautifulSoup's findAll returning an empty list when I search by class?

I am trying to scrape the contents of this h2 tag, but BeautifulSoup's findAll returns an empty list:

<h2 class="iCIMS_InfoMsg iCIMS_InfoField_Job">

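from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("https://careersus-endologix.icims.com/jobs/2034/associate-supplier-quality-engineer/job")
bsObj = BeautifulSoup(html, "lxml")
# Find every <h2> whose class attribute is exactly this string
nameList = bsObj.findAll("h2", {"class": "iCIMS_InfoMsg iCIMS_InfoField_Job"})
print(nameList)  # prints []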

Upvotes: 2

Answers (2)

QHarr

Reputation: 84465

The content is inside an iframe and is populated by JavaScript, so it is not present in the response to the initial request. Instead, request the same URL the page itself uses to load the iframe content (the iframe's src). Then extract the string from the script tag that holds the job information, load it with json, take the description (which is HTML) and pass it back to BeautifulSoup to select the h2 tags. The rest of the description's content is also available in the second soup object if you need it.

import json

import requests
from bs4 import BeautifulSoup as bs

# Request the iframe's own URL (its src), not the outer careers page
r = requests.get('https://careersus-endologix.icims.com/jobs/2034/associate-supplier-quality-engineer/job?mobile=false&width=1140&height=500&bga=true&needsRedirect=false&jan1offset=0&jun1offset=60&in_iframe=1')
soup = bs(r.content, 'lxml')
# The job data is embedded as JSON-LD structured data in a script tag
script = soup.select_one('[type="application/ld+json"]').text
data = json.loads(script)
# The 'description' field is itself an HTML fragment, so parse it again
soup = bs(data['description'], 'lxml')
headers = [item.text for item in soup.select('h2')]
print(headers)
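To try the JSON-LD extraction steps without hitting the live site, here is a self-contained sketch. The HTML string is a hypothetical, trimmed-down stand-in for the iframe response (the real markup and JSON fields may differ), and it uses the built-in "html.parser" instead of lxml so no extra parser is needed.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical stand-in for the iframe response: only the JSON-LD script tag
# matters here; the "description" value is an HTML fragment.
IFRAME_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "JobPosting",
 "description": "<h2>Overview</h2><p>...</p><h2>Responsibilities</h2><p>...</p>"}
</script>
</head><body></body></html>
"""


def extract_h2_from_jsonld(page_html):
    """Return the text of every <h2> inside the JSON-LD 'description' field."""
    outer = BeautifulSoup(page_html, "html.parser")
    script = outer.select_one('script[type="application/ld+json"]')
    if script is None:
        return []  # no structured data on this page
    data = json.loads(script.string)
    # The description is HTML stored as a JSON string: parse it a second time
    inner = BeautifulSoup(data.get("description", ""), "html.parser")
    return [h2.get_text(strip=True) for h2 in inner.find_all("h2")]


print(extract_h2_from_jsonld(IFRAME_HTML))
```

Returning an empty list when the script tag is missing avoids the AttributeError that `select_one(...).text` raises on pages without JSON-LD.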

Upvotes: 1

Michele Rava

Reputation: 304

The answer lies in two points:

JavaScript-rendered content: it is inserted after document.onload.
In particular, the content managed by JavaScript comes after a specific comment, and it is indeed rendered by JavaScript. The block starts at this line: <!--BEGIN ICIMS-->
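One way to confirm this in the downloaded HTML is to locate that comment with BeautifulSoup and check what follows it. The page skeleton below is hypothetical (only the BEGIN ICIMS comment comes from the real page); the point is that in the static HTML nothing after the marker is a job <h2>.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical page skeleton: static markup, then the marker comment after
# which the iCIMS content is injected by JavaScript at load time.
PAGE_HTML = """
<html><body>
<div id="header">Careers</div>
<!--BEGIN ICIMS-->
<div id="placeholder"></div>
<script>/* builds the job iframe after the page loads */</script>
</body></html>
"""


def find_icims_marker(page_html):
    """Return the BEGIN ICIMS comment node, or None if it is absent."""
    soup = BeautifulSoup(page_html, "html.parser")
    return soup.find(
        string=lambda s: isinstance(s, Comment) and "BEGIN ICIMS" in s
    )


marker = find_icims_marker(PAGE_HTML)
print(repr(str(marker)))
# In the static HTML, no <h2> follows the marker
print(marker.find_next("h2"))
```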

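As you can imagine, the <h2 class="iCIMS_InfoMsg iCIMS_InfoField_Job"> element DOESN'T exist yet WHEN you call the bs4 methods: urlopen only downloads the initial HTML and never runs the JavaScript that inserts it.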
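A quick way to see that the search itself is fine is to run the same class query on two small HTML strings: one containing the element (as in the browser after JavaScript runs) and one without it (as urlopen receives it). Both strings are hypothetical; `find_all` is the current name for `findAll`.

```python
from bs4 import BeautifulSoup

TARGET_CLASS = "iCIMS_InfoMsg iCIMS_InfoField_Job"

# Hypothetical markup: element present (rendered) vs. absent (raw download)
RENDERED = '<div><h2 class="iCIMS_InfoMsg iCIMS_InfoField_Job">Overview</h2></div>'
RAW = '<div id="content"></div>'


def find_job_headers(html):
    soup = BeautifulSoup(html, "html.parser")
    # A space-separated string matches the exact value of the class attribute
    return soup.find_all("h2", {"class": TARGET_CLASS})


print(find_job_headers(RENDERED))  # one match
print(find_job_headers(RAW))       # []: the element simply is not there
# A CSS selector matches both classes regardless of their order
print(BeautifulSoup(RENDERED, "html.parser").select("h2.iCIMS_InfoMsg.iCIMS_InfoField_Job"))
```

An empty result therefore points at the input HTML, not at the query.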

The solution? IMHO, the best way to achieve what you want is to use Selenium to get the fully rendered web page.

See also: Web-scraping JavaScript page with Python

Upvotes: 0
