Mason C.

Reputation: 11

Python HTML Scrape Not Yielding A Result

(New to Python and 1st post)

See the code below. I'm trying to scrape all job titles from the page requested in the code, but when I print the list, it contains no values. I've tried several different XPaths to see if I could get anything to print, and every time the list is empty.

Does anyone know if it is an issue with my code, or if there is something about the site structure that I didn't consider?

Thanks in advance!

from lxml import html
import requests

page = requests.get("https://careers.homedepot.com/job-search-results/?location=Atlanta%2C%20GA%2C%20United%20States&latitude=33.7489954&longitude=-84.3879824&radius=15&parent_category=Corporate%2FOther")
tree = html.fromstring(page.content)

Job_Title = tree.xpath('//*[@id="widget-jobsearch-results-list"]/div/div/div/div[@class="jobTitle"]/a/text()')

print (Job_Title)

Upvotes: 0

Views: 74

Answers (3)

clueless

Reputation: 21

Try a library that can execute JavaScript and return the rendered page (dryscrape is a lightweight option).

Here's a code sample:

from lxml import html
import dryscrape

session = dryscrape.Session()
session.visit("https://careers.homedepot.com/job-search-results/?location=Atlanta%2C%20GA%2C%20United%20States&latitude=33.7489954&longitude=-84.3879824&radius=15&parent_category=Corporate%2FOther")
# session.body() returns the rendered HTML as a string, so pass it to lxml directly
page = session.body()
tree = html.fromstring(page)

Job_Title = tree.xpath('//*[@id="widget-jobsearch-results-list"]/div/div/div/div[@class="jobTitle"]/a/text()')

print (Job_Title)

Upvotes: 1

Andersson

Reputation: 52665

The information you're looking for is generated dynamically by JavaScript, whereas requests only fetches the initial HTML page source.

You might need to use Selenium (with chromedriver) to get the required data:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://careers.homedepot.com/job-search-results/?location=Atlanta%2C%20GA%2C%20United%20States&latitude=33.7489954&longitude=-84.3879824&radius=15&parent_category=Corporate%2FOther")
xpath = "//a[starts-with(@id, 'job-results')]"
wait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, xpath)))
# find_elements_by_xpath was removed in Selenium 4; use find_elements(By.XPATH, ...)
jobs = [job.text for job in driver.find_elements(By.XPATH, xpath)]
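
The `wait(driver, 10).until(...)` line is an explicit wait: it keeps checking a condition until it holds or the timeout expires, which gives the page's JavaScript time to insert the job links. As a rough illustration of that polling idea only (this is not Selenium's implementation; `wait_until` and `jobs_loaded` are hypothetical names, and the delayed list stands in for a page that fills in late):

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it is truthy or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Hypothetical stand-in for a page whose job list appears after a short delay.
start = time.monotonic()


def jobs_loaded():
    if time.monotonic() - start < 0.2:
        return []  # nothing rendered yet
    return ["Job A", "Job B"]


jobs = wait_until(jobs_loaded, timeout=2.0, poll_interval=0.05)
print(jobs)
```

Reading the elements immediately after `driver.get(...)` without such a wait can return an empty list for the same reason the requests version does: the content has not been inserted yet.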

Upvotes: 1

tell k

Reputation: 615

That page builds its HTML (the results table) with JavaScript. In other words, the content of the target block does not exist in the HTML the server returns. Open the page source and check:

<div class="entry-content-wrapper clearfix">
    <div id="widget-jobsearch-results-list"></div> <!-- Target block is empty! -->
    <div id="widget-jobsearch-results-pages"></div>
</div>
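
The same check can be done in code instead of by eye. The sketch below is a minimal example that parses only the markup quoted above (not the live page) and reports whether the container has any children or text. It uses the standard library's `xml.etree.ElementTree` as a stand-in because the snippet is well-formed; with lxml the lookup would be much the same. `describe_container` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Static markup as quoted in the answer (the container divs are empty).
STATIC_HTML = """
<div class="entry-content-wrapper clearfix">
    <div id="widget-jobsearch-results-list"></div>
    <div id="widget-jobsearch-results-pages"></div>
</div>
"""


def describe_container(markup, element_id):
    """Return (found, child_count, text) for the element with the given id."""
    root = ET.fromstring(markup)
    node = root.find(f".//*[@id='{element_id}']")
    if node is None:
        return (False, 0, "")
    text = "".join(node.itertext()).strip()
    return (True, len(node), text)


found, child_count, text = describe_container(
    STATIC_HTML, "widget-jobsearch-results-list"
)
print(found, child_count, repr(text))
```

If the container is found but has no children and no text, any XPath that descends into it will match nothing, which is what the question's empty list shows.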

Upvotes: 0
