Reputation: 11
(New to Python and 1st post)
See code below, but here's the issue: I'm trying to scrape the webpage in the code for all job titles on the page, but when I print the list, I'm not getting any values. I've tried different XPaths to see if I could get something to print, but the list is always empty.
Does anyone know if it is an issue with my code, or if there is something about the site structure that I didn't consider?
Thanks in advance!
from lxml import html
import requests
page = requests.get("https://careers.homedepot.com/job-search-results/?location=Atlanta%2C%20GA%2C%20United%20States&latitude=33.7489954&longitude=-84.3879824&radius=15&parent_category=Corporate%2FOther")
tree = html.fromstring(page.content)
Job_Title = tree.xpath('//*[@id="widget-jobsearch-results-list"]/div/div/div/div[@class="jobTitle"]/a/text()')
print (Job_Title)
Upvotes: 0
Views: 74
Reputation: 21
Try a library that can execute JavaScript before you parse the page (dryscrape is a lightweight option).
Here's a code sample:
from lxml import html
import dryscrape
# dryscrape drives a headless WebKit browser, so the page's JavaScript runs
session = dryscrape.Session()
session.visit("https://careers.homedepot.com/job-search-results/?location=Atlanta%2C%20GA%2C%20United%20States&latitude=33.7489954&longitude=-84.3879824&radius=15&parent_category=Corporate%2FOther")
# session.body() returns the rendered HTML as a string, so there is no .content attribute
page = session.body()
tree = html.fromstring(page)
Job_Title = tree.xpath('//*[@id="widget-jobsearch-results-list"]/div/div/div/div[@class="jobTitle"]/a/text()')
print(Job_Title)
Upvotes: 1
Reputation: 52665
The information you're looking for is generated dynamically by JavaScript, while requests only fetches the initial HTML page source. You might need to use selenium (+ chromedriver) to get the required data:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("https://careers.homedepot.com/job-search-results/?location=Atlanta%2C%20GA%2C%20United%20States&latitude=33.7489954&longitude=-84.3879824&radius=15&parent_category=Corporate%2FOther")
xpath = "//a[starts-with(@id, 'job-results')]"
wait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, xpath)))
# find_elements(By.XPATH, ...) replaces find_elements_by_xpath, which was removed in Selenium 4
jobs = [job.text for job in driver.find_elements(By.XPATH, xpath)]
Upvotes: 1
Reputation: 615
That page builds its HTML (the results table) with JavaScript. In other words, the target block does not exist as HTML in the page source. Open the source and check:
<div class="entry-content-wrapper clearfix">
<div id="widget-jobsearch-results-list"></div> <!-- target block is empty! -->
<div id="widget-jobsearch-results-pages"></div>
</div>
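To confirm this without a browser, you can run the question's XPath against the static markup quoted above. This is only a minimal sketch: `STATIC_SOURCE` is copied from the snippet above, while `RENDERED_EXAMPLE` and the helper `container_child_count` are hypothetical names made up for illustration, and the nesting in `RENDERED_EXAMPLE` is an assumption inferred from the question's XPath, not the site's real rendered markup.

```python
from lxml import html

# Static page source, as quoted above: the results container has no children.
STATIC_SOURCE = """
<div class="entry-content-wrapper clearfix">
<div id="widget-jobsearch-results-list"></div>
<div id="widget-jobsearch-results-pages"></div>
</div>
"""

# HYPOTHETICAL rendered markup, shaped only to match the question's XPath.
RENDERED_EXAMPLE = """
<div id="widget-jobsearch-results-list">
  <div><div><div>
    <div class="jobTitle"><a>Example Job</a></div>
  </div></div></div>
</div>
"""

# XPath taken from the question.
QUESTION_XPATH = (
    '//*[@id="widget-jobsearch-results-list"]'
    '/div/div/div/div[@class="jobTitle"]/a/text()'
)


def container_child_count(source, container_id="widget-jobsearch-results-list"):
    """Return the number of child elements in the container, or None if it is absent."""
    tree = html.fromstring(source)
    matches = tree.xpath("//*[@id=$cid]", cid=container_id)
    if not matches:
        return None
    return len(matches[0])


def job_titles(source):
    """Apply the question's XPath to an HTML string."""
    return html.fromstring(source).xpath(QUESTION_XPATH)


print(container_child_count(STATIC_SOURCE))  # container exists but is empty
print(job_titles(STATIC_SOURCE))             # nothing for the XPath to match
print(job_titles(RENDERED_EXAMPLE))          # matches once the markup is filled in
```

If the child count is 0 for the HTML that requests returns, the XPath is not the problem: the job entries have to be produced by a tool that runs the page's JavaScript first.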
Upvotes: 0