zaki19

Reputation: 25

Web scraping data from Glassdoor using Selenium

I need some help running this code (https://github.com/PlayingNumbers/ds_salary_proj/blob/master/glassdoor_scraper.py) to scrape job offer data from Glassdoor.
Here's the code snippet:

from selenium.common.exceptions import NoSuchElementException, ElementClickInterceptedException
from selenium import webdriver
import time
import pandas as pd

options = webdriver.ChromeOptions()

#Uncomment the line below if you'd like to scrape without a new Chrome window every time.
#options.add_argument('headless')
    
#Change the path to where chromedriver is in your home folder.
driver = webdriver.Chrome(executable_path=path, options=options)
driver.set_window_size(1120, 1000)
    
url = "https://www.glassdoor.com/Job/jobs.htm?suggestCount=0&suggestChosen=false&clickSource=searchBtn&typedKeyword="+'data scientist'+"&sc.keyword="+'data scientist'+"&locT=&locId=&jobType="
#url = 'https://www.glassdoor.com/Job/jobs.htm?sc.keyword="' + keyword + '"&locT=C&locId=1147401&locKeyword=San%20Francisco,%20CA&jobType=all&fromAge=-1&minSalary=0&includeNoSalaryJobs=true&radius=100&cityId=-1&minRating=0.0&industryId=-1&sgocId=-1&seniorityType=all&companyId=-1&employerSizes=0&applicationType=0&remoteWorkType=0'
driver.get(url)

#Let the page load. Change this number based on your internet speed.
#Or, wait until the webpage is loaded, instead of hardcoding it.
time.sleep(5)

#Test for the "Sign Up" prompt and get rid of it.
try:
    driver.find_element_by_class_name("selected").click()
except NoSuchElementException:
    pass
time.sleep(.1)
try:
    driver.find_element_by_css_selector('[alt="Close"]').click() #clicking to the X.
    print(' x out worked')
except NoSuchElementException:
    print(' x out failed')
    pass

        
#Going through each job in this page
job_buttons = driver.find_elements_by_class_name("jl")

I'm getting an empty list

job_buttons
[]

Upvotes: 1

Views: 1054

Answers (1)

Prophet

Reputation: 33351

The problem is the exception type in your except clause.
With driver.find_element_by_class_name("selected").click() you are trying to click an element that does not exist: no element on that page matches the class name "selected". This raises NoSuchElementException, as you can see yourself, while your code is trying to catch ElementClickInterceptedException.
To fix this, use the correct locator, or at least catch the correct exception in the except clause.
Like this:

try:
    driver.find_element_by_class_name("selected").click()
except NoSuchElementException:
    pass

Or even

try:
    driver.find_element_by_class_name("selected").click()
except:
    pass
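
Note that a bare except also swallows KeyboardInterrupt and SystemExit, so a narrower variant may be preferable. Below is a minimal sketch of that idea, not a definitive implementation. The helper name click_if_present is hypothetical (it is not part of Selenium); it takes a zero-argument callable that returns the element and a tuple of exception types to ignore:

```python
def click_if_present(locate, ignored):
    """Click the element returned by locate().

    `locate` is a zero-argument callable returning an object with .click().
    `ignored` is a tuple of exception classes treated as "element not available".
    Returns True if the click happened, False if an ignored exception was raised.
    Any other exception propagates to the caller.
    """
    try:
        locate().click()
    except ignored:
        return False
    return True


# Usage with the driver from the question (needs a running browser,
# so it is left as a comment):
# from selenium.common.exceptions import (
#     NoSuchElementException, ElementClickInterceptedException)
# click_if_present(
#     lambda: driver.find_element_by_class_name("selected"),
#     (NoSuchElementException, ElementClickInterceptedException),
# )
```

Passing both exception types covers the case where the element is missing as well as the case where the click is blocked by an overlay, while unrelated errors still surface.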

I'm not sure which elements you want to collect in job_buttons.
The search results, each containing all the details for one job, can be found with this:

job_buttons = driver.find_elements_by_css_selector("li.react-job-listing")
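
If the list is still empty with this locator, one possible cause (an assumption, not something confirmed in the question) is that the results have not rendered yet when the lookup runs. The comment in the question's code already suggests waiting for the page instead of hardcoding a sleep. A minimal sketch of that idea follows; the helper name wait_for_nonempty and its default timings are hypothetical. Selenium's own WebDriverWait with expected_conditions is the built-in way to do the same thing.

```python
import time


def wait_for_nonempty(find, timeout=10.0, poll=0.5):
    """Call find() until it returns a non-empty result or `timeout` seconds pass.

    Returns the last result of find(), which may still be empty on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = find()
        if found or time.monotonic() >= deadline:
            return found
        time.sleep(poll)


# Usage with the driver from the question (needs a running browser,
# so it is left as a comment):
# job_buttons = wait_for_nonempty(
#     lambda: driver.find_elements_by_css_selector("li.react-job-listing"))
```

The helper only depends on the callable it is given, so the same code works whichever locator turns out to match the job listings.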

Upvotes: 1
