aaacccbbb

Reputation: 11

driver.current_url not reflecting the opening of a link - Selenium

After using Selenium to log in to LinkedIn, I'm trying to navigate to the jobs page using the following:


from selenium.webdriver.common.by import By

# "Jobs" item in the global navigation bar
jobs = driver.find_element(
    By.XPATH, '//*[@id="global-nav"]/div/nav/ul/li[3]/a')
jobs.click()

job_src = driver.page_source
print(driver.current_url)

The above prints https://www.linkedin.com/feed/. However, looking at the browser window that Selenium opens, it does appear that https://www.linkedin.com/jobs/? has been opened.

Is my XPath wrong? I copied it from Chrome DevTools.

From there, I'm trying to scrape the job titles using:

from bs4 import BeautifulSoup

soup = BeautifulSoup(job_src, 'html.parser')
job_list_html = soup.select('.job-card-list__title')

for job in job_list_html:
    print(job.get_text())

But all that's returned is an empty list.

Upvotes: 1

Views: 63

Answers (1)

DevLan

Reputation: 56

The issue you are running into is that you need to wait until the page has finished loading before you read driver.current_url or the page source. Here is my suggestion.

First, after you log in, you can navigate directly to the job list URL. This is likely to be less fragile than clicking through with the XPath:

driver.get('https://www.linkedin.com/jobs/collections/recommended/')

The following is the most important piece you are missing:

wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, 'job-card-list__title')))

You can look into the other wait conditions that Selenium's expected_conditions module provides, but the one above worked for me.
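
For completeness, here is a minimal sketch of the setup that snippet assumes, with a 10-second timeout as an example value:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Reusable explicit wait with a 10-second timeout (tune as needed)
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, 'job-card-list__title')))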

Next, I noticed that this only lists a few jobs, since the rest are loaded dynamically as you scroll. What I did was simulate the scrolling with:

import time

# Scroll the job list container to the bottom so more cards load
driver.execute_script('res = document.querySelector("#main > div > section.scaffold-layout__list > div"); res.scrollTo(0, res.scrollHeight)')
time.sleep(2)

The two-second sleep gives the newly loaded cards time to render before you grab the page source.
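
If you want something a bit more robust than a single fixed sleep, one option is to repeat that scroll until the number of job cards stops growing. A rough sketch, reusing the same container selector (which is LinkedIn-specific and may change):

import time

from selenium.webdriver.common.by import By

SCROLL_JS = ('res = document.querySelector("#main > div > section.scaffold-layout__list > div"); '
             'res.scrollTo(0, res.scrollHeight)')

def scroll_until_stable(driver, pause=2, max_rounds=10):
    # Keep scrolling the list container until no new job cards appear
    last_count = 0
    for _ in range(max_rounds):
        driver.execute_script(SCROLL_JS)
        time.sleep(pause)
        count = len(driver.find_elements(By.CLASS_NAME, 'job-card-list__title'))
        if count == last_count:
            break
        last_count = count

scroll_until_stable(driver)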

With that wait and the scroll in place, your code for getting the list of job titles works:

job_src = driver.page_source
soup = BeautifulSoup(job_src, 'html.parser')
job_list_html = soup.select('.job-card-list__title')
print(len(job_list_html))
for job in job_list_html:
    print(job.get_text())

You may notice that the job list is paginated, so this code will only get the first page of jobs, but hopefully this gets you on the right track.
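
If you do want the later pages as well, one approach worth trying is to click through the pagination controls and repeat the scrape on each page. A rough, untested sketch that reuses the wait, By, EC, time, and BeautifulSoup names from the snippets above; the artdeco-pagination__button--next class name is an assumption based on the current markup and may change:

from selenium.common.exceptions import NoSuchElementException

def scrape_all_pages(driver, wait):
    titles = []
    while True:
        wait.until(EC.presence_of_all_elements_located((By.CLASS_NAME, 'job-card-list__title')))
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        titles.extend(t.get_text(strip=True) for t in soup.select('.job-card-list__title'))
        try:
            # Class name is an assumption; inspect the pagination bar to confirm it
            next_button = driver.find_element(By.CLASS_NAME, 'artdeco-pagination__button--next')
        except NoSuchElementException:
            break
        if not next_button.is_enabled():
            break
        next_button.click()
        time.sleep(2)
    return titles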

Upvotes: 1
