Void S

Reputation: 802

Scraping Job Title from LinkedIn

My code so far: if I search for a job title on LinkedIn (for example, Cyber Analyst), it gathers the links to all of the job postings on the results page.

Goal: I put these links in a list and iterate through them (this part works so far) to print the title of each job posting.

My code visits every link but does not get the job title text, which is the goal.

import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
test1=[]

options = Options()
options.headless = True
driver = webdriver.Chrome(ChromeDriverManager().install())


url = "https://www.linkedin.com/jobs/search/?currentJobId=2213597199&geoId=103644278&keywords=cyber%20analyst&location=United%20States&start=0&redirect=false"
driver.get(url)
time.sleep(2)
elements = driver.find_elements_by_class_name("result-card__full-card-link")
job_links = [e.get_attribute("href") for e in elements]

for job_link in job_links:
    test1.append(job_link)  # collect every job link in test1

for b in test1:
    driver.get(b)
    time.sleep(3)
    element1=driver.find_elements_by_class_name("jobs-top-card__job-title t-24")
    title=[t.get_attribute("jobs-top-card__job-title t-24") for t in element1]
    print(title)

Upvotes: 0

Views: 865

Answers (1)

coderoftheday

Reputation: 2075

I couldn't see the class 'jobs-top-card__job-title t-24' on the linked pages, but this gives you the job title for every href.

Change

element1=driver.find_elements_by_class_name("jobs-top-card__job-title t-24")
title=[t.get_attribute("jobs-top-card__job-title t-24") for t in element1]

to

element1=driver.find_elements_by_class_name("topcard__title")
title=[t.text for t in element1]


>>> ['Cyber Threat Intelligence Analyst']
>>> ['Jr. Python/Cyber Analyst (TS/SCI)']
>>> ['Cyber Security Analyst']
... etc.

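Every time you call driver.get(b), a new page is fetched, so its HTML is not the same as the HTML from driver.get(url). I think the class "jobs-top-card__job-title t-24" belongs to the HTML of the search page from driver.get(url), but as I said, that page is no longer loaded once driver.get(b) is fetched. Note also that get_attribute expects an attribute name such as "href", not a class name; the visible text comes from t.text.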
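As an aside that is not part of the original answer: the class-name locator takes a single class, while "jobs-top-card__job-title t-24" contains a space and is therefore two classes. A minimal sketch of turning such a string into a CSS selector follows. The helper name class_names_to_css is hypothetical, and it assumes the class names need no CSS escaping.

```python
def class_names_to_css(class_attr):
    """Turn a space-separated class string into a CSS selector matching all classes."""
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    # ".a.b" matches elements that carry both class "a" and class "b".
    return "".join("." + name for name in names)


selector = class_names_to_css("jobs-top-card__job-title t-24")
print(selector)
```

The resulting selector could be passed to a CSS-selector lookup such as driver.find_elements_by_css_selector(selector). Since that class did not appear on the linked pages, it may still match nothing there.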

Also, each page loaded by driver.get(b) has the same structure, so element1=driver.find_elements_by_class_name("topcard__title") should work for every link.
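Putting the change together, the loop could be wrapped in a small helper. This is only a sketch: the name collect_titles is made up for illustration, and it assumes a driver that exposes get(url) and find_elements(by, value), which is the style used by newer Selenium releases in place of find_elements_by_class_name. The string "class name" is the value of By.CLASS_NAME, written out so the helper does not need a browser to be exercised.

```python
# "class name" is the string value of selenium.webdriver.common.by.By.CLASS_NAME.
CLASS_NAME = "class name"


def collect_titles(driver, links, class_name="topcard__title"):
    """Visit each link and return the text of every element with class_name.

    The result has one list per link, mirroring the per-page output above.
    """
    titles = []
    for link in links:
        driver.get(link)  # loads a new page, replacing the previous HTML
        elements = driver.find_elements(CLASS_NAME, class_name)
        titles.append([e.text.strip() for e in elements])
    return titles
```

With a real driver this would be called as collect_titles(driver, test1). The sketch has no delay after driver.get, so a pause or an explicit wait may still be needed for pages that render slowly.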

For example, this is one of the pages loaded by driver.get(b):

[image]

This is where topcard__title is:

[image]

Upvotes: 1
