Reputation: 5352
I am trying to write a simple scraper for Sales Navigator in LinkedIn, and this is the link I am trying to scrape. It shows search results for specific filter options selected for account results.
The goal I am trying to achieve is to retrieve every company name among the search results. Upon inspecting the link elements carrying the company names (e.g. Facile.it, AGT International), I see the following HTML, showing the dt class name:
<dt class="result-lockup__name">
    <a id="ember208" href="/sales/company/2429831?_ntb=zreYu57eQo%2BSZiFskdWJqg%3D%3D" class="ember-view">
        Facile.it
    </a>
</dt>
I basically want to retrieve those names and open the URL contained in each href.
It can be noted that all the company-name links have the same dt class, result-lockup__name. The following script is my attempt to collect the list of all company names displayed in the search results, along with their elements.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import re
import pandas as pd
import os
def scrape_accounts(url):
    url = "https://www.linkedin.com/sales/search/companycompanySize=E&geoIncluded=emea%3A0%2Ceurope%3A0&industryIncluded=6&keywords=AI&page=1&searchSessionId=zreYu57eQo%2BSZiFskdWJqg%3D%3D"
    driver = webdriver.PhantomJS(executable_path='C:\\phantomjs\\bin\\phantomjs.exe')
    #driver = webdriver.Firefox()
    #driver.implicitly_wait(30)
    driver.get(url)
    search_results = []
    search_results = driver.find_elements_by_class_name("result-lockup__name")
    print(search_results)

if __name__ == "__main__":
    scrape_accounts("lol")
However, the result prints an empty list. I am trying to learn how to scrape different parts of a web page and different elements, and I am not sure I have this right. What would be the correct way?
Upvotes: 0
Views: 246
Reputation: 686
I'm afraid I can't get to the page that you're after, but I notice that you're importing Beautiful Soup without using it.
Try:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import re
import pandas as pd
import os
url = "https://www.linkedin.com/sales/search/companycompanySize=E&geoIncluded=emea%3A0%2Ceurope%3A0&industryIncluded=6&keywords=AI&page=1&searchSessionId=zreYu57eQo%2BSZiFskdWJqg%3D%3D"
def scrape_accounts(url=url):
    driver = webdriver.PhantomJS(executable_path='C:\\phantomjs\\bin\\phantomjs.exe')
    #driver = webdriver.Firefox()
    #driver.implicitly_wait(30)
    driver.get(url)
    html = driver.find_element_by_tag_name('html').get_attribute('innerHTML')
    soup = BeautifulSoup(html, 'html.parser')
    search_results = soup.select('dt.result-lockup__name a')
    for link in search_results:
        print(link.text.strip(), link['href'])
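Since I can't load the live page, here is the extraction step checked offline against the dt snippet from your question. Note that the href is relative, so to actually open it you need to join it with a base URL; `https://www.linkedin.com` is an assumption on my part:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# The <dt> snippet from the question, used as a stand-alone test input.
html = '''
<dt class="result-lockup__name">
    <a id="ember208" href="/sales/company/2429831?_ntb=zreYu57eQo%2BSZiFskdWJqg%3D%3D" class="ember-view">
        Facile.it
    </a>
</dt>
'''

soup = BeautifulSoup(html, 'html.parser')
companies = []
for link in soup.select('dt.result-lockup__name a'):
    name = link.text.strip()
    # The href is relative; join it with the site root (assumed base URL)
    # so the result can be passed straight to driver.get().
    full_url = urljoin('https://www.linkedin.com', link['href'])
    companies.append((name, full_url))

print(companies)
```

The same `soup.select(...)` loop as above, just fed a fixed string instead of `driver` output, which makes it easy to verify the selector before worrying about page loading.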
Upvotes: 1