Bangbangbang

Reputation: 560

How to extract all href from a class in Python Selenium?

I am trying to extract people's href from the URL https://www.dx3canada.com/agenda/speakers.

I tried:

elems = driver.find_elements_by_css_selector('.display-flex card vancouver')
href_output = []
for ele in elems:
    href_output.append(ele.get_attribute("href"))
print(href_output)

But the output list comes back empty...

The expected hrefs are shown in the image below, and I would like the output to be a list of hrefs: [image: screenshot of the expected hrefs]

I really appreciate the help!

Upvotes: 2

Views: 1101

Answers (3)

undetected Selenium

Reputation: 193308

To extract the people's href attributes from the URL https://www.dx3canada.com/agenda/speakers, note that the desired elements are within an <iframe>, so you have to:

  • Use WebDriverWait to wait until the desired frame is available and switch to it.
  • Use WebDriverWait to wait until all the located elements are visible.
  • You can use the following Locator Strategies:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    options = webdriver.ChromeOptions() 
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get('https://www.dx3canada.com/agenda/speakers')
    WebDriverWait(driver, 30).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe#whovaIframeSpeaker")))
    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.display-flex.card.vancouver")))])
    
  • Console Output:

    ['https://whova.com/embedded/speaker_detail/dcrma_202003/9942778/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907682/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907688/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907676/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907696/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907690/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907670/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907693/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9942779/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9908087/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907671/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907681/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907673/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907678/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907689/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907674/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907684/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907685/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907686/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9942780/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907695/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907687/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907683/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907692/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907672/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907697/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907680/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907679/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907675/', 
    'https://whova.com/embedded/speaker_detail/dcrma_202003/9907677/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907694/']
    

For a relevant discussion, see "Ways to deal with #document under iframe".
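Once the driver has switched into the frame, every later lookup stays inside it until you switch back. A minimal sketch of scoping that switch, assuming a hypothetical helper named `inside_frame` (it is not part of Selenium) built only on `driver.switch_to.frame()` and `driver.switch_to.default_content()`. Unlike the code above, it does not wait for the frame to load, so it may need to be combined with the explicit wait:

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into a frame for the duration of a `with` block.

    `inside_frame` is a hypothetical helper name. `frame_reference` is
    whatever driver.switch_to.frame() accepts: a frame id/name string,
    an index, or a frame WebElement.
    """
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Always return to the top-level document, even if scraping fails.
        driver.switch_to.default_content()


# Possible usage with a real driver (By comes from selenium.webdriver.common.by):
# with inside_frame(driver, "whovaIframeSpeaker") as d:
#     links = d.find_elements(By.CSS_SELECTOR, "a.display-flex.card.vancouver")
#     hrefs = [link.get_attribute("href") for link in links]
```

Read the attributes inside the `with` block: element references found in the frame are only usable while the driver is still switched into it.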

Upvotes: 5

CEH

Reputation: 5909

Your images are in an iframe, so you will need to switch to it with frame_to_be_available_and_switch_to_it before you can scrape the href attributes.

Then, to get the list of all href attributes, you may need to run some JavaScript to scroll each image into view, which handles the case where the images lazy-load their href:

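# imports needed for the explicit wait below
# (assumes `driver` is the WebDriver instance from the question)
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# first, switch to iframe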
WebDriverWait(driver, 30).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[@id='whovaIframeSpeaker']")))

elements_list = driver.find_elements_by_xpath("//div[contains(@class, 'template-section-body')]/a[contains(@class, 'display-flex card vancouver')]")

for element in elements_list:
    driver.execute_script("arguments[0].scrollIntoView(true);", element)
    print(element.get_attribute("href"))

The results of this code: [image: screenshot of the output]
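If some links have not loaded their href yet, `get_attribute("href")` can return `None`. A small sketch of collecting the values into a list instead of printing them, assuming a hypothetical helper named `collect_hrefs` that only relies on each element having a `get_attribute` method:

```python
def collect_hrefs(elements):
    """Return the non-empty href values of `elements`, without duplicates.

    `collect_hrefs` is a hypothetical helper name. Order of first
    appearance is preserved; elements whose href is None or "" are skipped.
    """
    seen = set()
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        if href and href not in seen:
            seen.add(href)
            hrefs.append(href)
    return hrefs


# Possible usage with the list found above:
# href_output = collect_hrefs(elements_list)
```

Skipping empty values is a design choice for this sketch; if a missing href should count as an error, raise instead of skipping.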

Upvotes: 3

RKelley

Reputation: 1119

For your CSS selector, use .display-flex.card.vancouver instead.

elems = driver.find_elements_by_css_selector('.display-flex.card.vancouver')

Each word is a class, so you need to put a dot in front of each one.
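With spaces, '.display-flex card vancouver' is read as a descendant selector: a <vancouver> element inside a <card> element inside something with the class display-flex, which matches nothing here. A minimal sketch of turning a copied class attribute into a compound selector, assuming a hypothetical helper named `class_attr_to_css` and class names that need no CSS escaping:

```python
def class_attr_to_css(class_attr, tag=""):
    """Build a compound CSS selector from an HTML class attribute value.

    `class_attr_to_css` is a hypothetical helper name. Each
    whitespace-separated class gets its own leading dot, with no spaces
    in between, so all classes must be present on the same element.
    """
    classes = class_attr.split()
    if not classes:
        raise ValueError("class attribute is empty")
    return tag + "".join("." + name for name in classes)


print(class_attr_to_css("display-flex card vancouver"))
# .display-flex.card.vancouver
print(class_attr_to_css("display-flex card vancouver", tag="a"))
# a.display-flex.card.vancouver
```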

Upvotes: 0
