Reputation: 560
I am trying to extract people's href from the URL https://www.dx3canada.com/agenda/speakers.
I tried:
elems = driver.find_elements_by_css_selector('.display-flex card vancouver')
href_output = []
for ele in elems:
href_output.append(ele.get_attribute("href"))
print(href_output)
But the output list returns nothing...
The expected href shown as the image below and I hope the outputs as a list of hrefs:
I really appreciate the help!
Upvotes: 2
Views: 1101
Reputation: 193308
To extract the people's href attribute from the URL https://www.dx3canada.com/agenda/speakers as the the desired elements are within an <iframe>
so you have to:
You can use the following Locator Strategies:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://www.dx3canada.com/agenda/speakers')
WebDriverWait(driver, 30).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe#whovaIframeSpeaker")))
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.display-flex.card.vancouver")))])
Console Output:
['https://whova.com/embedded/speaker_detail/dcrma_202003/9942778/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907682/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907688/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907676/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907696/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907690/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907670/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907693/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9942779/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9908087/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907671/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907681/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907673/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907678/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907689/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907674/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907684/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907685/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907686/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9942780/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907695/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907687/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907683/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907692/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907672/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907697/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907680/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907679/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907675/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907677/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907694/']
Here you can find a relevant discussion on Ways to deal with #document under iframe
Upvotes: 5
Reputation: 5909
Your images are in an iframe
, so you will need to switch to this before you can scrape the href
attributes using frame_to_be_available_and_switch_to_it
.
Then, to get the list of all href
attributes, you may need to run some Javascript to scroll the image into view, and handle the case where the images may be lazy loading the href
:
# first, switch to iframe
WebDriverWait(driver, 30).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[@id='whovaIframeSpeaker']")))
elements_list = driver.find_elements_by_xpath("//div[contains(@class, 'template-section-body')]/a[contains(@class, 'display-flex card vancouver')]")
for element in elements_list:
driver.execute_script("arguments[0].scrollIntoView(true);", element)
print(element.get_attribute("href"))
The results of this code:
Upvotes: 3
Reputation: 1119
For your css selector use .display-flex.card.vancouver instead.
elems = driver.find_elements_by_css_selector('.display-flex.card.vancouver')
Each word is a class, so you need to place a dot in the front of each one.
Upvotes: 0