Reputation: 4122
I am using Python 3 and Selenium to grab some image links from a website as below:
import sys
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.proxy import Proxy, ProxyType
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')
link_xpath = '/html/body/main/div/div[2]/div[2]/div/div/div[2]/div/div[2]/div[1]/div/div/div[2]/div/img'
link_path = driver.find_element_by_xpath(link_xpath).text
print(link_path)
driver.quit()
When you open this URL, you can see the image in question in the middle of the page. If you right-click it in Google Chrome and choose Inspect, you can then right-click the element in Chrome DevTools and copy its XPath.
Everything looks in order to me; however, when I run the above code I get the following error:
Traceback (most recent call last):
File "G:\folder\folder\testfilepy", line 16, in <module>
link_path = driver.find_element_by_xpath(link_xpath).text
File "G:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
return self.find_element(by=By.XPATH, value=xpath)
File "G:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
'value': value})['value']
File "G:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "G:\Python36\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/main/div/div[2]/div[2]/div/div/div[2]/div/div[2]/div[1]/div/div/div[2]/div/img"}
(Session info: headless chrome=83.0.4103.61)
Can anyone tell me why Selenium is unable to find the xpath provided?
Upvotes: 1
Views: 749
Reputation: 7563
Your XPath is correct, but avoid absolute paths: they break as soon as the page layout changes. Try this relative XPath instead: //div[@class="c-bezel programme-content__image"]//img
To get the image link, use .get_attribute("src") rather than .text, because an img element has no text content:
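# Continues from the driver created in the question; these imports are required.
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '//div[@class="c-bezel programme-content__image"]//img')))
print(element.get_attribute("src"))
driver.quit()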
Better still, use a CSS selector, which should be faster:
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.c-bezel.programme-content__image > img')))
Reference : https://selenium-python.readthedocs.io/locating-elements.html
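To see why a position-based path is fragile, here is a minimal sketch using only Python's standard library. The two page snippets, the example.com image URL, and the helper name find_src are made up for illustration, and ElementTree supports only a subset of XPath, so this is a stand-in for what the browser does rather than a reproduction of it.

```python
import xml.etree.ElementTree as ET

# Two hypothetical, simplified versions of the same page. The second one has
# an extra <div> (for example a cookie banner) inserted before the content.
PAGE_V1 = """
<html><body><main>
  <div>
    <div class="c-bezel programme-content__image"><img src="https://example.com/a.jpg"/></div>
  </div>
</main></body></html>
"""

PAGE_V2 = """
<html><body><main>
  <div class="banner">We use cookies</div>
  <div>
    <div class="c-bezel programme-content__image"><img src="https://example.com/a.jpg"/></div>
  </div>
</main></body></html>
"""

# Position-based path, evaluated from the <html> root element.
ABSOLUTE = "body/main/div[1]/div/img"
# Attribute-based path that does not care how deep the element sits.
RELATIVE = ".//div[@class='c-bezel programme-content__image']//img"


def find_src(page, path):
    """Return the src attribute of the first match, or None if nothing matches."""
    img = ET.fromstring(page).find(path)
    return None if img is None else img.get("src")


if __name__ == "__main__":
    print(find_src(PAGE_V1, ABSOLUTE))  # matches in the original layout
    print(find_src(PAGE_V2, ABSOLUTE))  # no match once a sibling is inserted
    print(find_src(PAGE_V2, RELATIVE))  # still matches
```

The position-based path stops matching as soon as one sibling is added, while the class-based path keeps working in both layouts.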
Upvotes: 1
Reputation: 193108
To extract the src attribute of the image, you need to wait for the element using WebDriverWait with visibility_of_element_located(). You can use either of the following locator strategies:
Using CSS_SELECTOR:
options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--headless')
options.add_argument('--window-size=1920,1080')
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.o-layout__item div.c-bezel.programme-content__image>img"))).get_attribute("src"))
Using XPATH:
options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--headless')
options.add_argument('--window-size=1920,1080')
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='o-layout__item']//div[@class='c-bezel programme-content__image']/img"))).get_attribute("src"))
Console Output:
https://images.metadata.sky.com/pd-image/251eeec2-acb3-4733-891b-60f10f2cc28c/16-9/640
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
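As a rough illustration of what the wait is doing: WebDriverWait(driver, 20).until(...) keeps re-evaluating the condition until it returns something truthy or the timeout expires. The sketch below mimics that idea with the standard library only; wait_until is a hypothetical helper, not Selenium's implementation, and it raises the built-in TimeoutError rather than Selenium's TimeoutException.

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if no truthy value is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


if __name__ == "__main__":
    attempts = []

    def image_is_ready():
        # Pretend the element only shows up on the third lookup.
        attempts.append(1)
        return "img element" if len(attempts) >= 3 else None

    print(wait_until(image_is_ready, timeout=5, poll=0.1))
```

This is why a lookup that fails immediately after driver.get() can succeed a moment later: the page may still be rendering the element when the first lookup runs.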
Upvotes: 1
Reputation: 26
Your XPath seems to be correct. You weren't able to locate the element because you didn't handle the cookie prompt. Try it yourself: pause the driver for a few seconds and click to agree to all cookies, and you will then see your element. There are multiple ways to handle the cookie prompt. I was able to locate the element with my own, cleaner XPath, which reaches it from the nearest parent.
Hope this helps.
Upvotes: 0
Reputation: 3503
If you are working in headless mode, it usually is a good idea to add window size. Add this line to your options:
chrome_options.add_argument('--window-size=1920,1080')
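A minimal sketch of wiring this into a headless session. The helper names headless_chrome_args and make_headless_driver are hypothetical; the switch is written in the comma-separated width,height form that Chrome's --window-size expects, and creating the driver assumes Selenium and a matching chromedriver are installed.

```python
def headless_chrome_args(width=1920, height=1080):
    """Build the Chrome command-line switches for a headless run."""
    return ["--headless", f"--window-size={width},{height}"]


def make_headless_driver(width=1920, height=1080):
    """Create a headless Chrome driver with an explicit window size."""
    # Imported here so the argument helper can be used without Selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in headless_chrome_args(width, height):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


if __name__ == "__main__":
    print(headless_chrome_args())
```

Without an explicit size, a headless browser may start with a small default viewport, and a responsive page can then lay out or hide elements differently from what you see in a normal window.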
Upvotes: 0