Jordan Garcia

Reputation: 13

Can't collect links from a website (Python)

I'm writing a program in Python to collect links from a website. The code is:

import time

links = driver.find_elements_by_xpath('//*[@href]')
for link in links:
    print(link.get_attribute('href'))
time.sleep(1)

I tried it on some sites and it worked well. The problem is when I use it on a specific site (www.ifood.com.br): it collects some links and then raises an error. I'm a beginner in Python, so I don't know what the error means. Please, I need some help.

The result of the code:

https://d1jgln4w9al398.cloudfront.net/imagens/ce/wl/www.ifood.com.br/favicon.ico
https://d1jgln4w9al398.cloudfront.net/site/2.1.238-20181023.22/css/main.css
https://fonts.googleapis.com/css?family=Open+Sans:300italic,400italic,600italic,700italic,800italic,400,300,600,700,800
https://www.ifood.com.br/

Traceback (most recent call last):
  File "C:\Users\jorda\Desktop\Python - Projetos\digitar ifood.py", line 32, in <module>
    print(link.get_attribute('href'))
  File "C:\Users\jorda\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\webelement.py", line 143, in get_attribute
    resp = self._execute(Command.GET_ELEMENT_ATTRIBUTE, {'name': name})
  File "C:\Users\jorda\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\jorda\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\jorda\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=70.0.3538.77)
  (Driver info: chromedriver=2.42.591088 (7b2b2dca23cca0862f674758c9a3933e685c27d5),platform=Windows NT 10.0.17134 x86_64)

Upvotes: 1

Views: 163

Answers (1)

Vladimir Efimov

Reputation: 799

In your error log you can see:

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document

Generally, it happens when you try to interact with a web element that is no longer attached to the DOM. A typical scenario looks like this:

  1. You open a web page.
  2. You find some elements and save them to a variable.
  3. The page DOM changes (for example, the page is reloaded).
  4. You still see the same page, but from Selenium's perspective the elements from step 2 are now STALE.
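To make steps 1 to 4 concrete, here is a toy model in plain Python that needs no browser. `ToyPage`, `ToyElement` and `StaleElementError` are hypothetical stand-ins invented for illustration; this is not how Selenium is implemented, it only mimics the observable behaviour: element handles found before the DOM is rebuilt fail when they are used afterwards.

```python
class StaleElementError(Exception):
    """Toy stand-in for Selenium's StaleElementReferenceException."""


class ToyPage:
    """Hypothetical page whose DOM can be rebuilt (step 1: 'open' it by constructing it)."""

    def __init__(self, hrefs):
        self.hrefs = list(hrefs)
        self.generation = 0  # incremented every time the DOM is rebuilt

    def reload(self):
        # Step 3: the DOM is replaced, so handles from the old DOM point at nothing.
        self.generation += 1

    def find_elements(self):
        # Step 2: hand out handles bound to the current DOM generation.
        return [ToyElement(self, i) for i in range(len(self.hrefs))]


class ToyElement:
    def __init__(self, page, index):
        self.page = page
        self.index = index
        self.generation = page.generation  # remember which DOM this handle came from

    def get_attribute(self, name):
        # Step 4: a handle from an older DOM generation is stale.
        if self.generation != self.page.generation:
            raise StaleElementError("element is not attached to the page document")
        return self.page.hrefs[self.index] if name == "href" else None


if __name__ == "__main__":
    page = ToyPage(["https://example.com/a", "https://example.com/b"])
    links = page.find_elements()           # save handles to a variable
    print(links[0].get_attribute("href"))  # works: the DOM has not changed yet
    page.reload()                          # the DOM is rebuilt
    try:
        links[1].get_attribute("href")     # the old handle is now stale
    except StaleElementError as exc:
        print("stale:", exc)
```

In this model, calling `page.find_elements()` again after the reload returns fresh handles that work, which mirrors the usual fix in Selenium: look the elements up again once the page has settled.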

So in your case, you can try to make sure the page is fully loaded (i.e. its DOM is no longer being rebuilt) before calling find_elements_by_xpath. The simplest way to check whether this solves your problem is to add a sleep before that call:

import time

time.sleep(5)  # give the page time to finish building its DOM
links = driver.find_elements_by_xpath('//*[@href]')
for link in links:
    print(link.get_attribute('href'))

Please note that fixed sleeps are not recommended: even if 5 seconds works for now, there is no guarantee that it will not break your test at some point (for example, because of a poor connection). Instead, use a smart wait condition that repeatedly checks for a 'page loaded' condition and continues only once it is met. More details can be found here: Python Selenium stale element fix
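A minimal sketch of two ideas, using only the standard library: `wait_until` polls a condition until it holds or a timeout expires (the kind of smart wait described above), and `collect_hrefs` applies a related but different technique, re-finding all elements and retrying when one of them turns out to be stale. Both function names are hypothetical, and the stale exception type is passed in as a parameter so the sketch has no hard Selenium dependency.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


def collect_hrefs(find_links, stale_exc, attempts=3):
    """Read the href of every element returned by find_links().

    If any element is stale (raises stale_exc), all elements are looked up
    again from scratch, up to `attempts` times in total.
    """
    for attempt in range(attempts):
        try:
            # Read every attribute immediately, so only plain strings are kept.
            return [link.get_attribute("href") for link in find_links()]
        except stale_exc:
            if attempt == attempts - 1:
                raise  # still stale after the last attempt: give up
    raise ValueError("attempts must be at least 1")
```

With Selenium you might call them as `wait_until(lambda: driver.execute_script("return document.readyState") == "complete")` followed by `collect_hrefs(lambda: driver.find_elements_by_xpath('//*[@href]'), StaleElementReferenceException)`, importing the exception from `selenium.common.exceptions` (on Selenium 4.3 and later, use `driver.find_elements(By.XPATH, '//*[@href]')` instead). Selenium also ships its own polling wait, `WebDriverWait(driver, 10).until(...)` in `selenium.webdriver.support.ui`. Keep in mind that `document.readyState` can already be `"complete"` while the page's scripts are still re-rendering parts of the DOM, which is why the retry is worth keeping alongside the wait.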

Upvotes: 1
