Reputation: 121
I am trying to scrape a website for information about soccer matches, using the Selenium library in Python.
I stored the clickable HTML elements for all the needed matches in a list called "completed_matches". A for loop iterates through these elements; inside the loop I click the current element and print the new URL. The code looks like this:
from selenium import webdriver
import selenium
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Chrome(r"C:\Users\Mart\Downloads\chromedriver_win32_2\chromedriver.exe")
url = "https://footystats.org/spain/la-liga/matches"
driver.get(url)
completed_matches = driver.find_elements_by_xpath("""//*[@id="matches-list"]/div[@class='full-matches-table mt2e ' or @class='full-matches-table mt1e ']/div/div[2]/table[@class='matches-table inactive-matches']/tbody/tr[*]/td[3]/a[1]/span""");
print(len(completed_matches))
for match in completed_matches:
    match.click()
    print("Current driver URL: " + driver.current_url)
The output looks like this:
159
Current driver URL: https://footystats.org/spain/fc-barcelona-vs-real-club-deportivo-mallorca-h2h-stats#632514
---------------------------------------------------------------------------
StaleElementReferenceException Traceback (most recent call last)
<ipython-input-3-da5851d767a8> in <module>
4 print(len(completed_matches))
5 for match in completed_matches:
----> 6 match.click()
7 print("Current driver URL: " + driver.current_url)
~\Anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py in click(self)
78 def click(self):
79 """Clicks the element."""
---> 80 self._execute(Command.CLICK_ELEMENT)
81
82 def submit(self):
~\Anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py in _execute(self, command, params)
631 params = {}
632 params['id'] = self._id
--> 633 return self._parent.execute(command, params)
634
635 def find_element(self, by=By.ID, value=None):
~\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py in execute(self, driver_command, params)
319 response = self.command_executor.execute(driver_command, params)
320 if response:
--> 321 self.error_handler.check_response(response)
322 response['value'] = self._unwrap_value(
323 response.get('value', None))
~\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py in check_response(self, response)
240 alert_text = value['alert'].get('text')
241 raise exception_class(message, screen, stacktrace, alert_text)
--> 242 raise exception_class(message, screen, stacktrace)
243
244 def _value_or_default(self, obj, key, default):
StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
(Session info: chrome=79.0.3945.79)
(Driver info: chromedriver=72.0.3626.7 (efcef9a3ecda02b2132af215116a03852d08b9cb),platform=Windows NT 10.0.18362 x86_64)
The completed_matches list contains 159 html elements, but the for loop only shows the first clicked link and then throws the StaleElementReferenceException...
Does anyone know how to solve this problem?
Upvotes: 2
Views: 1963
Reputation: 213
Stale means old, decayed, no longer fresh. A stale element is one that is old or no longer available. Suppose an element on a web page is referenced as a WebElement in WebDriver. If the DOM changes, that WebElement goes stale.
In your case the page changes after you click the element, so here is my suggestion to fix it:
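import time
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

while True:
    try:
        # Re-locate the element on every pass so the reference is never stale
        completed_match = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, """//*[@id="matches-list"]/div[@class='full-matches-table mt2e ']""")))
    except TimeoutException:
        # Nothing clickable appeared within 10 seconds, so stop
        break
    completed_match.click()
    time.sleep(2)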
In short, locate the element again on every iteration, so the reference always points at a node that is in the current page's DOM.
For a complete example, see this Tripadvisor web scraper:
https://github.com/alirezaznz/Tripadvisor-Webscraper
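The idea of locating the element again every time can also be wrapped in a small retry helper. The following is only a sketch: `click_fresh`, `locate`, and `stale_error` are hypothetical names, and the exception class is passed in as a parameter so the helper itself has no Selenium import. With Selenium you would pass `StaleElementReferenceException` from `selenium.common.exceptions`.

```python
import time


def click_fresh(locate, stale_error, attempts=3, pause=0.5):
    """Click a freshly located element, retrying if the reference is stale.

    locate      -- zero-argument callable that re-finds the element in the current DOM
    stale_error -- exception class to retry on (hypothetical parameter; with
                   Selenium this would be StaleElementReferenceException)
    attempts    -- maximum number of tries
    pause       -- seconds to wait between tries
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            # locate() runs on every try, so an old reference is never reused
            locate().click()
            return attempt + 1  # number of tries it took
        except stale_error:
            if attempt == attempts - 1:
                raise  # out of tries: let the caller see the error
            time.sleep(pause)
```

A call might look like `click_fresh(lambda: driver.find_elements_by_xpath(xpath)[i], StaleElementReferenceException)`, where `xpath` and `i` are assumed to be defined by the caller. Retrying only helps when the element still exists on the current page; if the click navigated to a different page, you need to go back first.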
Upvotes: 0
Reputation: 585
After the click the DOM is refreshed, hence the StaleElementReferenceException. Locate the completed_matches elements again inside the for loop, and click the freshly located element rather than the stale reference obtained before the click.
xpath = """//*[@id="matches-list"]/div[@class='full-matches-table mt2e ' or @class='full-matches-table mt1e ']/div/div[2]/table[@class='matches-table inactive-matches']/tbody/tr[*]/td[3]/a[1]/span"""
count = len(driver.find_elements_by_xpath(xpath))
print(count)
for i in range(count):
    # Locate the elements again on every pass; references from before the click are stale
    completed_matches = driver.find_elements_by_xpath(xpath)
    completed_matches[i].click()
    # Assumption: the click navigates to the match page, so go back to the list
    driver.back()
Upvotes: 0
Reputation: 3790
The URL you are looking for is already in the link you are clicking: it is the href of the parent element of the span you select. The StaleElementReferenceException occurs because clicking the link changes the page, which makes every element after the first clicked one stale.
from selenium import webdriver
import selenium
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Chrome(r"C:\Users\Mart\Downloads\chromedriver_win32_2\chromedriver.exe")
url = "https://footystats.org/spain/la-liga/matches"
driver.get(url)
completed_matches = driver.find_elements_by_xpath("""//*[@id="matches-list"]/div[@class='full-matches-table mt2e ' or @class='full-matches-table mt1e ']/div/div[2]/table[@class='matches-table inactive-matches']/tbody/tr[*]/td[3]/a[1]/span""");
print(len(completed_matches))
for match in completed_matches:
    #match.click()
    #print("Current driver URL: " + driver.current_url)
    match_parent = match.find_element_by_xpath("..")
    href = match_parent.get_attribute("href")
    print("href: ", href)
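If you also need to open each match page, one option is to collect every href first and only then navigate, since plain strings cannot go stale. This is a minimal sketch, not a tested scraper for this site: `collect_match_urls` and `visit_all` are hypothetical helper names, and the locator strategy is passed as the string "xpath" (the value of `By.XPATH`) to `find_elements(by, value)` so that the block needs no imports.

```python
def collect_match_urls(driver, span_xpath):
    """Return the href of each matched span's parent link.

    All hrefs are read before any navigation happens, so no element
    reference is used after the page has changed.
    """
    urls = []
    for span in driver.find_elements("xpath", span_xpath):
        link = span.find_element("xpath", "..")  # parent <a> of the span
        href = link.get_attribute("href")
        if href:  # skip spans whose parent has no href
            urls.append(href)
    return urls


def visit_all(driver, urls):
    """Load each URL directly and return the URLs the driver ended up on."""
    visited = []
    for url in urls:
        driver.get(url)
        visited.append(driver.current_url)
    return visited
```

With the `driver` from above and the same XPath stored in a variable (here assumed to be called `xpath`), usage would be `visit_all(driver, collect_match_urls(driver, xpath))`. Scraping code for each match page would go inside the loop in `visit_all`.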
Upvotes: 2