Iterate and collect data over website pages with Selenium and Python

Question

I want to collect data from the pages of a news website with Python and Selenium. I have reached the page where the links to the individual news articles are listed. This is my code:

# finding list of news articles
all_links = driver.find_elements_by_tag_name('article.post a') 
print(len(all_links)) # I got 10 different articles

for element in all_links:

    print(element.get_attribute('outerHTML')) # if I print only this, I get 10 different HTML-s
        
    link = element.click()# clicking on the link to go to specific page
    time.sleep(1)
    
    # DATES
    date = driver.find_element_by_tag_name('article header span.no-break-text.lite').text
    print(date)

    # up to this point everything works for the first element

But I get an error when the loop reaches the second element. The results for the first element in the list are fine, but then I get this:

StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=92.0.4515.159)

I have tried adding time.sleep(4) pauses, and adding driver.close() or driver.back() after each iteration, but the error is the same.

What am I doing wrong?

cruisepandey · Accepted Answer

You need to find the list of web elements again on every pass through the for loop.

Explanation:

When you click the first element, the browser navigates to that article's page. When you come back with driver.execute_script("window.history.go(-1)"), the references you collected earlier no longer point to elements in the current document, so they are stale (this is how Selenium works). They have to be located again before you can interact with them. See below for an illustration:

import time
from selenium.webdriver.common.by import By

# finding list of news articles
# 'article.post a' is a CSS selector, so locate by CSS selector rather than by tag name
all_links = driver.find_elements(By.CSS_SELECTOR, 'article.post a')
print(len(all_links))  # I got 10 different articles

for j in range(len(all_links)):
  # re-find the links on every pass; the references from the previous pass are stale
  elements = driver.find_elements(By.CSS_SELECTOR, 'article.post a')
  print(elements[j].get_attribute('outerHTML'))  # if I print only this, I get 10 different HTML-s

  elements[j].click()  # clicking on the link to go to specific page
  time.sleep(1)

  # DATES
  date = driver.find_element(By.CSS_SELECTOR, 'article header span.no-break-text.lite').text
  print(date)
  time.sleep(1)

  # go back to the listing page; driver.back() may work as well
  driver.execute_script("window.history.go(-1)")
  time.sleep(1)
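
An alternative that sidesteps stale references entirely is to read every link's href while the listing page is still loaded, and then open each URL directly with driver.get(). This is only a sketch, not part of the answer above: the function name collect_article_dates and its parameters are made up for illustration, and it assumes each article link has an href attribute that can be opened on its own (that is, clicking is not required to reach the article).

```python
import time

# Locator strategy string; this is the same value as selenium's By.CSS_SELECTOR.
CSS = "css selector"


def collect_article_dates(driver, link_selector, date_selector, pause=1.0):
    """Return a list of (url, date_text) pairs, one per article link.

    Hypothetical helper: `driver` is an already started Selenium WebDriver
    that is currently showing the listing page.
    """
    # Read the hrefs first. Plain strings cannot go stale the way
    # WebElement references do after a navigation.
    links = driver.find_elements(CSS, link_selector)
    urls = [link.get_attribute("href") for link in links]
    urls = [url for url in urls if url]  # skip anchors without an href

    results = []
    for url in urls:
        driver.get(url)    # open the article directly; no click, no going back
        time.sleep(pause)  # crude fixed wait, as in the answer above
        date = driver.find_element(CSS, date_selector).text
        results.append((url, date))
    return results
```

With the selectors from the question, the call would look like collect_article_dates(driver, 'article.post a', 'article header span.no-break-text.lite'). Because the loop never returns to the listing page, there is nothing to re-find between iterations.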
