taga

Reputation: 3895

Iterate and collect data over website pages with Selenium and Python

I want to collect data from the pages of a news website with Python and Selenium. I have reached the page that lists the links to the individual news articles. This is my code:

# finding list of news articles
all_links = driver.find_elements_by_tag_name('article.post a') 
print(len(all_links)) # I got 10 different articles

for element in all_links:

    print(element.get_attribute('outerHTML')) # if I print only this, I get 10 different HTML-s
        
    link = element.click()# clicking on the link to go to specific page
    time.sleep(1)
    
    # DATES
    date = driver.find_element_by_tag_name('article header span.no-break-text.lite').text
    print(date)

    # up to this point everything works, but only for the first element

But I get an error when the loop reaches the second element. The first element in the list gives correct results, but then I get this:

StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=92.0.4515.159)

I have tried adding time.sleep(4) pauses, and adding driver.close() or driver.back() after each iteration, but the error is the same.

What am I doing wrong?

Upvotes: 2

Views: 96

Answers (2)

cruisepandey

Reputation: 29362

You need to locate the list of web elements again inside the for loop.

Explanation:

When you click the first element, the browser navigates to that article's page. When you come back using
driver.execute_script("window.history.go(-1)"), the elements you located earlier are stale (this is how Selenium works), so they have to be located again before you can interact with them. See the illustration below:

import time

# finding list of news articles (the locator is a CSS selector, so use the CSS lookup)
all_links = driver.find_elements_by_css_selector('article.post a')
print(len(all_links))  # I got 10 different articles

for j in range(len(all_links)):
    # locate the links again on every pass; the old references are stale after navigation
    elements = driver.find_elements_by_css_selector('article.post a')
    print(elements[j].get_attribute('outerHTML'))  # if I print only this, I get 10 different HTML-s

    elements[j].click()  # clicking on the link to go to specific page
    time.sleep(1)

    # DATES
    date = driver.find_element_by_css_selector('article header span.no-break-text.lite').text
    print(date)
    time.sleep(1)

    # go back to the list page; driver.back() may work here as well
    driver.execute_script("window.history.go(-1)")
    time.sleep(1)
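
If each link has an href, another option is to avoid clicking altogether: read all the URLs first while the list page is still loaded, then open them one by one with driver.get(). Plain strings cannot go stale, so nothing has to be located again. This is only a minimal sketch, assuming the links carry usable href values. The function name collect_article_dates and its parameters are made up for illustration, and "css selector" is the string value behind By.CSS_SELECTOR:

```python
def collect_article_dates(driver, link_selector, date_selector):
    """Return the date text of every article linked from the current page.

    Hypothetical helper: `driver` is any Selenium WebDriver instance.
    """
    # Read every href while the list page is still loaded.
    links = driver.find_elements("css selector", link_selector)
    urls = [link.get_attribute("href") for link in links]

    dates = []
    for url in urls:
        if not url:
            # Skip anchors that have no href attribute.
            continue
        # Navigate directly instead of clicking and going back.
        driver.get(url)
        dates.append(driver.find_element("css selector", date_selector).text)
    return dates
```

With the selectors from the question this would be called as collect_article_dates(driver, 'article.post a', 'article header span.no-break-text.lite'). It will not help if the links only work through a JavaScript click handler.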

Upvotes: 3

Prophet

Reputation: 33361

This is a classic case of StaleElementReferenceException.
Initially you collected a list of elements with

all_links = driver.find_elements_by_tag_name('article.post a')

But once you click the first link and the browser navigates to another page, the previously collected references (pointers) to the web elements on the initial page become stale, since those elements are not present on the new page.
Even if you go back to the initial page, these references are no longer valid.
To continue, you have to get the links again.
You can do this as follows:

import time

# finding list of news articles (the locator is a CSS selector, so use the CSS lookup)
all_links = driver.find_elements_by_css_selector('article.post a')
print(len(all_links))  # I got 10 different articles

for i in range(len(all_links)):
    # get all the elements again; the old references are stale after navigation
    elements = driver.find_elements_by_css_selector('article.post a')
    # get the i-th element from the list and click it to go to the specific page
    elements[i].click()
    time.sleep(1)

    # DATES
    date = driver.find_element_by_css_selector('article header span.no-break-text.lite').text
    print(date)

    # get back to the previous page
    driver.execute_script("window.history.go(-1)")
    time.sleep(1)
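
The fixed time.sleep(1) calls can also be swapped for a short polling loop, in case the list page has not finished loading again after going back. A rough sketch only, not part of Selenium: find_nth, timeout and poll are hypothetical names, and "css selector" is the string value behind By.CSS_SELECTOR. Selenium's own WebDriverWait is the library route for this kind of waiting.

```python
import time


def find_nth(driver, selector, index, timeout=10.0, poll=0.25):
    """Locate elements by CSS selector until the one at `index` exists.

    Hypothetical helper: returns a freshly located element, or raises
    TimeoutError if the page never shows enough matching elements.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Locate from scratch on every attempt so the reference is never stale.
        elements = driver.find_elements("css selector", selector)
        if len(elements) > index:
            return elements[index]
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"fewer than {index + 1} elements matched {selector!r}"
            )
        time.sleep(poll)
```

Inside the loop above, find_nth(driver, 'article.post a', i).click() would then take the place of the two lines that get the elements and click the i-th one.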

Upvotes: 3
