Reputation: 3895
I want to collect data from the pages of a news website using Python and Selenium. I have reached the page that lists the links to the individual news articles. This is my code:
# finding list of news articles
all_links = driver.find_elements_by_tag_name('article.post a')
print(len(all_links)) # I got 10 different articles
for element in all_links:
    print(element.get_attribute('outerHTML')) # if I print only this, I get 10 different HTML-s
    link = element.click() # clicking on the link to go to specific page
    time.sleep(1)
    # DATES
    date = driver.find_element_by_tag_name('article header span.no-break-text.lite').text
    print(date)
    # up to this point everything works, but only for the first element
But I get an error when the loop reaches the second element. The results for the first element in the list are fine, but then I get this:
StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
(Session info: chrome=92.0.4515.159)
I have tried adding time.sleep(4) pauses, driver.close(), and driver.back() after each iteration, but the error is the same.
What am I doing wrong?
Upvotes: 2
Views: 96
Reputation: 29362
You need to look up the list of web elements again on every pass through the for loop.
Explanation:
When you click the first element, the browser navigates to that article's page. When you come back using
driver.execute_script("window.history.go(-1)")
the references you collected earlier are stale: they no longer point to elements attached to the current page document. That is how Selenium works, so you have to locate the elements again before you can interact with them. The code below illustrates this:
import time
from selenium.webdriver.common.by import By

# finding list of news articles ('article.post a' is a CSS selector, so locate by CSS)
all_links = driver.find_elements(By.CSS_SELECTOR, 'article.post a')
print(len(all_links)) # I got 10 different articles
for j in range(len(all_links)):
    # locate the links again: references from before the navigation are stale
    elements = driver.find_elements(By.CSS_SELECTOR, 'article.post a')
    print(elements[j].get_attribute('outerHTML')) # if I print only this, I get 10 different HTML-s
    elements[j].click() # clicking on the link to go to specific page
    time.sleep(1)
    # DATES
    date = driver.find_element(By.CSS_SELECTOR, 'article header span.no-break-text.lite').text
    print(date)
    time.sleep(1)
    # go back to the listing page; driver.back() may work here as well
    driver.execute_script("window.history.go(-1)")
    time.sleep(1)
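Another way to avoid the stale references, sketched here under assumptions, is to copy each link's href into a plain list of strings while the listing page is still loaded and then open the URLs one by one with driver.get(). Strings cannot go stale, and there is no need to navigate back. This assumes every matched link has an href that can be opened directly. The names collect_dates and CSS are hypothetical helpers, not part of the original code, and the string "css selector" is used in place of By.CSS_SELECTOR (which has the same value) only to keep the sketch free of imports.

```python
CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def collect_dates(driver, link_selector, date_selector):
    """Visit every linked article and return a list of (url, date_text) pairs."""
    # Copy the URLs out while the listing page is still loaded.
    links = driver.find_elements(CSS, link_selector)
    urls = [a.get_attribute("href") for a in links]
    urls = [u for u in urls if u]  # skip anchors without an href

    results = []
    for url in urls:
        driver.get(url)  # navigate directly: no click, no going back
        date = driver.find_element(CSS, date_selector).text
        results.append((url, date))
    return results
```

With the selectors from the question, this would be called as collect_dates(driver, 'article.post a', 'article header span.no-break-text.lite'). If the date is rendered late by JavaScript, a wait before find_element may still be needed.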
Upvotes: 3
Reputation: 33361
You are facing a classic case of StaleElementReferenceException.
Initially you picked a list of elements with
all_links = driver.find_elements_by_tag_name('article.post a')
But once you click the first link and are taken to another page, the previously collected references (pointers) to the web elements on the initial page become stale, since those elements are not present on the new page.
So even if you go back to the initial page, these references are no longer valid.
To continue, you have to get the links again.
You can do this as follows:
import time
from selenium.webdriver.common.by import By

# finding list of news articles ('article.post a' is a CSS selector, so locate by CSS)
all_links = driver.find_elements(By.CSS_SELECTOR, 'article.post a')
print(len(all_links)) # I got 10 different articles
for i in range(len(all_links)):
    # get all the elements again
    elements = driver.find_elements(By.CSS_SELECTOR, 'article.post a')
    # get the i-th element from the list and click it (click() returns nothing)
    elements[i].click() # clicking on the link to go to specific page
    time.sleep(1)
    # DATES
    date = driver.find_element(By.CSS_SELECTOR, 'article header span.no-break-text.lite').text
    print(date)
    # get back to the previous page
    driver.execute_script("window.history.go(-1)")
    time.sleep(1)
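The "get the elements again, then take the i-th one" pattern can also be wrapped in a small generator so the loop body stays focused on the scraping. This is only a sketch: fresh_elements and print_article_dates are hypothetical names, and "css selector" is the string value of selenium's By.CSS_SELECTOR, used here to keep the example free of imports beyond time. It assumes the loop body always returns to the listing page before the next element is requested.

```python
import time

CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def fresh_elements(driver, selector):
    """Yield the i-th match of `selector`, looked up anew on every step."""
    index = 0
    while True:
        # Fresh lookup each time, so the yielded reference is never stale.
        elements = driver.find_elements(CSS, selector)
        if index >= len(elements):
            return
        yield elements[index]
        index += 1


def print_article_dates(driver):
    """Example loop using the selectors from the question."""
    for link in fresh_elements(driver, 'article.post a'):
        link.click()
        time.sleep(1)
        print(driver.find_element(CSS, 'article header span.no-break-text.lite').text)
        driver.back()  # return to the listing page before the next lookup
        time.sleep(1)
```

Because the generator only resumes after the loop body has finished, the next lookup runs against the listing page that the browser has just returned to.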
Upvotes: 3