Reputation: 22440
I've created a scraper in Python with Selenium to collect some information from a site. However, after collecting a single lead the scraper throws the error "element is not attached to the page document".
Regarding the code below:
There are 20 names that the for loop iterates over, and the scraper is supposed to click each of them.
After clicking on the first name it waits in the new page for the document to be available.
On that page there is a "show more" button in the top right corner, which the scraper clicks to reveal the hidden information (it stays on the second page; only some new information becomes visible).
As soon as the information shows up the scraper collects that successfully.
Then it is supposed to go back to the starting page where the loop started and click the next name. But instead of clicking the next name it throws the error below (on the line link.click()).
I tried to get rid of the stale element error by using wait.until(EC.staleness_of(item)), but it doesn't work.
for link in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"div.presence-entity__image"))):
    link.click() #error thrown here
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,"button[data-control-name='contact_see_more']"))).click()
    item = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,".pv-contact-info__ci-container a[href^='mailto:']")))
    print(item.get_attribute("href"))
    driver.execute_script("window.history.go(-1)")
    wait.until(EC.staleness_of(item))
Error I'm having:
line 194, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
I have tried to describe what is happening as clearly as I can. Any help on this will be highly appreciated.
Upvotes: 3
Views: 89
Reputation: 52665
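Instead of clicking each link in a loop, you'd better collect the URLs of all the links first and then navigate to each of them in a loop. Once the browser leaves the list page, the elements located there become stale, but the plain href strings remain valid: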
links = [link.get_attribute('href') for link in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"a.mn-person-info__picture.ember-view")))]
for link in links:
    driver.get(link)
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,"button[data-control-name='contact_see_more']"))).click()
    item = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,".pv-contact-info__ci-container a[href^='mailto:']")))
    print(item.get_attribute("href"))
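If an entry cannot be reached by URL (for example, the clickable element has no href), a possible alternative is to re-locate the list before every click and pick the element by index, so that no element reference has to survive a navigation. This is only a sketch: the helper name visit_each_by_index and its three callbacks are hypothetical and not part of Selenium, and it assumes the list keeps the same order after going back.

```python
def visit_each_by_index(find_all, handle, go_back):
    """Click each element in turn, fetching a fresh list before every click.

    find_all: callable returning the current list of clickable elements
    handle:   callable run on the detail page after the click
    go_back:  callable that returns the browser to the list page
    All three are hypothetical hooks supplied by the caller.
    """
    results = []
    index = 0
    while True:
        elements = find_all()      # fresh references for the current page load
        if index >= len(elements):
            break                  # every entry has been visited
        elements[index].click()
        results.append(handle())
        go_back()                  # old references are stale from here on
        index += 1
    return results


# Possible wiring with the objects from the question (not tested on the site):
# visit_each_by_index(
#     find_all=lambda: wait.until(EC.presence_of_all_elements_located(
#         (By.CSS_SELECTOR, "div.presence-entity__image"))),
#     handle=collect_email,  # hypothetical: clicks "see more", returns the mailto href
#     go_back=driver.back,
# )
```

Collecting the hrefs up front, as above, is simpler whenever the links expose a URL; the index approach is only worth it when they do not.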
Note that to get all the links you might need to scroll the Connections page down, so that more connections are loaded via XHR.
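A rough sketch of such scrolling, assuming the page grows in height whenever more connections are loaded: keep scrolling to the bottom until the height stops changing. The helper name scroll_until_stable and the pause and max_rounds defaults are illustrative assumptions; only driver.execute_script is an actual Selenium call.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=30):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls performed. `pause` and `max_rounds`
    are arbitrary illustrative defaults; tune them for the real page.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # crude wait for the XHR-loaded entries to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was loaded by the last scroll
        last_height = new_height
    return max_rounds      # gave up; the page may still have more to load
```

It would be called once on the Connections page, before the list of links is built. The fixed sleep is the weak point: if the network is slow, the height may not have changed yet and the loop stops early.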
Upvotes: 1