SIM

Reputation: 22440

My scraper throws an error instead of continuing on

I've written a scraper in Python with Selenium to collect some information from a site. The problem is that after collecting a single lead, the scraper throws a "stale element reference: element is not attached to the page document" error instead of moving on to the next one.

Here is what the code below is supposed to do:

  1. There are 20 names on the starting page. The for loop iterates over them, and the scraper is supposed to click each one.

  2. After clicking the first name, it waits on the new page for the document to become available.

  3. On that page there is a "show more" button in the top right corner, which it clicks to reveal the hidden information (it stays on the second page; only the extra information becomes visible).

  4. As soon as the information shows up, the scraper collects it successfully.

  5. Then it is supposed to go back to the starting page, where the loop began, and click the next name. Instead of clicking the next name, it throws the error shown below (on the line link.click()).

I tried to get rid of the stale element error by adding wait.until(EC.staleness_of(item)), but it doesn't help.

for link in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"div.presence-entity__image"))):
    link.click() #error thrown here
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,"button[data-control-name='contact_see_more']"))).click()
    item = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,".pv-contact-info__ci-container a[href^='mailto:']")))
    print(item.get_attribute("href"))
    driver.execute_script("window.history.go(-1)")
    wait.until(EC.staleness_of(item))

Error I'm having:

line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document

I've tried to describe exactly what is happening. Any help with this would be highly appreciated.

Upvotes: 3

Views: 89

Answers (1)

Andersson

Reputation: 52665

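Instead of clicking each element in a loop, it is better to collect all the link URLs first and then navigate to each of them in a loop. Element references found on the starting page become stale as soon as you navigate away and come back, whereas the href strings stay valid: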

links = [link.get_attribute('href') for link in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"a.mn-person-info__picture.ember-view")))]
for link in links:
    driver.get(link)
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,"button[data-control-name='contact_see_more']"))).click()
    item = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,".pv-contact-info__ci-container a[href^='mailto:']")))
    print(item.get_attribute("href"))
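
If the elements to click expose no href (the selector in the question targets a div), one possible alternative is to re-locate the list after every navigation and pick the next element by index, so that a reference from a previous page load is never reused. The following is only a minimal sketch of that idea: click_each_by_index and its three callable parameters are hypothetical names introduced here, not part of Selenium.

```python
def click_each_by_index(find_all, handle_page, go_back):
    """Click every element in turn, re-locating the list before each click.

    find_all    -- hypothetical callable returning a fresh list of elements
    handle_page -- hypothetical callable that scrapes the opened page
    go_back     -- hypothetical callable that returns to the starting page
    """
    results = []
    index = 0
    while True:
        # Fresh references: the ones from the previous iteration are stale.
        elements = find_all()
        if index >= len(elements):
            break
        elements[index].click()
        results.append(handle_page())
        go_back()
        index += 1
    return results
```

With Selenium, find_all could be something like lambda: wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "div.presence-entity__image"))) and go_back could be driver.back. This assumes the list comes back in the same order after each return to the starting page.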

Note that, to collect all the links in the first place, you might need to scroll the Connections page down so that more connections are loaded via XHR.
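
A rough sketch of such scrolling, assuming the page grows in height whenever new connections are loaded: keep scrolling to the bottom until document.body.scrollHeight stops changing. The function name scroll_until_stable and the pause and max_rounds parameters are made up for this example, and a fixed sleep is only a crude stand-in for a proper wait.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=30):
    """Scroll to the bottom until the page height stops growing.

    Returns the last observed page height. Gives up after max_rounds
    scrolls even if the page is still growing.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        # Jump to the bottom to trigger loading of the next batch.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # assumed to be long enough for the XHR to finish
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
    return last_height
```

Calling scroll_until_stable(driver) before building the links list should make more connections available to the CSS selector, provided the site really loads them on scroll.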

Upvotes: 1
