Sabbir Talukdar

Reputation: 33

How to scrape a link that is not present when I inspect the element

Hello, I am sorry for the long post, but I wanted to make sure the problem is understandable. I am new to Selenium. On this website, "https://xangle.io/project/list", clicking any of the listed elements takes me to a new page (screenshot: clickable elements). I want to scrape the link behind each of these elements, but when I inspect them looking for URLs, I can't find any in the HTML (screenshot: element inspection). Maybe I have missed it. Anyway, this is what I tried, but I don't think it's the correct approach:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(r'C:\Users\User\AppData\Local\Programs\Python\Python37\Lib\site-packages\chromedriver_py\chromedriver_win32.exe')

driver.get('https://xangle.io/project/list')
wait = WebDriverWait(driver, 15)
wait.until(EC.element_to_be_clickable((By.XPATH, "//div[@class='project-table']//div[@class='table-row']//div[3]")))
list_ = driver.find_elements_by_xpath("//div[@class='project-table']//div[@class='table-row']//div[3]")
for i in list_:
    i.click()
    print(driver.current_url)
    driver.back()

It throws an error:

StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=80.0.3987.163)

Frankly speaking, I don't just want to suppress the error; I want a correct way of scraping URLs that don't show up when the elements are inspected.

Upvotes: 0

Views: 1218

Answers (2)

Frank

Reputation: 1285

If you inspect the Network tab, you can see that the data comes from the site's API: https://api.xangle.io/project/list?items_per_page=50&page=0

If you look at the link for each project, you will see it is a common prefix followed by the project's symbol.

import requests

url = "https://api.xangle.io/project/list?items_per_page=50&page=0"
# A browser User-Agent, since some APIs reject the default requests header
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Safari/537.36'}
r = requests.get(url, headers=headers)
prefix = "https://xangle.io/project/"
data = r.json()
# Each item in the JSON list carries the project's symbol; append it to the prefix
links = [prefix + d["symbol"] for d in data]
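Since the endpoint takes `items_per_page` and `page` parameters, you could also page through all results instead of fetching only page 0. This is a sketch, not a tested implementation: the helper names (`links_from_payload`, `scrape_all_links`) are my own, and it assumes the API keeps returning a JSON list of objects with a "symbol" field and yields an empty list past the last page.

```python
import requests

API = "https://api.xangle.io/project/list"
PREFIX = "https://xangle.io/project/"
HEADERS = {'User-Agent': 'Mozilla/5.0'}

def links_from_payload(payload, prefix=PREFIX):
    # Build one project URL per entry, skipping entries with no symbol.
    return [prefix + d["symbol"] for d in payload if d.get("symbol")]

def scrape_all_links():
    # Page through the API until a page comes back empty (assumed stop condition).
    links, page = [], 0
    with requests.Session() as s:
        while True:
            r = s.get(API, params={"items_per_page": 50, "page": page},
                      headers=HEADERS)
            r.raise_for_status()
            payload = r.json()
            if not payload:
                break
            links.extend(links_from_payload(payload))
            page += 1
    return links
```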

Upvotes: 2

Trapli

Reputation: 1597

When a page is reloaded, previously found elements become stale, because the document you are now working with is no longer the document in which those elements were found.

What you could do is change your pattern a bit and not reuse the list of elements:

driver.get('https://xangle.io/project/list')
wait = WebDriverWait(driver, 15)
wait.until(EC.element_to_be_clickable((By.XPATH, "//div[@class='project-table']//div[@class='table-row']//div[3]")))
list_ = driver.find_elements_by_xpath("//div[@class='project-table']//div[@class='table-row']//div[3]")
# Collect the plain-text names first, then re-locate each element fresh
# after every navigation so we never touch a stale reference
names = [x.text for x in list_ if x.text]
for name in names:
    elem = wait.until(EC.element_to_be_clickable((By.XPATH, f'//div[@class="project-table"]//div[@class="table-row"]//div[3]//span[text()="{name}"]/..')))
    elem.click()
    print(driver.current_url)
    driver.back()

Upvotes: 1
