David Seroy

Reputation: 199

Selenium - Cannot locate elements in page source

I am trying to crawl a web page using Selenium, but for some reason the elements I need are not showing up in the page source.

I've tried using a WebDriverWait until the page loads. I've also tried to see if the data is in a different frame that I need to switch to.

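from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

delay = 10  # seconds; assumed value, not given in the original post
driver = webdriver.Chrome()  # assumed browser, not given in the original post
driver.get('https://foreclosures.cabarruscounty.us/')

try:
    WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.XPATH,'//*[@id="app"]/div[5]/div/div')))
    print("Page is ready!")

    web_url = driver.page_source
    print(web_url)

except TimeoutException:
    print("Loading took too much time!")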

I would expect to see all of the records for each individual property card that I could then extract. However, the page source does not show any of this data.

If I load the page manually and view its source (view-source:https://foreclosures.cabarruscounty.us/), the data is not there either.

Upvotes: 1

Views: 546

Answers (3)

undetected Selenium

Reputation: 193348

To extract the first Real ID, Case Number and Owner fields, wait for each element with WebDriverWait and visibility_of_element_located(), using the following locator strategies:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument("start-maximized")
    chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
    chrome_options.add_experimental_option('useAutomationExtension', False)
    # Selenium 3 style; Selenium 4 passes the driver path through a Service object instead
    driver = webdriver.Chrome(options=chrome_options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver.get("https://foreclosures.cabarruscounty.us/")
    Real_ID = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='row']//div[@class='card cardClass']/img//following::div[@class='card-body']//div/b"))).text
    Case_Number = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='row']//div[@class='card cardClass']/img//following::div[@class='card-body']//div//following-sibling::b[2]"))).text
    Owner = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='row']//div[@class='card cardClass']/img//following::div[@class='card-body']//div//following-sibling::b[7]"))).text
    print("{} is {} owned by {}".format(Real_ID,Case_Number,Owner))
    driver.quit()
    
  • Console Output:

    Real ID: 04-086 -0040.00 is Case Number: 18-CVD-2804 owned by Owner: DOUGLAS JAMES W
    

Upvotes: 1

KunduK

Reputation: 33384

Try the code below. It returns all the elements by using visibility_of_all_elements_located().

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://foreclosures.cabarruscounty.us/")
elements = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@id='app']//div[@class='card-body']/div[1]")))
allrecord = [ele.text for ele in elements]
print(allrecord)  # prints the text of every record

To print only the first record's values:

print(allrecord[0].splitlines())

You will get the following output:

['Real ID: 04-086 -0040.00', 'Status: SALE SCHEDULED', 'Case Number: 18-CVD-2804', 'Tax Value: $29,660', 'Min Bid: $10,067', 'Sale Date: 10/03/2019', 'Sale Time: 12:00 PM', 'Owner: DOUGLAS JAMES W', 'Attorney: ZACCHAEUS LEGAL SVCS']
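Each entry in that list is still a single "Label: value" string. A minimal sketch, assuming every line follows that pattern, of turning one record's text into a dictionary so fields can be looked up by name. record_to_dict is a hypothetical helper name, not part of Selenium, and the sample text is copied from the output above.

```python
def record_to_dict(card_text):
    """Split a card's text into {label: value} pairs.

    Assumes each line looks like 'Label: value'; lines without
    a ': ' separator are skipped.
    """
    record = {}
    for line in card_text.splitlines():
        # Split on the first ': ' only, so values such as '12:00 PM' stay intact.
        label, sep, value = line.partition(": ")
        if sep:
            record[label.strip()] = value.strip()
    return record


# Sample text taken from the output shown in this answer.
sample = (
    "Real ID: 04-086 -0040.00\n"
    "Status: SALE SCHEDULED\n"
    "Case Number: 18-CVD-2804\n"
    "Sale Time: 12:00 PM\n"
    "Owner: DOUGLAS JAMES W"
)
first = record_to_dict(sample)
print(first["Case Number"], "/", first["Owner"])
```

With the code above, [record_to_dict(text) for text in allrecord] would give one dictionary per property card.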

Upvotes: 1

You can set the ImplicitWait and PageLoad timeouts to wait for the elements:

// Wait up to 30 seconds
driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(30);
driver.Manage().Timeouts().PageLoad = TimeSpan.FromSeconds(30);

This code is for the C# Selenium bindings.
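Since the question uses Python, a rough equivalent of the two C# settings is sketched below. implicitly_wait and set_page_load_timeout are methods of Selenium's Python WebDriver; the wrapper apply_timeouts is a hypothetical helper added only for illustration.

```python
def apply_timeouts(driver, seconds=30):
    """Set the implicit wait and page-load timeout on a WebDriver."""
    # Element lookups keep polling for up to `seconds` before failing.
    driver.implicitly_wait(seconds)
    # driver.get() raises TimeoutException if loading exceeds `seconds`.
    driver.set_page_load_timeout(seconds)
    return driver
```

Typical use would be driver = apply_timeouts(webdriver.Chrome()) before calling driver.get(...). An implicit wait only affects element lookups such as find_element; it does not delay reading driver.page_source.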

Upvotes: 0
