Reputation: 199
I am trying to crawl a web page using Selenium, but for some reason the elements I need are not showing up in the page source.
I've tried using a WebDriverWait until the page loads. I've also tried to see if the data is in a different frame that I need to switch to.
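from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# driver and delay are defined earlier in the script
driver.get('https://foreclosures.cabarruscounty.us/')
try:
    WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.XPATH,'//*[@id="app"]/div[5]/div/div')))
    print("Page is ready!")
    web_url = driver.page_source
    print(web_url)
except TimeoutException:
    print("Loading took too much time!")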
I would expect to see the records for each individual property card, which I could then extract. However, the page source does not show any of this data.
If I load the page manually and view its source (view-source:https://foreclosures.cabarruscounty.us/), the data is not there either.
Upvotes: 1
Views: 546
Reputation: 193348
To extract the first Real ID, Case Number, and Owner fields, induce WebDriverWait for visibility_of_element_located()
with the following locator strategies:
Code Block:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("start-maximized")
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=chrome_options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("https://foreclosures.cabarruscounty.us/")
Real_ID = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='row']//div[@class='card cardClass']/img//following::div[@class='card-body']//div/b"))).text
Case_Number = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='row']//div[@class='card cardClass']/img//following::div[@class='card-body']//div//following-sibling::b[2]"))).text
Owner = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='row']//div[@class='card cardClass']/img//following::div[@class='card-body']//div//following-sibling::b[7]"))).text
print("{} is {} owned by {}".format(Real_ID,Case_Number,Owner))
driver.quit()
Console Output:
Real ID: 04-086 -0040.00 is Case Number: 18-CVD-2804 owned by Owner: DOUGLAS JAMES W
Upvotes: 1
Reputation: 33384
Try the code below. It returns all of the matching elements by using visibility_of_all_elements_located():
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
driver=webdriver.Chrome()
driver.get("https://foreclosures.cabarruscounty.us/")
elements=WebDriverWait(driver,30).until(EC.visibility_of_all_elements_located((By.XPATH,"//div[@id='app']//div[@class='card-body']/div[1]")))
allrecord=[ele.text for ele in elements]
print(allrecord)  # prints the text of every record
To print only the first record, split into lines:
print(allrecord[0].splitlines())
This gives the following output:
['Real ID: 04-086 -0040.00', 'Status: SALE SCHEDULED', 'Case Number: 18-CVD-2804', 'Tax Value: $29,660', 'Min Bid: $10,067', 'Sale Date: 10/03/2019', 'Sale Time: 12:00 PM', 'Owner: DOUGLAS JAMES W', 'Attorney: ZACCHAEUS LEGAL SVCS']
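If a dictionary per record is easier to work with than a list of strings, each "Label: value" line can be split on its first colon. This is only a minimal sketch, assuming every line has that shape; parse_record is a hypothetical helper (not part of Selenium), and the sample lines are copied from the output above.

```python
def parse_record(card_text):
    """Turn the text of one property card into a {label: value} dict."""
    record = {}
    for line in card_text.splitlines():
        # Split on the first colon only, so values such as "12:00 PM" stay intact.
        label, sep, value = line.partition(":")
        if sep:  # ignore lines that do not look like "Label: value"
            record[label.strip()] = value.strip()
    return record


# Sample text copied from the first record shown above.
sample = (
    "Real ID: 04-086 -0040.00\n"
    "Status: SALE SCHEDULED\n"
    "Sale Time: 12:00 PM\n"
    "Owner: DOUGLAS JAMES W"
)
record = parse_record(sample)
print(record["Owner"])
```

With the list from the answer, this could be applied as [parse_record(text) for text in allrecord].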
Upvotes: 1
Reputation: 36
You can set an implicit wait and a page-load timeout so that the driver waits for the elements:
// Wait up to 30 seconds
driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(30);
driver.Manage().Timeouts().PageLoad = TimeSpan.FromSeconds(30);
This code is for the C# bindings of Selenium.
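Since the question uses Python, the same two timeouts can be set there with implicitly_wait() and set_page_load_timeout(). This is a rough sketch: apply_timeouts is a hypothetical helper name, and FakeDriver is a stand-in that exists only so the snippet runs without a browser.

```python
def apply_timeouts(driver, seconds=30):
    """Set the implicit wait and the page-load timeout on a WebDriver."""
    driver.implicitly_wait(seconds)        # how long element lookups keep polling
    driver.set_page_load_timeout(seconds)  # how long driver.get() may take
    return driver


class FakeDriver:
    """Stand-in that records calls; use webdriver.Chrome() in real code."""

    def __init__(self):
        self.calls = []

    def implicitly_wait(self, seconds):
        self.calls.append(("implicitly_wait", seconds))

    def set_page_load_timeout(self, seconds):
        self.calls.append(("set_page_load_timeout", seconds))


fake = apply_timeouts(FakeDriver(), 30)
print(fake.calls)
```

Keep in mind that an implicit wait only applies to element lookups such as find_element; it does not delay a call to driver.page_source.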
Upvotes: 0