David Seroy

Reputation: 199

Web Scraping - Content Not Showing in Page Source

I am trying to scrape information from a website: https://foreclosures.cabarruscounty.us/. All the data appears to be rendered in repeating cards, but I cannot find that information when I view the page source. I have tried using a web driver (Selenium), but I still cannot see the content I want to scrape. I'd like to extract all of the repeating data for each entry.

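# imports inferred from the names used below
import bs4 as bs
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

# chrome_options is defined elsewhere (not shown)
driver = webdriver.Chrome(ChromeDriverManager().install(), options=chrome_options)

url = 'https://foreclosures.cabarruscounty.us/'

driver.get(url)

web_url = driver.page_source
soup = bs.BeautifulSoup(web_url, 'html.parser')
print(soup)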

How would I be able to access or view the content within the repeating cards themselves?

Upvotes: 0

Views: 1331

Answers (2)

Andrej Kesely

Reputation: 195458

The data you see is loaded from an external URL, so you can fetch it with the requests module alone:

import json
import requests


url = 'https://foreclosures.cabarruscounty.us/dataForeclosures.json'
data = requests.get(url).json()

# uncomment this to see all data:
# print(json.dumps(data, indent=4))

# print some data to screen:
for d in data:
    for k, v in d.items():
        print('{:<5}: {}'.format(k, v))
    print('-' * 80)

Prints:

ID   : 2062
TM   : 04-086 -0010.00
S    : COMPLAINT/JUDGMENT
C    : 20-CVD-1754
R    : 56235032510000
T    : 14,850
O    : W O L INC A NC CORPORATION
M    : 3,703
SD   : PENDING
ST   : PENDING
D    : S/S DALE EARNHARDT BLVD
A    : ZACCHAEUS LEGAL SVCS
CO   : www.zls-nc.com
SL   : 77 UNION ST S CONCORD NC 28025
SP   : COURTHOUSE STEPS
U    : https://foreclosures.cabarruscounty.us/PropertyPhotos/2062.jpg
OR   : 3
--------------------------------------------------------------------------------
ID   : 2061
TM   : 04-007 -0006.00
S    : COMPLAINT/JUDGMENT
C    : 20-CVD-1070
R    : 56036654730000
T    : 135,190
O    : PITTS H M PITTS H M ESTATE
M    : 9,475
SD   : PENDING
ST   : PENDING
D    : SOUTH SIDE MOORESVILLE RD
A    : ZACCHAEUS LEGAL SVCS
CO   : www.zls-nc.com
SL   : 77 UNION ST S CONCORD NC 28025
SP   : COURTHOUSE STEPS
U    : https://foreclosures.cabarruscounty.us/PropertyPhotos/2061.jpg
OR   : 3
--------------------------------------------------------------------------------

...and so on.
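
If you would rather save the records than print them, a minimal sketch could look like the following. It assumes every record is a flat dict, as the output above suggests; `records_to_csv` is a hypothetical helper name, and the sample records are trimmed copies of the two entries shown above (with the ID written as a string for simplicity).

```python
import csv
import io


def records_to_csv(records):
    """Serialize a list of flat dicts to CSV text (hypothetical helper)."""
    # Collect column names in first-seen order, in case records differ.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Trimmed sample records; in practice pass the list returned by
# requests.get(url).json() instead.
sample = [
    {"ID": "2062", "S": "COMPLAINT/JUDGMENT", "T": "14,850"},
    {"ID": "2061", "S": "COMPLAINT/JUDGMENT", "T": "135,190"},
]
csv_text = records_to_csv(sample)
print(csv_text)
```

Values that contain commas, such as the T field, are quoted by the csv module, so they survive a round trip through a spreadsheet.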

Upvotes: 2

RichEdwards

Reputation: 3753

Try this:

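from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://foreclosures.cabarruscounty.us/")

# find_elements_by_xpath was removed in newer Selenium 4 releases;
# find_elements(By.XPATH, ...) is the equivalent call
all_cards = driver.find_elements(By.XPATH, "//div[@class='card-body']/div[1]")
for card in all_cards:
    print(card.text)  # do as you will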

The XPath selects the element inside each card that holds the text content. My DevTools show 174 of them.

The process is simple: get them all, then loop through them.

I've just printed each one, but you can do whatever you need with it.

This is the output I get (just the first few, as there are a lot):

DevTools listening on ws://127.0.0.1:51331/devtools/browser/555e3584-d777-4c8b-b928-cb8159173533
Real ID: 11-045 -0010.40
Status: UPSET BID PERIOD
Case Number: 18-CVD-2687
Tax Value: $71,500
Min Bid: $9,394
Sale Date: 12/05/2019
Sale Time: 12:00 PM
Owner: PACAJERO REALTY LLC
Attorney: ZACCHAEUS LEGAL SVCS
Real ID: 01-021 -0014.70
Status: UPSET BID PERIOD
Case Number: 16-CVD-3713
Tax Value: $21,360
Min Bid: $5,965
Sale Date: 02/20/2020
Sale Time: 12:00 PM
Owner: HOOKS JOHNNY DALE JR...
Attorney: ZACCHAEUS LEGAL SVCS
Real ID: 11-045 -0017.00
Status: UPSET BID PERIOD
Case Number: 18-CVD-2687
Tax Value: $370,670
Min Bid: $39,187
Sale Date: 12/05/2019
Sale Time: 12:00 PM
Owner: PACAJERO REALTY LLC
Attorney: ZACCHAEUS LEGAL SVCS
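
If you want structured fields rather than raw text, one option is to split each card's text into label/value pairs. This is only a sketch: it assumes each `card.text` is a block of "Label: value" lines, as the output above suggests, and `parse_card` is a hypothetical helper name. The sample text is the first card from the output above.

```python
def parse_card(text):
    """Turn 'Label: value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as "12:00 PM" stay whole.
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


# First card from the output above; in practice pass card.text instead.
sample_text = """Real ID: 11-045 -0010.40
Status: UPSET BID PERIOD
Case Number: 18-CVD-2687
Tax Value: $71,500
Min Bid: $9,394
Sale Date: 12/05/2019
Sale Time: 12:00 PM
Owner: PACAJERO REALTY LLC
Attorney: ZACCHAEUS LEGAL SVCS"""

card_fields = parse_card(sample_text)
print(card_fields["Case Number"], card_fields["Sale Time"])
```

Inside the loop you could then collect `parse_card(card.text)` for every card into a list and work with the dicts from there.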

Upvotes: 1
