Reputation: 199
I am trying to scrape information from a website: https://foreclosures.cabarruscounty.us/. All the data appears to be generated in repeating cards, but I cannot locate the information when I view the page source. I have tried using a web driver such as Selenium, but I am still not able to see the content I wish to scrape. I'd like to extract all of the repeating data for each entry.
import bs4 as bs
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

chrome_options = webdriver.ChromeOptions()
driver = webdriver.Chrome(ChromeDriverManager().install(), options=chrome_options)
url = 'https://foreclosures.cabarruscounty.us/'
driver.get(url)
web_url = driver.page_source
soup = bs.BeautifulSoup(web_url, 'html.parser')
print(soup)
How would I be able to access or view the content within the repeating cards themselves?
Upvotes: 0
Views: 1331
Reputation: 195458
The data you see is loaded from an external URL, so you can use just the requests module to get it:
import json
import requests
url = 'https://foreclosures.cabarruscounty.us/dataForeclosures.json'
data = requests.get(url).json()
# uncomment this to see all data:
# print(json.dumps(data, indent=4))
# print some data to screen:
for d in data:
    for k, v in d.items():
        print('{:<5}: {}'.format(k, v))
    print('-' * 80)
Prints:
ID : 2062
TM : 04-086 -0010.00
S : COMPLAINT/JUDGMENT
C : 20-CVD-1754
R : 56235032510000
T : 14,850
O : W O L INC A NC CORPORATION
M : 3,703
SD : PENDING
ST : PENDING
D : S/S DALE EARNHARDT BLVD
A : ZACCHAEUS LEGAL SVCS
CO : www.zls-nc.com
SL : 77 UNION ST S CONCORD NC 28025
SP : COURTHOUSE STEPS
U : https://foreclosures.cabarruscounty.us/PropertyPhotos/2062.jpg
OR : 3
--------------------------------------------------------------------------------
ID : 2061
TM : 04-007 -0006.00
S : COMPLAINT/JUDGMENT
C : 20-CVD-1070
R : 56036654730000
T : 135,190
O : PITTS H M PITTS H M ESTATE
M : 9,475
SD : PENDING
ST : PENDING
D : SOUTH SIDE MOORESVILLE RD
A : ZACCHAEUS LEGAL SVCS
CO : www.zls-nc.com
SL : 77 UNION ST S CONCORD NC 28025
SP : COURTHOUSE STEPS
U : https://foreclosures.cabarruscounty.us/PropertyPhotos/2061.jpg
OR : 3
--------------------------------------------------------------------------------
...and so on.
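If you want those records in a spreadsheet rather than printed to the screen, the list of dicts can be written straight to CSV with the standard csv module. A minimal sketch, using a sample record in place of the live `requests.get(url).json()` call (the abbreviated keys are undocumented, so no attempt is made to rename them):

```python
import csv

# In practice: data = requests.get(url).json()
# Sample record with the same abbreviated keys as the live JSON:
data = [
    {'ID': 2062, 'S': 'COMPLAINT/JUDGMENT', 'C': '20-CVD-1754',
     'T': '14,850', 'O': 'W O L INC A NC CORPORATION'},
]

with open('foreclosures.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=list(data[0].keys()))
    writer.writeheader()
    writer.writerows(data)
```

csv.DictWriter handles quoting automatically, which matters here because values like the tax value contain commas.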
Upvotes: 2
Reputation: 3753
Try this:
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://foreclosures.cabarruscounty.us/")
all_cards = driver.find_elements(By.XPATH, "//div[@class='card-body']/div[1]")
for card in all_cards:
    print(card.text)  # do as you will
The XPath gets the card with the text content. My devtools say there are 174 of them.
The simple process is to get them all, then loop through them. I've done a print, but you can do as you will.
This is the output I get (just the first few, as there is a lot):
Real ID: 11-045 -0010.40
Status: UPSET BID PERIOD
Case Number: 18-CVD-2687
Tax Value: $71,500
Min Bid: $9,394
Sale Date: 12/05/2019
Sale Time: 12:00 PM
Owner: PACAJERO REALTY LLC
Attorney: ZACCHAEUS LEGAL SVCS
Real ID: 01-021 -0014.70
Status: UPSET BID PERIOD
Case Number: 16-CVD-3713
Tax Value: $21,360
Min Bid: $5,965
Sale Date: 02/20/2020
Sale Time: 12:00 PM
Owner: HOOKS JOHNNY DALE JR...
Attorney: ZACCHAEUS LEGAL SVCS
Real ID: 11-045 -0017.00
Status: UPSET BID PERIOD
Case Number: 18-CVD-2687
Tax Value: $370,670
Min Bid: $39,187
Sale Date: 12/05/2019
Sale Time: 12:00 PM
Owner: PACAJERO REALTY LLC
Attorney: ZACCHAEUS LEGAL SVCS
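Since each card's text is a series of `Label: value` lines, it can be split into a dictionary instead of just printed. A minimal sketch, assuming every line contains a colon, and using one card's text from the output above in place of a live element:

```python
# Sample of one card's .text, as printed above:
card_text = """Real ID: 11-045 -0010.40
Status: UPSET BID PERIOD
Case Number: 18-CVD-2687
Tax Value: $71,500
Min Bid: $9,394
Sale Date: 12/05/2019
Sale Time: 12:00 PM
Owner: PACAJERO REALTY LLC
Attorney: ZACCHAEUS LEGAL SVCS"""

def parse_card(text):
    record = {}
    for line in text.splitlines():
        # Split only on the first colon so values may contain colons (e.g. times)
        key, _, value = line.partition(':')
        record[key.strip()] = value.strip()
    return record

record = parse_card(card_text)
print(record['Tax Value'])  # $71,500
```

In the loop you would call `parse_card(card.text)` for each element and collect the dicts into a list.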
Upvotes: 1