Vishal Gahlot

Reputation: 48

How to scrape data from an HTML table in Python

<tr class="even">
<td><strong><a href='../eagleweb/viewDoc.jsp?node=DOC186S8881'>DEED<br/>
2016002023</a></strong></td>
<td><a href='../eagleweb/viewDoc.jsp?node=DOC186S8881'><b> Recording Date: </b>01/12/2016 08:05:17 AM&nbsp;&nbsp;&nbsp;<b>Book Page: </b> <table cellspacing=0 width="100%"><tr><td width="50%"  valign="top"><b>Grantor:</b> ARELLANO ISAIAS</td><td width="50%"  valign="top"><b>Grantee:</b> ARELLANO ISAIAS, ARELLANO ALICIA</td></tr></table>
<b>Number Pages:</b> 3<br></a></td>
<td></td>
<td></td></tr>

I am new to Python and scraping; please help me scrape the data from this table. To log in, go to the public login and then enter the to and from dates.

Data Model: The data model has columns in this specific order and casing: "record_date", "doc_number", "doc_type", "role", "name", "apn", "transfer_amount", "county", and "state". The "role" column will be either "Grantor" or "Grantee", depending on where the name is assigned. If there are multiple grantor or grantee names, put each name on its own row and copy the recording date, document number, document type, role, and APN.
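A minimal sketch of that expansion rule: each output row copies the shared fields and varies only the role and name. The field values below are illustrative, taken from the posted row; the county and state are my guesses based on the Sonoma County URL.

```python
# Shared fields copied onto every expanded row (values illustrative).
shared = {
    "record_date": "01/12/2016 08:05:17 AM",
    "doc_number": "2016002023",
    "doc_type": "DEED",
    "apn": None,
    "transfer_amount": None,
    "county": "Sonoma",   # assumed from the URL
    "state": "CA",        # assumed from the URL
}
grantors = ["ARELLANO ISAIAS"]
grantees = ["ARELLANO ISAIAS", "ARELLANO ALICIA"]

# One row per name, with role set by which list the name came from.
rows = [
    {**shared, "role": role, "name": name}
    for role, names in (("Grantor", grantors), ("Grantee", grantees))
    for name in names
]
for row in rows:
    print(row["role"], row["name"])
# Grantor ARELLANO ISAIAS
# Grantee ARELLANO ISAIAS
# Grantee ARELLANO ALICIA
```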

https://crarecords.sonomacounty.ca.gov/recorder/eagleweb/docSearchResults.jsp?searchId=0

Upvotes: 0

Views: 2034

Answers (2)

Everett

Reputation: 9558

I know this is an old question, but one underrated secret for this task is pandas' read_clipboard function: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_clipboard.html

Under the hood it simply feeds the clipboard text to read_csv, and the interface for simple usage is very straightforward. Consider this simple script:

# 1. Go to a website, e.g. https://www.wunderground.com/hurricane/hurrarchive.asp?region=ep
# 2. Highlight the table of data, e.g. of Hurricanes in the East Pacific
# 3. Copy the text from your browser
# 4. Run this script: the data will be available as a dataframe
import pandas as pd
df = pd.read_clipboard()
print(df)

Granted, this solution requires user interaction, but I've found it useful in many cases where there is no convenient CSV download or API endpoint.
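A non-interactive cousin of read_clipboard is pandas.read_html, which parses every table element in an HTML string or page into a list of DataFrames (it needs a parser backend such as lxml, html5lib, or BeautifulSoup installed). A minimal sketch with a made-up table:

```python
from io import StringIO

import pandas as pd

# pandas.read_html returns one DataFrame per <table> found.
# Wrapping the literal HTML in StringIO avoids the deprecation of
# passing raw strings in newer pandas versions.
html = """
<table>
  <tr><th>Grantor</th><th>Grantee</th></tr>
  <tr><td>ARELLANO ISAIAS</td><td>ARELLANO ALICIA</td></tr>
</table>
"""
tables = pd.read_html(StringIO(html))
print(tables[0])
```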

Upvotes: 1

The HTML you posted does not contain all of the column fields listed in your Data Model. However, for the fields it does contain, the following will produce a Python dictionary from which you can populate the Data Model:

import urllib.request
from bs4 import BeautifulSoup

url = "the_url_of_webpage_to_scrape" # Replace with the URL of your webpage

with urllib.request.urlopen(url) as response:
    html = response.read()

soup = BeautifulSoup(html, 'html.parser')

table = soup.find("tr", attrs={"class":"even"})

btags = [str(b.text).strip().strip(':') for b in table.find_all("b")]

bsibs = [str(b.next_sibling.replace(u'\xa0', '')).strip() for b in table.find_all('b')]

data = dict(zip(btags, bsibs))

data_model = {"record_date": None, "doc_number": None, "doc_type": None, "role": None, "name": None, "apn": None, "transfer_amount": None, "county": None, "state": None}

data_model["record_date"] = data['Recording Date']
data_model['role'] = 'Grantee'
data_model['name'] = data['Grantee']

from pprint import pprint
pprint(data_model)

output:

{'apn': None,
 'county': None,
 'doc_number': None,
 'doc_type': None,
 'name': 'ARELLANO ISAIAS, ARELLANO ALICIA',
 'record_date': '01/12/2016 08:05:17 AM',
 'role': 'Grantee',
 'state': None,
 'transfer_amount': None}

With this you can do:

print(data_model['record_date']) # 01/12/2016 08:05:17 AM
print(data_model['name'])        # ARELLANO ISAIAS, ARELLANO ALICIA
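The document type and number live in the first td element's link of the row you posted, separated by a br tag. A sketch of pulling them out with BeautifulSoup (the row HTML is inlined here so the example is self-contained):

```python
from bs4 import BeautifulSoup

row_html = """<tr class="even">
<td><strong><a href='../eagleweb/viewDoc.jsp?node=DOC186S8881'>DEED<br/>
2016002023</a></strong></td>
</tr>"""

row = BeautifulSoup(row_html, "html.parser")
# stripped_strings yields the text on either side of the <br/>,
# already trimmed of surrounding whitespace.
doc_type, doc_number = row.find("a").stripped_strings
print(doc_type, doc_number)  # DEED 2016002023
```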

Hope this helps.

Upvotes: 1
