I want to extract the FIPS code for each county in Louisiana from this website using Beautiful Soup and put the results in a pandas DataFrame: https://www.nrcs.usda.gov/wps/portal/nrcs/detail/la/technical/cp/?cid=nrcs143_013697
The columns would be FIPS, Name, and State. When I inspect the element, I've tried searching by tr, td, and table, but I don't know how to single out just the main data table and then put it into a pandas DataFrame. Once I find the specific table, it should be easy to do something like:
if state == 'LA':
    ...  # put data into a DataFrame
import requests
from bs4 import BeautifulSoup
url = "https://www.nrcs.usda.gov/wps/portal/nrcs/detail/la/technical/cp/?cid=nrcs143_013697"
html_text = requests.get(url).text
soup = BeautifulSoup(html_text, 'html.parser')
# print(soup)
for county in soup.find_all('table'):
    print(county.text)
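To see which table actually holds the data, a quick diagnostic is to list every <table> with its class attribute and row count. This is a minimal sketch: SAMPLE_HTML is a hypothetical two-table page, not the real NRCS markup, and describe_tables is a hypothetical helper name. With the live page you would pass html_text instead.

```python
from bs4 import BeautifulSoup

# Hypothetical page: one layout table and one data table (not the real NRCS HTML).
SAMPLE_HTML = """
<table><tr><td>navigation</td></tr></table>
<table class="data">
  <tr><th>FIPS</th><th>Name</th><th>State</th></tr>
  <tr><td>22001</td><td>Acadia</td><td>LA</td></tr>
</table>
"""


def describe_tables(html):
    """Return (position, class list or None, number of <tr> rows) for each table."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (i, table.get("class"), len(table.find_all("tr")))
        for i, table in enumerate(soup.find_all("table"))
    ]


for info in describe_tables(SAMPLE_HTML):
    print(info)
```

A table with a distinctive class and many rows is usually the one to target with find('table', class_=...) or select_one('.classname').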
There is one data table on the page (class="data"), so you can iterate over the <tr> elements in that table.
If you want the DataFrame to include only one state, you can either filter the rows before adding them to the DataFrame, or build a DataFrame of all the data and then filter it to get a subset.
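For the second option (build the full DataFrame, then filter), here is a minimal sketch. It runs offline against a small hypothetical SAMPLE_HTML string shaped like the NRCS table; the <th> header row and the Autauga (AL) row are assumptions for illustration, not scraped data, and table_to_dataframe is a hypothetical helper name.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical sample shaped like the "data" table (assumption: a <th> header row
# followed by rows of FIPS / Name / State <td> cells).
SAMPLE_HTML = """
<table class="data">
  <tr><th>FIPS</th><th>Name</th><th>State</th></tr>
  <tr><td>01001</td><td>Autauga</td><td>AL</td></tr>
  <tr><td>22001</td><td>Acadia</td><td>LA</td></tr>
  <tr><td>22003</td><td>Allen</td><td>LA</td></tr>
</table>
"""


def table_to_dataframe(html):
    """Parse every three-cell row of the table with class="data" into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find("table", class_="data").find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == 3:  # the header row has only <th> cells, so it is skipped
            rows.append(cells)
    return pd.DataFrame(rows, columns=["FIPS", "Name", "State"])


all_counties = table_to_dataframe(SAMPLE_HTML)
# Boolean mask keeps only the Louisiana rows; reset_index renumbers them from 0.
la = all_counties[all_counties["State"] == "LA"].reset_index(drop=True)
print(la)
```

With the real page, pass requests.get(url).text instead of SAMPLE_HTML. Keeping the full frame lets you take other state subsets later without scraping again.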
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = "https://www.nrcs.usda.gov/wps/portal/nrcs/detail/la/technical/cp/?cid=nrcs143_013697"
html_text = requests.get(url).text
soup = BeautifulSoup(html_text, 'html.parser')
data = []
for tr in soup.find('table', class_='data').find_all('tr'):
    row = [td.text for td in tr.find_all('td')]
    # Keep only data rows (three cells) for LA; drop the State check to keep every state
    if len(row) == 3 and row[2] == 'LA':
        data.append(row)
df = pd.DataFrame(data, columns=['FIPS', 'Name', 'State'])
print(df)
Output:
FIPS Name State
0 22001 Acadia LA
1 22003 Allen LA
2 22005 Ascension LA
3 22007 Assumption LA
4 22009 Avoyelles LA
.. ... ... ...
63 22127 Winn LA
You can select the <table> with class="data" and then parse it with pd.read_html. For example:
from io import StringIO
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = "https://www.nrcs.usda.gov/wps/portal/nrcs/detail/la/technical/cp/?cid=nrcs143_013697"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
# pandas 2.1+ deprecates passing literal HTML to read_html, so wrap the table markup in StringIO
df = pd.read_html(StringIO(str(soup.select_one(".data"))))[0]
# filter State == 'LA'
print(df[df.State == "LA"].head())
Prints:
FIPS Name State
1109 22001 Acadia LA
1110 22003 Allen LA
1111 22005 Ascension LA
1112 22007 Assumption LA
1113 22009 Avoyelles LA
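pd.read_html infers column types, so the FIPS column comes back as integers. That is harmless for Louisiana, but codes for states whose FIPS starts with 0, such as Alabama (01), would lose the leading zero. Assuming you want FIPS as five-character strings, one option is to zero-pad after parsing, as in the sketch below; alternatively, read_html accepts a converters argument, e.g. converters={"FIPS": str}, which keeps the raw text. The frame here is a hypothetical stand-in for read_html output, and normalize_fips is a hypothetical helper name.

```python
import pandas as pd


def normalize_fips(df, column="FIPS"):
    """Return a copy of df with the FIPS column as zero-padded, 5-character strings."""
    out = df.copy()
    out[column] = out[column].astype(str).str.zfill(5)
    return out


# Hypothetical frame shaped like read_html output, where FIPS was inferred as int64.
parsed = pd.DataFrame(
    {"FIPS": [1001, 22001], "Name": ["Autauga", "Acadia"], "State": ["AL", "LA"]}
)
fixed = normalize_fips(parsed)
print(fixed["FIPS"].tolist())  # ['01001', '22001']
```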