Beautiful Soup with California Senator webpage

Question

I am new to Beautiful Soup and to HTML. After following a tutorial, I am trying to scrape this webpage listing California senators (https://www.senate.ca.gov/senators). My goal is to extract each senator's name, party affiliation, district, and capitol office phone number, and ultimately put the data into a pandas DataFrame. Looking at the page source, I see that h3 is the tag that matters for name and party, and that the address and phone number are in p tags. If I find all "h3" elements, I get 201 results, far more than the number of senators, and I don't know how to narrow the search to just the elements I want. I can make the request and parse the response into a soup object, but I am not sure how to extract the information I need. I have followed a few online tutorials, but they don't cover this case. Any help would be appreciated.

Latest try:

import requests
from bs4 import BeautifulSoup
import pandas as pd

# Send a GET request to the website
url = "https://www.senate.ca.gov/senators"
response = requests.get(url)

# Use Beautiful Soup to parse the HTML
soup = BeautifulSoup(response.content, "html.parser")

# Find the table that contains the senator information
table = soup.find("table", {"class": "views-table cols-4"})

# Create lists to store the data
names = []
districts = []
parties = []
phones = []

# Extract the senator information from each row in the table
for row in table.find_all("tr"):
    cells = row.find_all("td")
    # Only rows with exactly four data cells are treated as senator rows
    if len(cells) == 4:
        name = cells[0].get_text().strip()
        district = cells[1].get_text().strip()
        party = cells[2].get_text().strip()
        phone = cells[3].get_text().strip()

        # Append the data to the lists
        names.append(name)
        districts.append(district)
        parties.append(party)
        phones.append(phone)

# Create a pandas DataFrame from the lists
df = pd.DataFrame({
    "Senator Name": names,
    "District": districts,
    "Party": parties,
    "Phone Number": phones,
})

# Print the DataFrame
print(df)

Answers (1)
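
One thing worth checking first: `soup.find` returns `None` when nothing matches, so if the page has no table with that class, `table.find_all` raises an `AttributeError`. Since the question says the data sits in h3 and p tags, a possible approach is to select each senator's container element first and only then look for the h3 and p tags inside it, which avoids picking up unrelated h3 headings elsewhere on the page. The sketch below is a minimal illustration of that idea, not a tested scraper for the live site. It runs against a made-up HTML fragment, and the class names `senator-card`, `district`, and `capitol-office`, the "Name (D)" heading format, and the phone number format are all assumptions; the real selectors have to be read off the page in the browser's inspector.

```python
import re

from bs4 import BeautifulSoup

# Made-up markup for illustration only. The class names and layout are
# assumptions, not the real structure of the senate.ca.gov page.
SAMPLE_HTML = """
<h3>Site navigation</h3>
<div class="senator-card">
  <h3>Jane Example (D)</h3>
  <p class="district">District 01</p>
  <p class="capitol-office">Example Building, Room 100; (916) 555-0100</p>
</div>
<div class="senator-card">
  <h3>John Sample (R)</h3>
  <p class="district">District 02</p>
  <p class="capitol-office">Example Building, Room 101; (916) 555-0101</p>
</div>
<h3>Footer links</h3>
"""

# Assumed formats: "(916) 555-0100" for phones, "Name (X)" for headings.
PHONE_RE = re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}")
NAME_PARTY_RE = re.compile(r"^(?P<name>.+?)\s*\((?P<party>[A-Z])\)$")


def parse_senators(html):
    """Return one dict per senator container found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Visit only the per-senator containers, so <h3> tags outside them
    # (navigation, footer) are never considered.
    for card in soup.select("div.senator-card"):
        heading = card.find("h3")
        if heading is None:
            continue
        match = NAME_PARTY_RE.match(heading.get_text(strip=True))
        if match is None:
            continue
        district_tag = card.select_one("p.district")
        office_tag = card.select_one("p.capitol-office")
        phone = PHONE_RE.search(office_tag.get_text()) if office_tag else None
        rows.append({
            "Senator Name": match.group("name"),
            "Party": match.group("party"),
            "District": district_tag.get_text(strip=True) if district_tag else None,
            "Phone Number": phone.group(0) if phone else None,
        })
    return rows


senators = parse_senators(SAMPLE_HTML)
for senator in senators:
    print(senator)
```

The resulting list of dicts can be passed straight to `pd.DataFrame(senators)` to get one row per senator. For the live page, `SAMPLE_HTML` would be replaced by `response.content` from the `requests.get` call in the question, with the selectors adjusted to whatever the real markup uses.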