Reputation: 4807
I am using the following code:
import requests, pandas as pd
from bs4 import BeautifulSoup

if __name__ == '__main__':
    url = 'https://www.har.com/homedetail/6408-burgoyne-rd-157-houston-tx-77057/3380601'
    list_of_dataframes = pd.read_html(url)
However, list_of_dataframes contains no school information, even though it is shown at the bottom of the page at the URL above.
How can I get the following information into a DataFrame?

School                        Stars  Rating
BRIARGROVE Elementary School  4      Good
TANGLEWOOD Middle School      4      Good
WISDOM High School            3      Average
TIA
Upvotes: 2
Views: 275
Reputation: 20042
You can't get that school info with pandas because it is not in a table: pd.read_html only parses <table> elements, and the school section is built from regular divs. You have to parse the HTML yourself and then load the data into a pd.DataFrame.
Here's how to do it:
import pandas as pd
import requests
from bs4 import BeautifulSoup

if __name__ == '__main__':
    url = 'https://www.har.com/homedetail/6408-burgoyne-rd-157-houston-tx-77057/3380601'

    # The school section is a div with id "SCHOOLS"; each school is a "border_row" div.
    soup = BeautifulSoup(requests.get(url).text, "lxml").find("div", {"id": "SCHOOLS"})
    schools = soup.find_all("div", class_="border_row")

    schools_data = []
    for school in schools:
        # The school name is the text of the first link in the row.
        name = school.find("a").get_text()
        # Stars are rendered as images; count those whose src contains "star".
        stars = len([i for i in school.find_all("img") if "star" in i["src"]])
        # The rating is the second-to-last word of the row's text.
        rating = school.get_text().split()[-2]
        schools_data.append([name, stars, rating])

    print(pd.DataFrame(schools_data, columns=["School", "Stars", "Rating"]))
Output:
                         School  Stars   Rating
0  BRIARGROVE Elementary School      4     Good
1      TANGLEWOOD Middle School      4     Good
2            WISDOM High School      3  Average
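If you want to try the parsing logic without hitting the site, here is a minimal offline sketch of the same approach. The sample markup is invented for illustration (only the "SCHOOLS" id and the "border_row" class come from the code above; the image paths and the trailing "Details" text are assumptions), and parse_schools is a hypothetical helper name. It also uses the built-in "html.parser" instead of "lxml" and guards against a missing section, so treat it as a starting point rather than a drop-in replacement.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Invented sample markup: NOT copied from the real page, only shaped like
# what the answer's selectors expect (div#SCHOOLS > div.border_row).
SAMPLE_HTML = """
<div id="SCHOOLS">
  <div class="border_row">
    <a href="#">BRIARGROVE Elementary School</a>
    <img src="/img/star.png"><img src="/img/star.png">
    <img src="/img/star.png"><img src="/img/star.png">
    <span>Good</span> <span>Details</span>
  </div>
  <div class="border_row">
    <a href="#">WISDOM High School</a>
    <img src="/img/star.png"><img src="/img/star.png">
    <img src="/img/star.png"><img src="/img/empty.png">
    <span>Average</span> <span>Details</span>
  </div>
</div>
"""

COLUMNS = ["School", "Stars", "Rating"]


def parse_schools(html):
    """Return a DataFrame of schools found in the div with id 'SCHOOLS'."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", {"id": "SCHOOLS"})
    if container is None:
        # Section missing (layout changed, request blocked, ...): empty frame.
        return pd.DataFrame(columns=COLUMNS)

    rows = []
    for school in container.find_all("div", class_="border_row"):
        link = school.find("a")
        if link is None:
            continue  # skip rows that carry no school name
        name = link.get_text(strip=True)
        # Count images whose src mentions "star"; tolerate a missing src.
        stars = sum("star" in img.get("src", "") for img in school.find_all("img"))
        # Same heuristic as above: the rating is the second-to-last word.
        words = school.get_text().split()
        rating = words[-2] if len(words) >= 2 else None
        rows.append([name, stars, rating])
    return pd.DataFrame(rows, columns=COLUMNS)


if __name__ == "__main__":
    print(parse_schools(SAMPLE_HTML))
```

To run it against the live page, pass requests.get(url).text to parse_schools instead of SAMPLE_HTML. If the site changes its markup, the selectors and the second-to-last-word heuristic are the parts most likely to need adjusting.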
Upvotes: 4