I'm trying to get the "data-val" values from my soup, but they all come back as one huge list instead of being split into separate lists/columns as shown on the website.
I know the column headers are these:
<th class="num record drop-3" data-tsorter="data-val">
  <span class="long-points">
    proj. pts.
  </span>
  <span class="short-points">
    pts.
  </span>
</th>
<th class="pct" data-tsorter="data-val">
  <span class="full-relegated">
    relegated
  </span>
  <span class="small-relegated">
    rel.
  </span>
</th>
<th class="pct" data-tsorter="data-val">
  <span class="full-champ">
    qualify for UCL
  </span>
  <span class="small-champ">
    make UCL
  </span>
</th>
<th class="pct sorted" data-tsorter="data-val">
  <span class="drop-1">
    win Premier League
  </span>
  <span class="small-league">
    win league
  </span>
</th>
This is what I'm trying:
import requests
from bs4 import BeautifulSoup

url = 'https://projects.fivethirtyeight.com/soccer-predictions/premier-league/'
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
table = soup.find("table", {"class": "forecast-table"})
#print(table.prettify())
for i in table.find_all("td", {"class": "pct"}):
    print(i)
So ideally I'd like 4 lists, one per column above, each labelled with its class name and holding the matching values.
Upvotes: 1
Reputation: 3662
I'm not entirely sure which specific columns you want, but this gets every cell that has a data-val in the tag's attributes:
import requests
from bs4 import BeautifulSoup

url = 'https://projects.fivethirtyeight.com/soccer-predictions/premier-league/'
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
table = soup.find("table", {"class": "forecast-table"})

# Each team has its own row, marked with the "team-row" class
team_rows = table.find_all("tr", {"class": "team-row"})
for team in team_rows:
    # The team name is stored in the row's data-str attribute
    print("Team name: {}".format(team['data-str']))
    team_data = team.find_all("td")
    for data in team_data:
        # Only print cells that carry a data-val attribute
        if 'data-val' in data.attrs:
            print("\t{}".format(data.attrs['data-val']))
    print("\n")
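To get the separate lists the question asks for instead of printing, the same row loop can collect values per column. The sketch below is a minimal illustration, assuming each team row holds exactly one data-val cell per sortable header, in header order; the live page may have extra data-val cells, in which case the zip would misalign, so check that first. It runs on a small hypothetical table (made-up teams and numbers) that only reuses the question's class names, and `columns_by_header` is an illustrative helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup (made-up teams and numbers) that reuses the
# class names from the question; it is NOT a copy of the live page.
SAMPLE_HTML = """
<table class="forecast-table">
<thead><tr>
<th>team</th>
<th class="num record drop-3" data-tsorter="data-val"><span class="long-points">proj. pts.</span></th>
<th class="pct" data-tsorter="data-val"><span class="full-relegated">relegated</span></th>
<th class="pct" data-tsorter="data-val"><span class="full-champ">qualify for UCL</span></th>
<th class="pct sorted" data-tsorter="data-val"><span class="drop-1">win Premier League</span></th>
</tr></thead>
<tbody>
<tr class="team-row" data-str="Team A">
<td>Team A</td><td class="num" data-val="80">80</td><td class="pct" data-val="0.01">1%</td>
<td class="pct" data-val="0.9">90%</td><td class="pct" data-val="0.4">40%</td>
</tr>
<tr class="team-row" data-str="Team B">
<td>Team B</td><td class="num" data-val="40">40</td><td class="pct" data-val="0.3">30%</td>
<td class="pct" data-val="0.02">2%</td><td class="pct" data-val="0.001">&lt;1%</td>
</tr>
</tbody>
</table>
"""


def columns_by_header(html):
    """Return (team names, {header label: list of data-val floats})."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "forecast-table"})
    # Long-form label = text of the first <span> in each sortable header.
    labels = [th.find("span").get_text(strip=True)
              for th in table.find_all("th", attrs={"data-tsorter": "data-val"})]
    columns = {label: [] for label in labels}
    teams = []
    for row in table.find_all("tr", {"class": "team-row"}):
        teams.append(row["data-str"])
        # Cells that carry a data-val attribute, in document order.
        values = [td["data-val"] for td in row.find_all("td", attrs={"data-val": True})]
        # ASSUMPTION: one data-val cell per header, in header order.
        for label, value in zip(labels, values):
            columns[label].append(float(value))
    return teams, columns


teams, columns = columns_by_header(SAMPLE_HTML)
print(teams)
print(columns["relegated"])
```

To key the lists by the span's class name instead (e.g. `full-relegated`), replace `th.find("span").get_text(strip=True)` with `th.find("span")["class"][0]`; BeautifulSoup returns `class` as a list because it is a multi-valued attribute.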
If I understand your question correctly, you're looking for the last few values, which have hardly any distinguishing attributes in the HTML source. In that case you can simply look them up by position, e.g. tag[6], although that's of course not very robust; but this is HTML parsing, so "not very robust" is par for the course imho.
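Here is what positional lookup looks like on a made-up row. The real page's cell count and order are not known here, so index 6 is only illustrative. Counting from the end with a negative slice is a common alternative when the wanted values are the trailing cells, though it depends on the layout just as much.

```python
from bs4 import BeautifulSoup

# Hypothetical row: cell count, order and values are invented for illustration.
ROW_HTML = """
<table><tr class="team-row" data-str="Team A">
<td class="team">Team A</td><td>2.1</td><td>0.9</td><td>+40</td>
<td class="num record drop-3" data-val="80">80</td>
<td class="pct" data-val="0.01">1%</td>
<td class="pct" data-val="0.9">90%</td>
<td class="pct" data-val="0.4">40%</td>
</tr></table>
"""

row = BeautifulSoup(ROW_HTML, "html.parser").find("tr", {"class": "team-row"})
cells = row.find_all("td")

# Fixed position (the seventh cell): breaks silently if a column is added or removed.
seventh = cells[6]["data-val"]

# Counting from the end tolerates columns added at the front,
# but still assumes the wanted values are the last four cells.
last_four = [td["data-val"] for td in cells[-4:]]
print(seventh, last_four)
```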
What I'm doing here is finding all the team rows (which is easy thanks to the team-row class name) and then simply looping through every td tag inside each row's tr.
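The same row-then-cell walk can also be written with BeautifulSoup's CSS selector support (`select`). This swaps in a different BeautifulSoup API than the `find_all` calls used above, and it runs on a hypothetical snippet with made-up values rather than the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet reusing the class names from the answer; values are made up.
HTML = """
<table class="forecast-table"><tbody>
<tr class="team-row" data-str="Team A">
<td>Team A</td><td data-val="80">80</td><td class="pct" data-val="0.01">1%</td>
</tr>
<tr class="team-row" data-str="Team B">
<td>Team B</td><td data-val="40">40</td><td class="pct" data-val="0.3">30%</td>
</tr>
</tbody></table>
"""

soup = BeautifulSoup(HTML, "html.parser")
result = {}
# "tr.team-row" inside "table.forecast-table": the same rows find_all returns.
for row in soup.select("table.forecast-table tr.team-row"):
    # "td[data-val]" keeps only cells that have a data-val attribute.
    result[row["data-str"]] = [td["data-val"] for td in row.select("td[data-val]")]
print(result)
```

The attribute selector `td[data-val]` does the same job as the `'data-val' in data.attrs` check, so the filtering moves out of the loop body.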
Upvotes: 2