Reputation: 787
I am pulling a bunch of football scores from a page using BeautifulSoup, easy peasy. Here is an example section of the HTML:
<tr class='report' id='match-row-EFBO695086'>
  <td class='statistics show' title='Show latest match stats'> <button>Show</button> </td>
  <td class='match-competition'> Premier League </td>
  <td class='match-details teams'>
    <p>
      <span class='team-home teams'> <a href='/sport/football/teams/manchester-city'>Man City</a> </span>
      <span class='score'> <abbr title='Score'> 1-0 </abbr> </span>
      <span class='team-away teams'> <a href='/sport/football/teams/crystal-palace'>Crystal Palace</a> </span>
    </p>
  </td>
  <td class="match-date"> Sat 28 Dec </td>
  <td class='time'> Full time </td>
  <td class='status'> <a class='report' href='/sport/football/25474625'>Report</a>
from bs4 import BeautifulSoup
import urllib.request
import csv

url = 'http://www.bbc.co.uk/sport/football/teams/manchester-city/results/'
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, 'html.parser')  # name the parser explicitly

# Print the text of every <abbr> tag on the page
for score in soup.find_all('abbr'):
    print(score.string)
*** Remote Interpreter Reinitialized ***
>>>
None
1-2
1-0
0-2
2-1
2-2
4-1
0-2
1-1
How do I extract the team names from this part of the HTML:
<span class='team-away teams'> <a href='/sport/football/teams/crystal-palace'>Crystal Palace</a> </span>
Upvotes: 3
Views: 1310
Reputation: 474191
The idea is to first get the elements containing the information about each game: these are the tr tags with class="report". For each row, get the team names by the classes team-home and team-away, and the score by the tag name abbr:
from bs4 import BeautifulSoup
import urllib.request

url = 'http://www.bbc.co.uk/sport/football/teams/manchester-city/results/'
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, 'html.parser')  # name the parser explicitly

# Each match is one <tr class="report"> row inside the results table
for match in soup.select('table.table-stats tr.report'):
    team1 = match.find('span', class_='team-home')
    team2 = match.find('span', class_='team-away')
    score = match.abbr  # first <abbr> inside the row

    # Skip rows that are missing any of the three pieces
    if not all((team1, team2, score)):
        continue

    print(team1.text, score.text, team2.text)
Prints:
Man City 1-2 CSKA
Man City 1-0 Man Utd
Man City 0-2 Newcastle
West Ham 2-1 Man City
...
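If you want to try this without hitting the live page, here is a minimal sketch that runs the same per-row lookup against a trimmed copy of the HTML from the question. The wrapping table element, the SAMPLE_HTML name and the extract_match helper are my own assumptions for illustration; they are not part of the original page. It also uses get_text(strip=True) to drop the whitespace around the names and the score.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the row from the question. The surrounding <table> is an
# assumption added so the fragment is a complete piece of markup.
SAMPLE_HTML = """
<table class='table-stats'>
  <tr class='report' id='match-row-EFBO695086'>
    <td class='match-details teams'>
      <p>
        <span class='team-home teams'>
          <a href='/sport/football/teams/manchester-city'>Man City</a>
        </span>
        <span class='score'> <abbr title='Score'> 1-0 </abbr> </span>
        <span class='team-away teams'>
          <a href='/sport/football/teams/crystal-palace'>Crystal Palace</a>
        </span>
      </p>
    </td>
  </tr>
</table>
"""


def extract_match(row):
    """Hypothetical helper: return (home, score, away) for one row, or None."""
    # class_='team-home' also matches class='team-home teams', because
    # BeautifulSoup compares against each class in the attribute separately.
    home = row.find('span', class_='team-home')
    away = row.find('span', class_='team-away')
    score = row.abbr
    if not all((home, away, score)):
        return None
    return (
        home.get_text(strip=True),
        score.get_text(strip=True),
        away.get_text(strip=True),
    )


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
results = [extract_match(row) for row in soup.find_all('tr', class_='report')]
print(results)
```

Returning a tuple instead of printing directly makes it easy to pass the rows on to something like csv.writer later.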
FYI, table.table-stats tr.report is a CSS selector that matches all tr tags with class="report" inside the table with class="table-stats".
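To see what the table prefix in that selector buys you, here is a small sketch on made-up markup (the two tables, the ids and the "preview" and "other" class names are hypothetical, not taken from the BBC page). It compares the scoped selector with a bare tr.report:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two tables, only one of which has class "table-stats".
HTML = """
<table class="table-stats">
  <tr class="report" id="a"><td>kept</td></tr>
  <tr class="preview" id="b"><td>wrong row class</td></tr>
</table>
<table class="other">
  <tr class="report" id="c"><td>wrong table</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, 'html.parser')

# Descendant selector: tr.report anywhere inside table.table-stats.
scoped_ids = [row['id'] for row in soup.select('table.table-stats tr.report')]

# Without the table prefix, every tr with class "report" matches.
all_ids = [row['id'] for row in soup.select('tr.report')]

print(scoped_ids)
print(all_ids)
```

So the prefix should only matter if the page has other tables whose rows share the report class; if it does not, plain tr.report would likely give the same rows.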
Upvotes: 2