ar.dll

Reputation: 787

Get link text from HTML using beautifulsoup

I am pulling a bunch of football scores from a page using BeautifulSoup, easy peasy. Here is an example section of the HTML:

<tr class='report' id='match-row-EFBO695086'>
  <td class='statistics show' title='Show latest match stats'> <button>Show</button> </td>
  <td class='match-competition'> Premier League </td>
  <td class='match-details teams'>
    <p>
      <span class='team-home teams'> <a href='/sport/football/teams/manchester-city'>Man City</a> </span>
      <span class='score'> <abbr title='Score'> 1-0 </abbr> </span>
      <span class='team-away teams'> <a href='/sport/football/teams/crystal-palace'>Crystal Palace</a> </span>
    </p>
  </td>
  <td class="match-date"> Sat 28 Dec </td>
  <td class='time'> Full time </td>
  <td class='status'> <a class='report' href='/sport/football/25474625'>Report</a>

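from bs4 import BeautifulSoup
import urllib.request
import csv

url = 'http://www.bbc.co.uk/sport/football/teams/manchester-city/results/'
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, 'html.parser')  # name the parser explicitly to avoid bs4's "no parser specified" warning

# print the text of every <abbr> tag on the page (the scores)
for score in soup.find_all('abbr'):
    print(score.string)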

Output:
None
1-2 
1-0 
0-2 
2-1 
2-2 
4-1 
0-2 
1-1 
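One likely cause of the leading None above: in BeautifulSoup, `.string` only returns text when a tag has exactly one child; if the tag contains several children (for example text split across nested tags), it returns None. `get_text()` concatenates all the text instead. The sketch below illustrates this on two small inputs; the nested `<abbr>` markup is hypothetical and not taken from the BBC page.

```python
from bs4 import BeautifulSoup

# Score written as plain text inside <abbr>, as in the question's snippet
simple = BeautifulSoup("<abbr title='Score'> 1-0 </abbr>", "html.parser").abbr

# Hypothetical score whose text is split across child tags
nested = BeautifulSoup(
    "<abbr title='Score'><span>1</span>-<span>2</span></abbr>", "html.parser"
).abbr

print(simple.string.strip())           # 1-0  (single text child, so .string works)
print(nested.string)                   # None (three children: span, "-", span)
print(nested.get_text(strip=True))     # 1-2  (joins every text fragment)
```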

How do I extract the team names from this part of the HTML:

<span class='team-away teams'> <a href='/sport/football/teams/crystal-palace'>Crystal Palace</a>    </span> 
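A minimal offline sketch for that span, assuming the markup matches the snippet above (copied into a string so it runs without network access). The team name is the text of the `<a>` inside the span, and the link target is its `href` attribute:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the match row from the question (team and score spans only)
html = """
<td class='match-details teams'> <p>
  <span class='team-home teams'> <a href='/sport/football/teams/manchester-city'>Man City</a> </span>
  <span class='score'> <abbr title='Score'> 1-0 </abbr> </span>
  <span class='team-away teams'> <a href='/sport/football/teams/crystal-palace'>Crystal Palace</a> </span>
</p> </td>
"""
soup = BeautifulSoup(html, "html.parser")

# The <a> inside the <span> whose class list includes "team-away"
away = soup.select_one("span.team-away a")
print(away.get_text(strip=True))  # Crystal Palace
print(away["href"])               # /sport/football/teams/crystal-palace

# Every team link in document order (both spans also carry class "teams")
names = [a.get_text(strip=True) for a in soup.select("span.teams a")]
print(names)                      # ['Man City', 'Crystal Palace']
```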

Upvotes: 3

Views: 1310

Answers (1)

alecxe

Reputation: 474191

The idea is to first get the elements that contain information about each game: these are the tr tags with class="report". For each row, get the team names by the classes team-home and team-away, and the score by the tag name abbr:

from bs4 import BeautifulSoup
import urllib.request

url = 'http://www.bbc.co.uk/sport/football/teams/manchester-city/results/'
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, 'html.parser')  # name the parser explicitly

# each match is a <tr class="report"> inside <table class="table-stats">
for match in soup.select('table.table-stats tr.report'):
    team1 = match.find('span', class_='team-home')
    team2 = match.find('span', class_='team-away')
    score = match.abbr  # first <abbr> in the row holds the score
    # skip rows that are missing any of the three pieces
    if not all((team1, team2, score)):
        continue

    print(team1.text, score.text, team2.text)

Prints:

Man City   1-2   CSKA 
Man City   1-0   Man Utd 
Man City   0-2   Newcastle 
West Ham   2-1   Man City 
...

FYI, table.table-stats tr.report is a CSS selector that matches every tr tag with class="report" inside a table with class="table-stats".
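To check the selector and the row-skipping logic without fetching the page, here is a small sketch that runs the same loop on a hypothetical fragment: the question's row wrapped in a `<table class='table-stats'>` (an assumption based on the selector above), plus one row without class="report" that the selector should ignore. It uses `get_text(strip=True)` instead of `.text` to drop the surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: one match row plus one non-match row
html = """
<table class='table-stats'>
  <tr class='report' id='match-row-EFBO695086'>
    <td class='match-details teams'> <p>
      <span class='team-home teams'> <a href='/sport/football/teams/manchester-city'>Man City</a> </span>
      <span class='score'> <abbr title='Score'> 1-0 </abbr> </span>
      <span class='team-away teams'> <a href='/sport/football/teams/crystal-palace'>Crystal Palace</a> </span>
    </p> </td>
  </tr>
  <tr class='header'><td>Not a match row</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

results = []
for row in soup.select("table.table-stats tr.report"):  # only the class="report" row
    home = row.find("span", class_="team-home")
    away = row.find("span", class_="team-away")
    score = row.abbr
    if not all((home, away, score)):
        continue  # skip rows missing a team or the score
    results.append((home.get_text(strip=True),
                    score.get_text(strip=True),
                    away.get_text(strip=True)))

print(results)  # [('Man City', '1-0', 'Crystal Palace')]
```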

Upvotes: 2
