l3g10n3

Reputation: 81

How to get a specific tag attribute text in Python with BeautifulSoup?

I'm writing a small scraper in Python with BS4 to get MLB schedule data from ESPN.com.

It's almost finished, but I have one small problem with this snippet:

<div class="teams" data-behavior="fix_broken_images"><a name="&amp;lpos=mlb:schedule:team" href="/mlb/team/_/name/kc"><img src="http://a.espncdn.com/combiner/i?img=/i/teamlogos/mlb/500/scoreboard/kc.png&amp;h=50" class="schedule-team-logo"></a></div><a name="&amp;lpos=mlb:schedule:team" class="team-name" href="/mlb/team/_/name/kc"><span>Kansas City</span> <abbr title="Kansas City Royals">KC</abbr></a>

I can already read the <span> </span> content, but I'd like to get the complete team name from the title attribute of the <abbr> tag.

I don't know what I'm missing; I haven't figured out how to do that.

Thanks!

Upvotes: 1

Views: 4580

Answers (1)

Padraic Cunningham

Reputation: 180502

For your snippet, you want the title attribute of the abbr tag inside the anchor with the class team-name:

h = """<div class="teams" data-behavior="fix_broken_images"><a name="&amp;lpos=mlb:schedule:team" href="/mlb/team/_/name/kc"><img src="http://a.espncdn.com/combiner/i?img=/i/teamlogos/mlb/500/scoreboard/kc.png&amp;h=50" class="schedule-team-logo"></a></div><a name="&amp;lpos=mlb:schedule:team" class="team-name" href="/mlb/team/_/name/kc"><span>Kansas City</span> <abbr title="Kansas City Royals">KC</abbr></a>"""


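from bs4 import BeautifulSoup

# Name the parser explicitly; without it bs4 warns and picks whichever parser is installed.
soup = BeautifulSoup(h, "html.parser")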

print(soup.select_one("a.team-name abbr")["title"])

Which gives you:

 Kansas City Royals
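Indexing with ["title"] raises an error if the tag or the attribute is missing, which can happen when a page's markup differs from the snippet. A minimal sketch of a more defensive lookup follows; the helper name team_title and the shortened HTML string are made up for illustration and are not part of the original answer:

```python
from bs4 import BeautifulSoup

# Shortened version of the anchor from the question's snippet.
html = (
    '<a class="team-name" href="/mlb/team/_/name/kc">'
    '<span>Kansas City</span> <abbr title="Kansas City Royals">KC</abbr></a>'
)


def team_title(doc):
    """Return the title of the abbr inside a.team-name, or None if absent."""
    abbr = doc.select_one("a.team-name abbr")
    if abbr is None:
        # select_one returns None when nothing matches the selector.
        return None
    # Tag.get returns None instead of raising KeyError for a missing attribute.
    return abbr.get("title")


full_name = team_title(BeautifulSoup(html, "html.parser"))
missing = team_title(BeautifulSoup("<p>no teams here</p>", "html.parser"))

print(full_name)  # Kansas City Royals
print(missing)    # None
```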

Or using find:

h = """<div class="teams" data-behavior="fix_broken_images"><a name="&amp;lpos=mlb:schedule:team" href="/mlb/team/_/name/kc"><img src="http://a.espncdn.com/combiner/i?img=/i/teamlogos/mlb/500/scoreboard/kc.png&amp;h=50" class="schedule-team-logo"></a></div><a name="&amp;lpos=mlb:schedule:team" class="team-name" href="/mlb/team/_/name/kc"><span>Kansas City</span> <abbr title="Kansas City Royals">KC</abbr></a>"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(h, "html.parser")

print(soup.find("a", attrs={"class":"team-name"}).abbr["title"])
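Once find has returned the anchor, the other pieces of the same element can be read from it as well. The sketch below is only an illustration, using a shortened copy of the question's HTML; the variable names are assumptions, and class_ is used as an equivalent spelling of the attrs dictionary:

```python
from bs4 import BeautifulSoup

# Shortened version of the anchor from the question's snippet.
html = (
    '<a name="&amp;lpos=mlb:schedule:team" class="team-name" '
    'href="/mlb/team/_/name/kc"><span>Kansas City</span> '
    '<abbr title="Kansas City Royals">KC</abbr></a>'
)

soup = BeautifulSoup(html, "html.parser")

# class_ (with a trailing underscore) is the keyword form of attrs={"class": ...}.
link = soup.find("a", class_="team-name")

city = link.span.get_text()          # text inside <span>
abbreviation = link.abbr.get_text()  # text inside <abbr>
full_name = link.abbr["title"]       # attribute value on <abbr>
href = link["href"]                  # attribute value on the anchor itself

print(city, abbreviation, full_name, href)
```

Tag text comes from get_text(), while attributes are read with dictionary-style indexing on the tag.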

This will get all the names from the site:

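from bs4 import BeautifulSoup
import requests

url = "http://espn.go.com/mlb/schedule"

# Pass the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(requests.get(url).content, "html.parser")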

table = soup.select_one("table.schedule.has-team-logos")

print([a["title"] for a in table.select("a.team-name abbr")])

Output:

['Detroit Tigers', 'Washington Nationals', 'Kansas City Royals', 'New York Yankees', 'Oakland Athletics', 'Boston Red Sox', 'Pittsburgh Pirates', 'Cincinnati Reds', 'Milwaukee Brewers', 'Miami Marlins', 'Chicago White Sox', 'Texas Rangers', 'San Diego Padres', 'Chicago Cubs', 'Baltimore Orioles', 'Minnesota Twins', 'Cleveland Indians', 'Houston Astros', 'Arizona Diamondbacks', 'Colorado Rockies', 'Tampa Bay Rays', 'Seattle Mariners', 'New York Mets', 'Los Angeles Dodgers', 'Toronto Blue Jays', 'San Francisco Giants']
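The same list comprehension can be tried offline before pointing it at the live page. The table below is a made-up sample that only imitates the class names used above, not the real ESPN markup; the abbr[title] selector is an optional variation that skips any abbr lacking a title instead of raising KeyError:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, for illustration only.
sample = """
<table class="schedule has-team-logos">
  <tr>
    <td><a class="team-name" href="/mlb/team/_/name/det"><span>Detroit</span> <abbr title="Detroit Tigers">DET</abbr></a></td>
    <td><a class="team-name" href="/mlb/team/_/name/wsh"><span>Washington</span> <abbr title="Washington Nationals">WSH</abbr></a></td>
  </tr>
  <tr>
    <td><a class="team-name" href="/mlb/team/_/name/kc"><span>Kansas City</span> <abbr title="Kansas City Royals">KC</abbr></a></td>
    <td><a class="team-name" href="/mlb/team/_/name/nyy"><span>New York</span> <abbr>NYY</abbr></a></td>
  </tr>
</table>
"""

soup = BeautifulSoup(sample, "html.parser")
table = soup.select_one("table.schedule.has-team-logos")

names = []
if table is not None:  # select_one returns None if the table is not found
    # abbr[title] matches only abbr tags that actually carry a title attribute.
    names = [abbr["title"] for abbr in table.select("a.team-name abbr[title]")]

print(names)  # ['Detroit Tigers', 'Washington Nationals', 'Kansas City Royals']
```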

Upvotes: 3
