Reputation: 81
I'm writing a small scraper in Python with BS4 to get MLB schedule data from ESPN.com.
It's almost finished, but I have one small problem with this markup:
<div class="teams" data-behavior="fix_broken_images"><a name="&lpos=mlb:schedule:team" href="/mlb/team/_/name/kc"><img src="http://a.espncdn.com/combiner/i?img=/i/teamlogos/mlb/500/scoreboard/kc.png&h=50" class="schedule-team-logo"></a></div><a name="&lpos=mlb:schedule:team" class="team-name" href="/mlb/team/_/name/kc"><span>Kansas City</span> <abbr title="Kansas City Royals">KC</abbr></a>
I can already read the content of the <span> tag, but I'd like to get the complete team name stored in the title attribute of the <abbr> tag.
I don't know what I'm missing, and I haven't figured out how to do that.
Thanks!
Upvotes: 1
Views: 4580
Reputation: 180502
For your snippet, you want the title attribute from the abbr tag inside the anchor with the class team-name:
from bs4 import BeautifulSoup
h = """<div class="teams" data-behavior="fix_broken_images"><a name="&lpos=mlb:schedule:team" href="/mlb/team/_/name/kc"><img src="http://a.espncdn.com/combiner/i?img=/i/teamlogos/mlb/500/scoreboard/kc.png&h=50" class="schedule-team-logo"></a></div><a name="&lpos=mlb:schedule:team" class="team-name" href="/mlb/team/_/name/kc"><span>Kansas City</span> <abbr title="Kansas City Royals">KC</abbr></a>"""
# Name the parser explicitly so bs4 does not warn and pick one for you
soup = BeautifulSoup(h, "html.parser")
# CSS selector: the abbr tag nested inside <a class="team-name">
print(soup.select_one("a.team-name abbr")["title"])
Which gives you:
Kansas City Royals
Or using find:
h = """<div class="teams" data-behavior="fix_broken_images"><a name="&lpos=mlb:schedule:team" href="/mlb/team/_/name/kc"><img src="http://a.espncdn.com/combiner/i?img=/i/teamlogos/mlb/500/scoreboard/kc.png&h=50" class="schedule-team-logo"></a></div><a name="&lpos=mlb:schedule:team" class="team-name" href="/mlb/team/_/name/kc"><span>Kansas City</span> <abbr title="Kansas City Royals">KC</abbr></a>"""
soup = BeautifulSoup(h, "html.parser")
# find returns the first matching anchor; .abbr is its first abbr descendant
print(soup.find("a", attrs={"class": "team-name"}).abbr["title"])
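Both lookups above assume the anchor and its abbr tag are present. If nothing matches, select_one and find return None, so the subscript or attribute access fails with a TypeError or AttributeError, and a missing title attribute raises a KeyError. Here is a minimal sketch of a more defensive version; the helper name full_team_name is hypothetical and not part of bs4:

```python
from bs4 import BeautifulSoup


def full_team_name(html):
    """Return the abbr title inside a.team-name, or None if it is absent.

    full_team_name is a hypothetical helper name used only for illustration.
    """
    soup = BeautifulSoup(html, "html.parser")
    abbr = soup.select_one("a.team-name abbr")
    # select_one returns None when nothing matches; Tag.get returns None
    # instead of raising KeyError when the title attribute is missing.
    return abbr.get("title") if abbr is not None else None


snippet = (
    '<a class="team-name" href="/mlb/team/_/name/kc">'
    '<span>Kansas City</span> <abbr title="Kansas City Royals">KC</abbr></a>'
)
print(full_team_name(snippet))
print(full_team_name("<p>no team here</p>"))
```

This may be useful if the page layout changes, since the scraper then gets None back rather than crashing in the middle of a run.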
This will get all the names from the site:
from bs4 import BeautifulSoup
import requests
url = "http://espn.go.com/mlb/schedule"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
table = soup.select_one("table.schedule.has-team-logos")
print([a["title"] for a in table.select("a.team-name abbr")])
Output:
['Detroit Tigers', 'Washington Nationals', 'Kansas City Royals', 'New York Yankees', 'Oakland Athletics', 'Boston Red Sox', 'Pittsburgh Pirates', 'Cincinnati Reds', 'Milwaukee Brewers', 'Miami Marlins', 'Chicago White Sox', 'Texas Rangers', 'San Diego Padres', 'Chicago Cubs', 'Baltimore Orioles', 'Minnesota Twins', 'Cleveland Indians', 'Houston Astros', 'Arizona Diamondbacks', 'Colorado Rockies', 'Tampa Bay Rays', 'Seattle Mariners', 'New York Mets', 'Los Angeles Dodgers', 'Toronto Blue Jays', 'San Francisco Giants']
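If you also want the short name from the span and the abbreviation alongside the full name, the same selector can feed a small loop. This is only a sketch run against a static stand-in for the page: the first anchor is taken from the question, while the second anchor (including its href) and the dictionary keys city, code and name are made up for illustration.

```python
from bs4 import BeautifulSoup

# Static stand-in for the schedule markup. The second anchor is an
# assumption modeled on the one in the question, not copied from the site.
html = """
<a class="team-name" href="/mlb/team/_/name/kc"><span>Kansas City</span> <abbr title="Kansas City Royals">KC</abbr></a>
<a class="team-name" href="/mlb/team/_/name/nyy"><span>New York</span> <abbr title="New York Yankees">NYY</abbr></a>
"""

soup = BeautifulSoup(html, "html.parser")

teams = []
for a in soup.select("a.team-name"):
    span = a.find("span")
    abbr = a.find("abbr")
    # Skip anchors that do not have both child tags
    if span is None or abbr is None:
        continue
    teams.append({
        "city": span.get_text(strip=True),  # text the question could already read
        "code": abbr.get_text(strip=True),  # visible abbreviation
        "name": abbr.get("title"),          # full team name from the title attribute
    })

print(teams)
```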
Upvotes: 3