Reputation: 41
So this is my first python project and my goal is to scrape the final score from last night's Mets game and send it to a friend through twilio, but right now I'm having issues with extracting the scores from this website:
http://scores.nbcsports.com/mlb/scoreboard.asp?day=20160621&meta=true
The scraper below works but it obviously finds all the tables/rows/cells rather than the one I want. When I look at the html code for the each table, they're all the same:
<table class="shsTable shsLinescore" cellspacing="0">
My question is how can I scrape a specific table if the class attribute for all the games are the same?
from bs4 import BeautifulSoup
import urllib
import urllib.request
def make_soup(url):
thepage = urllib.request.urlopen(url)
soupdata = BeautifulSoup(thepage, "html.parser")
return soupdata
playerdatasaved =""
soup = make_soup("http://scores.nbcsports.com/mlb/scoreboard.asp? day=20160621&meta=true")
for row in soup.findAll('tr'): #finds all rows
playerdata=""
for data in row.findAll('td'):
playerdata = playerdata+","+data.text
playerdatasaved =playerdatasaved+"\n" +playerdata[1:]
print(playerdatasaved)
Upvotes: 3
Views: 3813
Reputation: 180441
Use the team name which is in the text of the anchors with the teamName
class, find that then pull the previous table:
from bs4 import BeautifulSoup
import requests
soup = BeautifulSoup(requests.get("http://scores.nbcsports.com/mlb/scoreboard.asp?day=20160621&meta=true").content, "lxml")
table = soup.find("a",class_="teamName", text="NY Mets").find_previous("table")
for row in table.find_all("tr"):
print(row.find_all("td"))
Which gives you:
[<td style="text-align: left">Final</td>, <td class="shsTotD">1</td>, <td class="shsTotD">2</td>, <td class="shsTotD">3</td>, <td class="shsLinescoreSpacer">\xa0</td>, <td class="shsTotD">4</td>, <td class="shsTotD">5</td>, <td class="shsTotD">6</td>, <td class="shsLinescoreSpacer">\xa0</td>, <td class="shsTotD">7</td>, <td class="shsTotD">8</td>, <td class="shsTotD">9</td>, <td class="shsLinescoreSpacer">\xa0</td>, <td class="shsTotD">R</td>, <td class="shsTotD">H</td>, <td class="shsTotD">E</td>]
[<td class="shsNamD" nowrap=""><span class="shsLogo"><span class="shsMLBteam7sm_trans"></span></span><a class="teamName" href="/mlb/teamstats.asp?team=07&type=teamhome">Kansas City</a></td>, <td class="shsTotD">0</td>, <td class="shsTotD">0</td>, <td class="shsTotD">0</td>, <td></td>, <td class="shsTotD">0</td>, <td class="shsTotD">1</td>, <td class="shsTotD">0</td>, <td></td>, <td class="shsTotD">0</td>, <td class="shsTotD">0</td>, <td class="shsTotD">0</td>, <td></td>, <td class="shsTotD">1</td>, <td class="shsTotD">7</td>, <td class="shsTotD">0</td>]
[<td class="shsNamD" nowrap=""><span class="shsLogo"><span class="shsMLBteam21sm_trans"></span></span><a class="teamName" href="/mlb/teamstats.asp?team=21&type=teamhome">NY Mets</a></td>, <td class="shsTotD">1</td>, <td class="shsTotD">0</td>, <td class="shsTotD">0</td>, <td></td>, <td class="shsTotD">1</td>, <td class="shsTotD">0</td>, <td class="shsTotD">0</td>, <td></td>, <td class="shsTotD">0</td>, <td class="shsTotD">0</td>, <td class="shsTotD">x</td>, <td></td>, <td class="shsTotD">2</td>, <td class="shsTotD">6</td>, <td class="shsTotD">1</td>]
To get the score data:
from bs4 import BeautifulSoup
import requests
soup = BeautifulSoup(requests.get("http://scores.nbcsports.com/mlb/scoreboard.asp?day=20160621&meta=true").content, "lxml")
table = soup.find("a",class_="teamName", text="NY Mets").find_previous("table")
a, b = [a.text for a in table.find_all("a",class_="teamName")]
inn, a_score, b_score = ([td.text for td in row.select("td.shsTotD")]
print " ".join(inn)
print "{}: {}".format(a, " ".join(a_score))
print "{}: {}".format(b, " ".join(b_score))
Which gives you:
1 2 3 4 5 6 7 8 9 R H E
Kansas City: 0 0 0 0 1 0 0 0 0 1 7 0
NY Mets: 1 0 0 1 0 0 0 0 x 2 6 1
Upvotes: 2