e3e4s

Reputation: 41

How to scrape a specific table and specific rows/cells of data in Python

This is my first Python project. My goal is to scrape the final score from last night's Mets game and send it to a friend through Twilio, but right now I'm having trouble extracting the scores from this website:

http://scores.nbcsports.com/mlb/scoreboard.asp?day=20160621&meta=true

The scraper below works, but it finds every table, row, and cell rather than just the ones I want. When I look at the HTML for each table, the opening tag is always the same:

<table class="shsTable shsLinescore" cellspacing="0">

My question is: how can I scrape a specific table if the class attribute is the same for every game?

from bs4 import BeautifulSoup
import urllib.request

def make_soup(url):
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

playerdatasaved =""


soup = make_soup("http://scores.nbcsports.com/mlb/scoreboard.asp?day=20160621&meta=true")

for row in soup.find_all('tr'):  # finds all rows
    playerdata = ""
    for data in row.find_all('td'):
        playerdata = playerdata + "," + data.text
    playerdatasaved = playerdatasaved + "\n" + playerdata[1:]

print(playerdatasaved)

Upvotes: 3

Views: 3813

Answers (1)

Padraic Cunningham

Reputation: 180441

The team name is in the text of the anchors with the teamName class. Find the anchor for the team you want, then pull the previous table, which here is the linescore table that contains it:

from bs4 import BeautifulSoup
import requests

soup = BeautifulSoup(requests.get("http://scores.nbcsports.com/mlb/scoreboard.asp?day=20160621&meta=true").content, "lxml")

table = soup.find("a",class_="teamName", text="NY Mets").find_previous("table")
for row in table.find_all("tr"):
    print(row.find_all("td"))

Which gives you:

[<td style="text-align: left">Final</td>, <td class="shsTotD">1</td>, <td class="shsTotD">2</td>, <td class="shsTotD">3</td>, <td class="shsLinescoreSpacer">\xa0</td>, <td class="shsTotD">4</td>, <td class="shsTotD">5</td>, <td class="shsTotD">6</td>, <td class="shsLinescoreSpacer">\xa0</td>, <td class="shsTotD">7</td>, <td class="shsTotD">8</td>, <td class="shsTotD">9</td>, <td class="shsLinescoreSpacer">\xa0</td>, <td class="shsTotD">R</td>, <td class="shsTotD">H</td>, <td class="shsTotD">E</td>]
[<td class="shsNamD" nowrap=""><span class="shsLogo"><span class="shsMLBteam7sm_trans"></span></span><a class="teamName" href="/mlb/teamstats.asp?team=07&amp;type=teamhome">Kansas City</a></td>, <td class="shsTotD">0</td>, <td class="shsTotD">0</td>, <td class="shsTotD">0</td>, <td></td>, <td class="shsTotD">0</td>, <td class="shsTotD">1</td>, <td class="shsTotD">0</td>, <td></td>, <td class="shsTotD">0</td>, <td class="shsTotD">0</td>, <td class="shsTotD">0</td>, <td></td>, <td class="shsTotD">1</td>, <td class="shsTotD">7</td>, <td class="shsTotD">0</td>]
[<td class="shsNamD" nowrap=""><span class="shsLogo"><span class="shsMLBteam21sm_trans"></span></span><a class="teamName" href="/mlb/teamstats.asp?team=21&amp;type=teamhome">NY Mets</a></td>, <td class="shsTotD">1</td>, <td class="shsTotD">0</td>, <td class="shsTotD">0</td>, <td></td>, <td class="shsTotD">1</td>, <td class="shsTotD">0</td>, <td class="shsTotD">0</td>, <td></td>, <td class="shsTotD">0</td>, <td class="shsTotD">0</td>, <td class="shsTotD">x</td>, <td></td>, <td class="shsTotD">2</td>, <td class="shsTotD">6</td>, <td class="shsTotD">1</td>]

To get the score data:

from bs4 import BeautifulSoup
import requests

soup = BeautifulSoup(requests.get("http://scores.nbcsports.com/mlb/scoreboard.asp?day=20160621&meta=true").content, "lxml")

table = soup.find("a",class_="teamName", text="NY Mets").find_previous("table")

a, b = [anchor.text for anchor in table.find_all("a", class_="teamName")]

# One list of cell texts per row: the header row, then one row per team
inn, a_score, b_score = ([td.text for td in row.select("td.shsTotD")] for row in table.find_all("tr"))

print(" ".join(inn))
print("{}: {}".format(a, " ".join(a_score)))
print("{}: {}".format(b, " ".join(b_score)))

Which gives you:

1 2 3 4 5 6 7 8 9 R H E
Kansas City: 0 0 0 0 1 0 0 0 0 1 7 0
NY Mets: 1 0 0 1 0 0 0 0 x 2 6 1
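Since the goal in the question is a single final score to send as a message, here is a minimal sketch of how the three lists could be reduced to one line of text. The helper name final_score and the message format are my own assumptions, not part of the page or of BeautifulSoup, and the lists are hard-coded from the output above so the snippet runs without a network request:

```python
def final_score(header, away_name, away_cells, home_name, home_cells):
    """Build a one-line final score from linescore cells (hypothetical helper)."""
    # Look up the runs column by its header label rather than a fixed position,
    # so a game with extra innings (more columns before "R") still works.
    runs_index = header.index("R")
    return "{} {}, {} {}".format(
        away_name, away_cells[runs_index], home_name, home_cells[runs_index]
    )


# Cell texts copied from the output above, standing in for inn, a_score, b_score.
inn = "1 2 3 4 5 6 7 8 9 R H E".split()
a_score = "0 0 0 0 1 0 0 0 0 1 7 0".split()
b_score = "1 0 0 1 0 0 0 0 x 2 6 1".split()

message = final_score(inn, "Kansas City", a_score, "NY Mets", b_score)
print(message)  # Kansas City 1, NY Mets 2
```

In the real script you would pass a, b, inn, a_score and b_score from the scraping code instead of the hard-coded lists, and use the resulting string as the body of the text message.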

Upvotes: 2
