jcmcdonald

Reputation: 327

Python Beautiful Soup Table Data Scraping Specific TD Tags

This webpage has multiple tables on it: http://www.nfl.com/player/tombrady/2504211/gamelogs

In the HTML, all of the tables have exactly the same opening tag:

<table class="data-table1" width="100%" border="0" summary="Game Logs For Tom Brady In 2014">

I can scrape data from the first table (Preseason), but I do not know how to skip it and scrape data from the second and third tables (Regular Season and Postseason).

I'm trying to scrape specific numbers.

My code:

import pickle
import math
import urllib2
from lxml import etree
from bs4 import BeautifulSoup
from urllib import urlopen

year = '2014'
lastWeek = '2'
favQB1 = "Tom Brady"

favQBurl2 = 'http://www.nfl.com/player/tombrady/2504211/gamelogs'
favQBhtml2 = urlopen(favQBurl2).read()
favQBsoup2 = BeautifulSoup(favQBhtml2)
favQBpass2 = favQBsoup2.find("table", { "summary" : "Game Logs For %s In %s" % (favQB1, year)})
favQBrows2 = []

for row in favQBpass2.findAll("tr"):
    if lastWeek in row.findNext('td'):  
        for item in row.findAll("td"):
            favQBrows2.append(item.text)
print ("Enter: Starting Quarterback QB Rating of Favored Team for the last game played (regular season): "),
print favQBrows2[15]

Upvotes: 2

Views: 2210

Answers (2)

Vikas Ojha

Reputation: 6950

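The following should work as well. find_all() returns every matching table in document order, so indexing the result with [1] selects the second table (Regular Season):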

import pickle
import math
import urllib2
from lxml import etree
from bs4 import BeautifulSoup
from urllib import urlopen

year = '2014'
lastWeek = '2'
favQB1 = "Tom Brady"

favQBurl2 = 'http://www.nfl.com/player/tombrady/2504211/gamelogs'
favQBhtml2 = urlopen(favQBurl2).read()
favQBsoup2 = BeautifulSoup(favQBhtml2)
# find_all() returns a list of all matching tables; [1] is the second one (Regular Season)
favQBpass2 = favQBsoup2.find_all("table", { "summary" : "Game Logs For %s In %s" % (favQB1, year)})[1]
favQBrows2 = []

for row in favQBpass2.findAll("tr"):
    if lastWeek in row.findNext('td'):
        for item in row.findAll("td"):
            favQBrows2.append(item.text)
print ("Enter: Starting Quarterback QB Rating of Favored Team for the last game played (regular season): "),
print favQBrows2[15]
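
To see the indexing idea in isolation, here is a minimal Python 3 sketch that runs against a small inline HTML snippet instead of the live page. The snippet, the sample values in it, and the helper name row_for_week are assumptions made up for illustration; the real page has many more columns.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: three tables sharing one summary value.
HTML = """
<table summary="Game Logs For Tom Brady In 2014">
  <tr><td>Preseason</td></tr>
  <tr><td>1</td><td>80.0</td></tr>
</table>
<table summary="Game Logs For Tom Brady In 2014">
  <tr><td>Regular Season</td></tr>
  <tr><td>2</td><td>97.5</td></tr>
</table>
<table summary="Game Logs For Tom Brady In 2014">
  <tr><td>Postseason</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all() returns the matching tables in document order.
tables = soup.find_all("table", {"summary": "Game Logs For Tom Brady In 2014"})
regular_season = tables[1]  # index 0 is Preseason, 1 is Regular Season


def row_for_week(table, week):
    """Return the cell texts of the first row whose first cell equals `week`."""
    for row in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if cells and cells[0] == week:
            return cells
    return None


week2 = row_for_week(regular_season, "2")
print(week2)  # ['2', '97.5']
```

Note that positional indexing depends on the tables always appearing in the same order, so it can break if the page omits one of them.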

Upvotes: 1

alecxe

Reputation: 473853

Rely on the table title, which is located in the td element in the first table row:

def find_table(soup, label):
    return soup.find("td", text=label).find_parent("table", summary=True)

Usage:

find_table(soup, "Preseason")
find_table(soup, "Regular Season")
find_table(soup, "Postseason")

For reference, see the find_parent() documentation.
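
A self-contained Python 3 sketch of this approach, run against a small inline HTML snippet rather than the live page, might look like the following. The snippet and its sample values are made up for illustration, the None check is an addition, and string= is used in place of text=, which newer Beautiful Soup releases prefer for the same filter.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: each table's first row holds its title.
HTML = """
<table summary="Game Logs For Tom Brady In 2014">
  <tr><td>Preseason</td></tr>
  <tr><td>1</td><td>80.0</td></tr>
</table>
<table summary="Game Logs For Tom Brady In 2014">
  <tr><td>Regular Season</td></tr>
  <tr><td>2</td><td>97.5</td></tr>
</table>
"""


def find_table(soup, label):
    """Locate the table whose title cell contains exactly `label`."""
    title_cell = soup.find("td", string=label)
    if title_cell is None:
        return None  # no table with that title on the page
    # Walk up to the nearest enclosing <table> that has a summary attribute.
    return title_cell.find_parent("table", summary=True)


soup = BeautifulSoup(HTML, "html.parser")
regular = find_table(soup, "Regular Season")

# Collect the text of every cell, row by row.
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in regular.find_all("tr")
]
print(rows)  # [['Regular Season'], ['2', '97.5']]
```

Looking tables up by title avoids depending on their order, at the cost of requiring the title text to match exactly.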

Upvotes: 2
