Steven Matthews

Using Python and BeautifulSoup to Parse a Table

I am trying to access content in certain td tags with Python and BeautifulSoup. I can either get the first td tag meeting the criteria (with find), or all of them (with findAll).

Now, I could just use findAll, get them all, and pick out the one I want, but that seems inefficient (even if I put a limit on the search). Is there any way to go straight to a particular td tag meeting my criteria, say the third or the 10th?

Here's my code so far:

from __future__ import division
from __future__ import unicode_literals
from __future__ import print_function
from mechanize import Browser
from BeautifulSoup import BeautifulSoup

br = Browser()
url = "http://finance.yahoo.com/q/ks?s=goog+Key+Statistics"
page = br.open(url)
html = page.read()
soup = BeautifulSoup(html)
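# All <td class="yfnc_tablehead1"> cells on the page, as a list
td = soup.findAll("td", {'class': 'yfnc_tablehead1'})

# Print the first child node (usually the label text) of each matching cell
for cell in td:
    print(cell.contents[0])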
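The code above relies on the Python 2 era `BeautifulSoup` 3 module and mechanize. As a rough sketch, assuming Python 3 with the `bs4` package installed, the same loop might look like the following. The HTML string is a made-up stand-in for the Yahoo page (its real markup is not reproduced here), and `get_text(strip=True)` is used in place of `contents[0]` so that cells containing nested tags still yield their text. Fetching the live page (for example with `urllib.request.urlopen`) is left out so the sketch runs offline.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the Key Statistics page
HTML = """
<table>
  <tr><td class="yfnc_tablehead1">Row A</td><td>1</td></tr>
  <tr><td class="yfnc_tablehead1">Row B</td><td>2</td></tr>
  <tr><td class="yfnc_tablehead1">Row C</td><td>3</td></tr>
</table>
"""


def header_labels(html):
    """Return the text of every <td class="yfnc_tablehead1"> cell, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ has a trailing underscore because "class" is a Python keyword
    cells = soup.find_all("td", class_="yfnc_tablehead1")
    return [cell.get_text(strip=True) for cell in cells]


if __name__ == "__main__":
    for label in header_labels(HTML):
        print(label)
```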

Upvotes: 1

Answers (2)

cerberos

find and findAll are very flexible. The BeautifulSoup findAll documentation says:

5. You can pass in a callable object which takes a Tag object as its only argument, and returns a boolean. Every Tag object that findAll encounters will be passed into this object, and if the call returns True then the tag is considered to match.
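As a sketch of that callable form, assuming Python 3 and `bs4` (where the method is spelled `find_all`), a filter function can combine a tag-name check with a class-list check in one place. The markup and the `is_head_cell` helper are hypothetical, chosen only to show a cell with an extra class being matched and an unrelated cell being skipped.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two header cells (one with an extra class) and one other cell
HTML = """
<table>
  <tr><td class="yfnc_tablehead1">Row A</td><td>1</td></tr>
  <tr><td class="yfnc_tablehead1 extra">Row B</td><td>2</td></tr>
  <tr><td class="other">Row C</td><td>3</td></tr>
</table>
"""


def is_head_cell(tag):
    """Return True for <td> tags whose class list includes 'yfnc_tablehead1'."""
    # bs4 parses "class" into a list of names; tags without a class give None
    return tag.name == "td" and "yfnc_tablehead1" in (tag.get("class") or [])


soup = BeautifulSoup(HTML, "html.parser")
# find_all calls is_head_cell on each tag it visits and keeps those returning True
matches = soup.find_all(is_head_cell)
print([td.get_text() for td in matches])  # → ['Row A', 'Row B']
```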

Upvotes: 1

user2665694

Is there any way to go to a certain td tag meeting the criteria I want? Say the third, or the 10th?

Well...

all_tds = soup.findAll("td", {'class': 'yfnc_tablehead1'})

print(all_tds[2])  # the third match: list indices start at 0

...there is no other way.
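Collecting matches and indexing into them is still the approach, but the `limit` argument to findAll (the limit mentioned in the question) stops the search once enough matches have been found, so asking for the nth match never scans past it. A minimal sketch, assuming Python 3 and `bs4` (`find_all` with `limit=`); `nth_match` is a hypothetical helper, not a library function, and the generated table is made up. The position passed in is 1-based, while the list index is 0-based.

```python
from bs4 import BeautifulSoup

# Hypothetical page: one row with 20 header cells labelled "cell 1" .. "cell 20"
HTML = "<table><tr>" + "".join(
    f'<td class="yfnc_tablehead1">cell {i}</td>' for i in range(1, 21)
) + "</tr></table>"


def nth_match(soup, n, name="td", css_class="yfnc_tablehead1"):
    """Return the n-th (1-based) matching tag, or None if there are fewer than n matches.

    limit=n makes find_all stop walking the tree as soon as it has n matches,
    so nothing after the n-th match is examined.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    matches = soup.find_all(name, class_=css_class, limit=n)
    return matches[n - 1] if len(matches) == n else None


soup = BeautifulSoup(HTML, "html.parser")
print(nth_match(soup, 3).get_text())   # → cell 3
print(nth_match(soup, 10).get_text())  # → cell 10
print(nth_match(soup, 50))             # → None (only 20 cells exist)
```

For the first match alone, `find()` already behaves like `find_all(..., limit=1)` with the single result unwrapped.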

Upvotes: 2
