Using Python and BeautifulSoup to Parse a Table

Question

I am trying to access content in certain td tags with Python and BeautifulSoup. I can either get the first td tag meeting the criteria (with find), or all of them (with findAll).

Now, I could just use findAll, get them all, and get the content I want out of them, but that seems like it is inefficient (even if I put limits on the search). Is there anyway to go to a certain td tag meeting the criteria I want? Say the third, or the 10th?

Here's my code so far:

from __future__ import division
from __future__ import unicode_literals
from __future__ import print_function
from mechanize import Browser
from BeautifulSoup import BeautifulSoup

br = Browser()
url = "http://finance.yahoo.com/q/ks?s=goog+Key+Statistics"
page = br.open(url)
html = page.read()
soup = BeautifulSoup(html)
td = soup.findAll("td", {'class': 'yfnc_tablehead1'})

for x in range(len(td)):
    var1 = td[x]
    var2 = var1.contents[0]
    print(var2)

cerberos · Accepted Answer

find and findAll are very flexible, the BeautifulSoup.findAll docs say

5. You can pass in a callable object which takes a Tag object as its only argument, and returns a boolean. Every Tag object that findAll encounters will be passed into this object, and if the call returns True then the tag is considered to match.

Using Python and BeautifulSoup to Parse a Table

Answers (2)

Related Questions