user562427

Reputation: 79

BeautifulSoup: navigating to a div by attribute without findAll

How do I reach a specific div by navigating the soup's attributes, i.e. something like soup.html.body.div? I don't see how to get the specific div with id='idname' that way.

I can do soup.findAll(id='idname')[0] to get the specific tag, but as I understand it, this searches the whole soup.

I imagine getting the div by attribute access on the soup would be faster, since it doesn't use findAll()?
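A minimal sketch, assuming BeautifulSoup 4 (bs4) with Python's built-in html.parser and made-up markup: find() returns the first match (or None) and stops searching there, so soup.find(id='idname') avoids building the full list that findAll(...)[0] creates. Attribute-style navigation such as soup.body.div is itself shorthand for find() calls, so it is also a search, just one that stops at the first match.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration
html = """
<html><body>
  <div id="header">top</div>
  <div id="idname">target</div>
  <div id="footer">bottom</div>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag (or None) and stops searching there
target = soup.find(id="idname")
print(target.get_text())  # target

# findAll() collects every match into a list before you index it
same = soup.findAll(id="idname")[0]
print(same is target)  # True: both refer to the same tag in the parse tree

# soup.body.div is shorthand for soup.find("body").find("div")
print(soup.body.div["id"])  # header
```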

Firebug reports the location as html.body.div[2].form.table[2].tbody.tr[3]..., however soup.html.body.div[2] gives a KeyError.
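A small sketch of where the KeyError comes from, assuming bs4 and made-up markup: on a tag, tag[key] looks up an HTML attribute (as in tag['id']), so an integer key is treated as a missing attribute name. Firebug's div[2] is an XPath step, meaning the second div child counted from 1, which in Python corresponds to index 1 of the list of direct div children.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three sibling divs under <body>
html = "<html><body><div>one</div><div>two</div><div>three</div></body></html>"
soup = BeautifulSoup(html, "html.parser")

first_div = soup.html.body.div

# tag[key] reads an HTML attribute, so an integer key raises KeyError
try:
    first_div[2]
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 2

# XPath div[2] = 2nd <div> child (1-based) -> Python index 1 (0-based)
second_div = soup.html.body.find_all("div", recursive=False)[1]
print(second_div.get_text())  # two
```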

Update:

Say you want to grab the "I'm Feeling Lucky" button from http://www.google.com. Firebug reports its location as:

/html/body/center/span/center/div[2]/form/div[2]/div[3]/center/input[2]

Is there a way to reach this without using findAll?

Upvotes: 4

Views: 1560

Answers (2)

ekhumoro

Reputation: 120688

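findAll accepts a recursive=False argument, which gets most of the way there:

findAll(tagname, recursive=False)

This only looks at the tag's direct children instead of its whole subtree, so it will usually be much more efficient. (findChildren is an alias of findAll, so it also searches recursively unless you pass recursive=False.)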
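A quick sketch of the difference, assuming bs4 with html.parser and made-up markup: by default findAll walks every descendant, while recursive=False only considers direct children.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a <div> nested inside another <div>
html = "<body><div id='outer'><div id='inner'>x</div></div><div id='sibling'>y</div></body>"
soup = BeautifulSoup(html, "html.parser")
body = soup.body

# Default (recursive=True): every <div> anywhere below <body>, in document order
print([d["id"] for d in body.findAll("div")])
# ['outer', 'inner', 'sibling']

# recursive=False: only <div> tags that are direct children of <body>
print([d["id"] for d in body.findAll("div", recursive=False)])
# ['outer', 'sibling']
```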

So your example would become:

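# XPath positions are 1-based; Python list indices are 0-based (div[2] -> [1])
soup.html.body.center.span.center.findAll('div', recursive=False)[1].\
    form.findAll('div', recursive=False)[1].\
    findAll('div', recursive=False)[2].\
    center.findAll('input', recursive=False)[1]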
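If you do this often, the translation from a Firebug path can be automated. The sketch below uses a hypothetical helper, walk_path (not part of BeautifulSoup), that handles only simple absolute paths made of tag or tag[n] steps, follows direct children at each step, and converts XPath's 1-based positions to Python's 0-based indices. The markup is made up; the real Google page structure has changed since this question was asked.

```python
import re
from bs4 import BeautifulSoup

def walk_path(soup, path):
    """Hypothetical helper: follow a simple absolute path such as
    '/html/body/div[2]/input[2]' using only direct children.
    Supports only 'tag' or 'tag[n]' steps, with n 1-based as in XPath."""
    node = soup
    for step in path.strip("/").split("/"):
        match = re.fullmatch(r"(\w+)(?:\[(\d+)\])?", step)
        if match is None:
            raise ValueError(f"unsupported step: {step!r}")
        name, position = match.group(1), int(match.group(2) or 1)
        children = node.find_all(name, recursive=False)
        if len(children) < position:
            return None  # no such child at this step
        node = children[position - 1]  # 1-based XPath position -> 0-based index
    return node

# Hypothetical markup for illustration
html = ("<html><body><div>a</div><div><form>"
        "<input name='q'/><input name='lucky'/>"
        "</form></div></body></html>")
soup = BeautifulSoup(html, "html.parser")
print(walk_path(soup, "/html/body/div[2]/form/input[2]")["name"])  # lucky
```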

Upvotes: 1

LaC

Reputation: 12824

The path you get from Firebug is an XPath expression. It's best to use a parser that lets you evaluate XPath directly. I like using lxml with its etree interface:

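from lxml import etree
# Use lxml's HTML parser; the default XML parser rejects most real-world HTML
tree = etree.parse(yourfile, etree.HTMLParser())
# xpath() returns a list of matching elements (empty if nothing matches)
lucky = tree.xpath('/html/body/center/span/center/div[2]/form/div[2]/div[3]/center/input[2]')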
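A self-contained sketch of the same approach, assuming lxml is installed and using a made-up HTML string in place of yourfile. Note that the XPath positional predicates ([2]) are 1-based and count only siblings with the same tag name, which is why Firebug's path can be used as-is here.

```python
from io import StringIO
from lxml import etree

# Hypothetical markup standing in for a downloaded page
html = """<html><body><center>
<form><div>first</div><div><input name="q"/><input name="lucky"/></div></form>
</center></body></html>"""

# etree.parse() uses an XML parser by default; pass an HTMLParser for HTML
tree = etree.parse(StringIO(html), etree.HTMLParser())

# xpath() always returns a list, so check it before indexing
matches = tree.xpath("/html/body/center/form/div[2]/input[2]")
lucky = matches[0] if matches else None
print(lucky.get("name"))  # lucky
```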

Upvotes: 3
