Beautifulsoup navigating divs by attribute without findAll

Question

How do I find a specific div by navigating the attributes of a soup, i.e. something like soup.html.body.div? I don't see how to get the specific div with id='idname' this way.

I can do soup.findAll(id='idname')[0] to get the specific tag, but as I understand it this is searching the whole soup.

I imagine getting the div by attribute on the soup would be faster since you are not using findAll()?

Firebug reports the location as html.body.div[2].form.table[2].tbody.tr[3]..., but soup.html.body.div[2] raises a KeyError.

Update:

Say you want to grab the "I'm Feeling Lucky" button from http://www.google.com. Firebug reports its location as:

/html/body/center/span/center/div[2]/form/div[2]/div[3]/center/input[2]

Is there a way to reach this without using findAll?

LaC · Accepted Answer

The path you get from Firebug is an XPath expression. It's best to use a parser that lets you evaluate XPath directly. I like using lxml with its etree interface:

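from lxml import etree
# Use the HTML parser: real-world pages are rarely well-formed XML.
tree = etree.parse(yourfile, etree.HTMLParser())
# xpath() returns a list of matching elements (empty if nothing matches).
lucky = tree.xpath('/html/body/center/span/center/div[2]/form/div[2]/div[3]/center/input[2]')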
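
To make the idea concrete, here is a minimal sketch on a small made-up page (the PAGE string and its element names are assumptions for illustration, not Google's real markup). It shows both a positional path like the one Firebug reports and a lookup by id, which is usually less brittle. Note that XPath positions are 1-based, and that the path a browser tool reports reflects the browser's DOM, which may differ from what the parser builds (browsers insert tbody into tables, for example).

```python
from lxml import html

# Hypothetical page standing in for the real document.
PAGE = """
<html><body>
  <div id="header">top</div>
  <div id="idname">
    <form>
      <input name="q">
      <input name="lucky" value="I'm Feeling Lucky">
    </form>
  </div>
</body></html>
"""

tree = html.fromstring(PAGE)

# Positional path: the second div under body, then the second input in its form.
by_path = tree.xpath('/html/body/div[2]/form/input[2]')

# Lookup by attribute: any div whose id is "idname", wherever it sits.
by_id = tree.xpath('//div[@id="idname"]')

# xpath() returns a list, so index into it after checking it is non-empty.
lucky = by_path[0]
print(lucky.get('value'))
print(by_id[0].tag, by_id[0].get('id'))
```

If the page layout changes, the positional path silently points at a different element, whereas the id-based expression keeps working as long as the id survives.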
