Reputation: 79
How do I find a specific div by calling the attributes of a soup, i.e. something like soup.html.body.div? I don't see how to get the specific div with id='idname' this way.
I can do soup.findAll(id='idname')[0] to get the specific tag, but as I understand it this searches the whole soup. I imagine getting the div by attribute access on the soup would be faster, since it avoids findAll()?
Firebug reports the location as html.body.div[2].form.table[2].tbody.tr[3]..., but doing soup.html.body.div[2] gives a KeyError.
Update:
Say you want to grab the "I'm Feeling Lucky" button from http://www.google.com. Firebug reports its location as:
/html/body/center/span/center/div[2]/form/div[2]/div[3]/center/input[2]
Is there a way to reach this without using findAll?
Upvotes: 4
Views: 1560
Reputation: 120688
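There is a findChildren method, but as far as I know it is just an alias for findAll, so it still searches all descendants by default. To look only at direct children, pass recursive=False:
findAll(tagname, recursive=False)
which will usually make it much more efficient.
Note that XPath positions start at 1 while Python list indexes start at 0, so div[2] in the Firebug path becomes [1]. Your example would become:
soup.html.body.center.span.center.findAll('div', recursive=False)[1].\
form.findAll('div', recursive=False)[1].findAll('div', recursive=False)[2].\
center.findAll('input', recursive=False)[1]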
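A minimal sketch of the same idea on a small document, assuming the bs4 package (where findAll is spelled find_all). The markup and the nth_child helper are made up for illustration and are not part of Beautiful Soup. It also shows why soup.html.body.div[2] raised a key error: indexing a tag with [] looks up an HTML attribute, not a sibling position.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example.
HTML = """
<html><body>
  <div id="first"><div id="nested">inner</div></div>
  <div id="second"><form><input name="a"><input name="b"></form></div>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")


def nth_child(tag, name, xpath_index):
    """Hypothetical helper: direct child <name> at a 1-based, XPath-style position."""
    children = tag.find_all(name, recursive=False)
    return children[xpath_index - 1]


# Rough equivalent of the XPath /html/body/div[2]/form/input[2]
second_div = nth_child(soup.html.body, "div", 2)
second_input = nth_child(second_div.form, "input", 2)

# A recursive search also counts nested divs, so position 1 is the nested one.
recursive_hit = soup.html.body.find_all("div")[1]

# tag[...] is attribute lookup, so an integer key raises KeyError.
try:
    soup.html.body.div[2]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# For a lookup by id, find() returns the first match instead of a list.
by_id = soup.find(id="second")
```

If the id is known, the find(id=...) call at the end is probably the simplest route. The child-by-child walk is mainly useful when all you have is a positional path.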
Upvotes: 1
Reputation: 12824
The path you get from Firebug is an XPath expression. It's best to use a parser that lets you use XPath directly. I like using lxml with its etree interface:
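from lxml import etree
# Real-world HTML is rarely well-formed XML, so use the HTML parser.
tree = etree.parse(yourfile, etree.HTMLParser())
# xpath() returns a list of matching elements (empty if nothing matches).
lucky = tree.xpath('/html/body/center/span/center/div[2]/form/div[2]/div[3]/center/input[2]')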
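A self-contained sketch of the same approach on an inline string, assuming lxml is installed. The markup is invented for illustration. The points to notice are that xpath() returns a list and that XPath positions are 1-based.

```python
from lxml import etree

# Hypothetical markup, invented for this example.
HTML = """
<html><body>
  <div><p>one</p></div>
  <div><form><input name="a"/><input name="b"/></form></div>
</body></html>
"""

# etree.HTMLParser() tolerates markup that is not well-formed XML.
root = etree.fromstring(HTML, etree.HTMLParser())

# div[2] is the second <div> child of <body>; input[2] is the second <input>.
matches = root.xpath("/html/body/div[2]/form/input[2]")

# Guard against an empty result before indexing into the list.
lucky = matches[0] if matches else None
name = lucky.get("name") if lucky is not None else None

# A path that matches nothing gives an empty list rather than raising.
missing = root.xpath("/html/body/div[9]")
```

One caveat when copying paths from a browser tool: the path describes the browser's DOM, which can differ from the raw HTML the parser sees, so a copied path may need adjusting.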
Upvotes: 3