Reputation: 279
I am trying to retrieve all of the text inside a div, including the text of nested elements. For example:
<div>xyz <span> abc </span> def</div>
This is the code:
the_page="<div>xyz <span> abc </span> def</div>"
doc = libxml2dom.parseString(the_page, html=1)
divs=doc.getElementsByTagName("div")
print divs[0].firstChild.nodeValue
This only prints "xyz". I also tried print divs[0].nodeValue, but that gives me an error. I want all of the text inside the div. How can I get it?
Upvotes: 1
Views: 3158
Reputation: 3673
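Starting from your line:
divs=doc.getElementsByTagName("div")
get the children of the first div:
children = divs[0].childNodes
Then you can walk them. Each child has its own childNodes list and a nodeValue, so print the value of the leaf nodes and recurse into the others:
def print_text(nodes):
    for child in nodes:
        if child.childNodes == []:
            # Leaf (e.g. a text node): nodeValue holds its text
            print child.nodeValue
        else:
            # Element with children (e.g. the span): recurse
            print_text(child.childNodes)
print_text(children)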
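If you would rather get the text back as one string instead of printing piece by piece, the same recursion can be sketched like this. Note the assumptions: it uses the standard library's xml.dom.minidom as a stand-in for libxml2dom (I am assuming both expose the usual DOM attributes childNodes, nodeType and nodeValue), and collect_text is a made-up helper name, not part of either library.

```python
from xml.dom import minidom  # stand-in for libxml2dom (assumption)


def collect_text(node):
    """Return the text of every text node under `node`, in document order.

    Hypothetical helper, not part of minidom or libxml2dom.
    """
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            # Text node: nodeValue holds the text itself
            parts.append(child.nodeValue)
        else:
            # Element (e.g. the span): recurse into its children
            parts.append(collect_text(child))
    return "".join(parts)


the_page = "<div>xyz <span> abc </span> def</div>"
doc = minidom.parseString(the_page)
divs = doc.getElementsByTagName("div")
all_text = collect_text(divs[0])
print(all_text)
```

Whitespace is kept exactly as it appears in the source, so call all_text.split() if you only want the words.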
Upvotes: 1