CODEWITHSUNDEEP

Reputation: 279

Getting the values of all child nodes of the current node

I am trying to retrieve all of the values in the div. For example:

<div>xyz <span> abc </span> def</div>

This is the code:

import libxml2dom

the_page = "<div>xyz <span> abc </span> def</div>"
doc = libxml2dom.parseString(the_page, html=1)
divs = doc.getElementsByTagName("div")
print divs[0].firstChild.nodeValue

This only prints "xyz". I tried to just do print divs[0].nodeValue, but that gives me an error. I want all of the text. How would I get around this?

Upvotes: 1

Views: 3158

Answers (1)

Reputation: 3673

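Starting from your:

divs=doc.getElementsByTagName("div")

take the child nodes of the first div:

children = divs[0].childNodes

Then you can walk them. Each child has its own childNodes list and a nodeValue. A child with no children of its own is a leaf (here, a text node), so print its value; otherwise recurse into its children:

def print_text(children):
    for child in children:
        if not child.childNodes:
            print child.nodeValue
        else:
            # Recurse into nested elements such as <span>
            print_text(child.childNodes)

print_text(children)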
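
For reference, here is a minimal, self-contained sketch of the same recursive walk. It is written against the standard library's xml.dom.minidom rather than libxml2dom, as a stand-in assumed to expose the same childNodes and nodeValue attributes used in the question. The helper name collect_text is hypothetical, and the nodeType check is something I have only relied on in minidom. Instead of printing each piece, it joins the text into one string:

```python
from xml.dom.minidom import parseString


def collect_text(node):
    """Return the concatenated text of every text node beneath `node`."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            # Leaf text node: its nodeValue holds the actual string.
            parts.append(child.nodeValue)
        else:
            # Element (or other) node: descend into its children.
            parts.append(collect_text(child))
    return "".join(parts)


# minidom is used here only as a stand-in for libxml2dom (assumption).
doc = parseString("<div>xyz <span> abc </span> def</div>")
div = doc.getElementsByTagName("div")[0]
text = collect_text(div)
print(text)
```

The whitespace inside the tags is preserved as-is, so if you want single spaces between the words you can collapse it afterwards with " ".join(text.split()).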

Upvotes: 1
