user2351602

Reputation: 93

How do I get the content between two XML tags in Python?

import xml.dom.minidom

water = """
<channel>
<item>
<title>water</title>
<link>http://www.water.com</link>
</item>
<item>
<title>fire</title>
<link>http://www.fire.com</link>
</item>
</channel>"""

dom=xml.dom.minidom.parseString(water)
linklist = dom.getElementsByTagName('link')
print (len(linklist))

Using minidom, I want to get the content between <link> and </link> as a string. How can I do that?

Upvotes: 2

Views: 1743

Answers (2)

b10hazard

Reputation: 7809

If you want to stick with xml.dom.minidom, use .firstChild.nodeValue on each element. You already stored the link elements in the variable linklist, so to print their contents, iterate over the list and read .firstChild.nodeValue, like this:

for link in linklist:
    print(link.firstChild.nodeValue)

This prints:

http://www.water.com
http://www.fire.com

For a more detailed answer, see "Get Element value with minidom with Python".
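
One caveat with .firstChild.nodeValue is that an empty element such as <link></link> has no child node, so firstChild is None. The following is a minimal, self-contained sketch of the same loop with that case guarded; the function name link_texts, the constant SAMPLE, and the third "empty" item are made up for illustration and are not part of the question.

```python
import xml.dom.minidom

# Sample data modelled on the question, plus one hypothetical empty <link>.
SAMPLE = """<channel>
<item><title>water</title><link>http://www.water.com</link></item>
<item><title>fire</title><link>http://www.fire.com</link></item>
<item><title>empty</title><link></link></item>
</channel>"""


def link_texts(xml_string):
    """Return the text of every <link> element as a list of strings."""
    dom = xml.dom.minidom.parseString(xml_string)
    texts = []
    for link in dom.getElementsByTagName('link'):
        child = link.firstChild
        # An empty element has no child node at all, so firstChild is None.
        texts.append(child.nodeValue if child is not None else '')
    return texts


print(link_texts(SAMPLE))
```

Returning an empty string for an empty element is a design choice made here for convenience; returning None instead would work just as well.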


In response to your other question:
To get a specific element, you need either to know its position in the document or to search for it.

For example, if you know that the link you want is the second link in the XML document, you can index into the list:

# the variable fire_link is a DOM Element of the second link in the xml file
fire_link = linklist[1]

However, if you want the link but do not know where it is in the document, you have to search for it. Here is an example:

# fire_links is a list of the DOM Elements whose text is http://www.fire.com
fire_links = [l for l in linklist if l.firstChild.nodeValue == 'http://www.fire.com']

# take the first match (raises IndexError if nothing matched)
fire_link = fire_links[0]
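
In practice you often do not know the URL in advance but do know something else about the item, such as its title. The sketch below searches by title instead, assuming each <item> holds one <title> and one <link> as in the question. The function name link_for_title and the constant SAMPLE are hypothetical names chosen for this example.

```python
import xml.dom.minidom

SAMPLE = """<channel>
<item><title>water</title><link>http://www.water.com</link></item>
<item><title>fire</title><link>http://www.fire.com</link></item>
</channel>"""


def link_for_title(xml_string, wanted_title):
    """Return the <link> text of the first <item> whose <title> matches, else None."""
    dom = xml.dom.minidom.parseString(xml_string)
    for item in dom.getElementsByTagName('item'):
        titles = item.getElementsByTagName('title')
        links = item.getElementsByTagName('link')
        # Skip items that lack either tag.
        if not titles or not links:
            continue
        title_node = titles[0].firstChild
        link_node = links[0].firstChild
        # Skip items whose <title> or <link> is empty.
        if title_node is None or link_node is None:
            continue
        if title_node.nodeValue == wanted_title:
            return link_node.nodeValue
    return None


print(link_for_title(SAMPLE, 'fire'))
```

Returning None when nothing matches avoids the IndexError that indexing an empty list would raise.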

Upvotes: 2

Aaron Digulla

Reputation: 328594

This is more complicated than it looks, because an element's text can be split across several child text nodes. Adapting the getText example from the minidom documentation, append this to the code in your question:

def getText(nodelist):
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)

text = getText(linklist[0].childNodes)
print(text)

I suggest trying the ElementTree module (xml.etree.ElementTree) instead. If linklist held ElementTree elements rather than minidom nodes, the code would be:

print(linklist[0].text)
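
Note that .text is an attribute of ElementTree elements, not of minidom nodes, so the document has to be parsed with ElementTree first. A minimal sketch of the full ElementTree version, using sample data modelled on the question (the constant SAMPLE is a name made up for this example):

```python
import xml.etree.ElementTree as ET

SAMPLE = """<channel>
<item><title>water</title><link>http://www.water.com</link></item>
<item><title>fire</title><link>http://www.fire.com</link></item>
</channel>"""

root = ET.fromstring(SAMPLE)

# iter('link') walks the whole tree and yields every <link> element.
linklist = list(root.iter('link'))

for link in linklist:
    # .text is the text directly inside the element (None if it is empty).
    print(link.text)
```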

Upvotes: 1
