Josip

Reputation: 6735

Empty XML element handling in Python

I'm puzzled by how the minidom parser handles an empty element, as shown in the following code.

import xml.dom.minidom

doc = xml.dom.minidom.parseString('<value></value>')
print(repr(doc.firstChild.nodeValue))
# Out: None
print(doc.firstChild.toxml())
# Out: <value/>

doc = xml.dom.minidom.Document()
v = doc.appendChild(doc.createElement('value'))
v.appendChild(doc.createTextNode(''))
print(repr(v.firstChild.nodeValue))
# Out: ''
print(doc.firstChild.toxml())
# Out: <value></value>

How can I get consistent behavior? I'd like to receive an empty string as the value of an empty element (which IS what I put into the XML structure in the first place).

Upvotes: 3

Views: 7582

Answers (3)

hao

Reputation: 10228

Cracking open xml.dom.minidom and searching for "/>", we find this:

# Method of the Element(Node) class.
def writexml(self, writer, indent="", addindent="", newl=""):
    # [snip]
    if self.childNodes:
        writer.write(">%s"%(newl))
        for node in self.childNodes:
            node.writexml(writer,indent+addindent,addindent,newl)
        writer.write("%s</%s>%s" % (indent,self.tagName,newl))
    else:
        writer.write("/>%s"%(newl))

We can deduce from this that the short-end-tag form only occurs when childNodes is an empty list. Indeed, this seems to be true:

>>> from xml.dom.minidom import Document
>>> doc = Document()
>>> v = doc.appendChild(doc.createElement('v'))
>>> v.toxml()
'<v/>'
>>> v.childNodes
[]
>>> v.appendChild(doc.createTextNode(''))
<DOM Text node "''">
>>> v.childNodes
[<DOM Text node "''">]
>>> v.toxml()
'<v></v>'

As pointed out by Lloyd, the XML spec makes no distinction between the two. If your code does make the distinction, that means you need to rethink how you want to serialize your data.

xml.dom.minidom simply serializes the two cases differently because that is easier to code. You can, however, get consistent output. Subclass Element and override writexml (the method shown above, which toxml ends up calling) so that it prints the short-end-tag form when there are no child nodes with non-empty text content. Then monkeypatch the module to use your new Element class.
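
If patching the module feels too invasive, a different technique gives the same consistent output: clone the node, drop the empty text nodes from the clone, and serialize the clone. This is only a sketch, assuming Python 3; drop_empty_text_nodes and consistent_toxml are names made up for this illustration and are not part of minidom.

```python
import xml.dom.minidom as minidom


def drop_empty_text_nodes(node):
    """Recursively remove Text children whose data is the empty string."""
    # Iterate over a copy, because removeChild mutates node.childNodes.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and child.data == "":
            node.removeChild(child)
        else:
            drop_empty_text_nodes(child)


def consistent_toxml(node):
    """Serialize a deep copy of node with its empty text nodes removed.

    Hypothetical helper: the original tree is left untouched.
    """
    clone = node.cloneNode(True)
    drop_empty_text_nodes(clone)
    return clone.toxml()


# Element obtained by parsing: it has no child nodes at all.
parsed = minidom.parseString("<value></value>").documentElement

# Element built by hand with an explicit empty text node.
doc = minidom.Document()
built = doc.appendChild(doc.createElement("value"))
built.appendChild(doc.createTextNode(""))

print(consistent_toxml(parsed))
print(consistent_toxml(built))
```

Both calls should print the short form, and built keeps its empty text node because only the clone is modified.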

Upvotes: 4

Lloyd

Reputation: 1324

The XML spec does not distinguish between these two cases.
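
A quick way to see this with minidom itself, as a sketch assuming Python 3: parse the same element written in both forms and compare the results. The variable names short_form and long_form are only for illustration.

```python
import xml.dom.minidom as minidom

# Parse the same empty element written in both forms.
short_form = minidom.parseString("<value/>").documentElement
long_form = minidom.parseString("<value></value>").documentElement

# Neither parsed element ends up with any child nodes...
print(short_form.hasChildNodes(), long_form.hasChildNodes())

# ...so both serialize to the same string.
print(short_form.toxml() == long_form.toxml())
```

The difference in the question only appears once an empty text node is added by hand, which is something the parser never does for an empty element.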

Upvotes: 1

John Machin

Reputation: 82934

value = thing.firstChild.nodeValue or ''
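
The one-liner works when thing.firstChild exists. If thing is itself an element with no children, firstChild is None and the attribute lookup fails. A slightly more defensive variant, sketched here for Python 3, joins the data of whatever text children are present. The helper name element_text is made up for this example.

```python
import xml.dom.minidom as minidom


def element_text(element):
    """Return the concatenated data of the element's direct Text and
    CDATA children, or '' if it has none (hypothetical helper)."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


# Parsed empty element: no child nodes at all.
parsed = minidom.parseString("<value></value>").documentElement

# Hand-built element holding an explicit empty text node.
doc = minidom.Document()
built = doc.appendChild(doc.createElement("value"))
built.appendChild(doc.createTextNode(""))

# Element with real text content, for comparison.
filled = minidom.parseString("<value>abc</value>").documentElement

print(repr(element_text(parsed)))
print(repr(element_text(built)))
print(repr(element_text(filled)))
```

With this helper, both flavors of empty element yield an empty string, which is the consistent behavior the question asks for.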

Upvotes: 1
