Reputation: 6735
I'm puzzled by minidom parser handling of empty element, as shown in following code section.
import xml.dom.minidom
doc = xml.dom.minidom.parseString('<value></value>')
print doc.firstChild.nodeValue.__repr__()
# Out: None
print doc.firstChild.toxml()
# Out: <value/>
doc = xml.dom.minidom.Document()
v = doc.appendChild(doc.createElement('value'))
v.appendChild(doc.createTextNode(''))
print v.firstChild.nodeValue.__repr__()
# Out: ''
print doc.firstChild.toxml()
# Out: <value></value>
How can I get consistent behavior? I'd like to receive empty string as value of empty element (which IS what I put in XML structure in the first place).
Upvotes: 3
Views: 7582
Reputation: 10228
Cracking open xml.dom.minidom and searching for "/>", we find this:
# Method of the Element(Node) class.
def writexml(self, writer, indent="", addindent="", newl=""):
# [snip]
if self.childNodes:
writer.write(">%s"%(newl))
for node in self.childNodes:
node.writexml(writer,indent+addindent,addindent,newl)
writer.write("%s</%s>%s" % (indent,self.tagName,newl))
else:
writer.write("/>%s"%(newl))
We can deduce from this that the short-end-tag form only occurs when childNodes is an empty list. Indeed, this seems to be true:
>>> doc = Document()
>>> v = doc.appendChild(doc.createElement('v'))
>>> v.toxml()
'<v/>'
>>> v.childNodes
[]
>>> v.appendChild(doc.createTextNode(''))
<DOM Text node "''">
>>> v.childNodes
[<DOM Text node "''">]
>>> v.toxml()
'<v></v>'
As pointed out by Lloyd, the XML spec makes no distinction between the two. If your code does make the distinction, that means you need to rethink how you want to serialize your data.
xml.dom.minidom simply displays something differently because it's easier to code. You can, however, get consistent output. Simply inherit the Element
class and override the toxml
method such that it will print out the short-end-tag form when there are no child nodes with non-empty text content. Then monkeypatch the module to use your new Element class.
Upvotes: 4