Reputation: 15100
When creating an XML file with Python's etree, if I write an empty tag to the file using SubElement, I get:
<MyTag />
Unfortunately, our XML parser library used in Fortran doesn't handle this even though it's a correct tag. It needs to see:
<MyTag></MyTag>
Is there a way to change the formatting rules or something in etree to make this work?
Upvotes: 18
Views: 20678
Reputation: 5857
If you have Python >= 3.4, use the short_empty_elements=False option, as other answers have already shown. Otherwise, post-processing the serialized string with a regular expression works:
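import re

xml = "<foo/><bar/>"
# Capture the tag name and any attributes separately, so the end tag repeats only the name
xml = re.sub(r'<([^\s<>/]+)([^<>]*?)\s*/>', r'<\1\2></\1>', xml)
print(xml)
# output will be
# <foo></foo><bar></bar>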
Upvotes: 0
Reputation: 1121734
As of Python 3.4, you can use the short_empty_elements argument for both the tostring() function and the ElementTree.write() method:
>>> from xml.etree import ElementTree as ET
>>> ET.tostring(ET.fromstring('<mytag/>'), short_empty_elements=False)
b'<mytag></mytag>'
In older Python versions (2.7 through 3.3), you can use the html method as a work-around to write out the document:
>>> from xml.etree import ElementTree as ET
>>> ET.tostring(ET.fromstring('<mytag/>'), method='html')
'<mytag></mytag>'
Both the ElementTree.write() method and the tostring() function support the method keyword argument.
On even earlier versions of Python (2.6 and before) you can install the external ElementTree library; version 1.3 supports that keyword.
Yes, it sounds a little weird, but the html method writes most empty elements as a start and end tag. The exceptions are HTML void elements such as link, input and br, which are still written as a single tag with no end tag (for example <br>). Still, it's that or upgrade your Fortran XML parser to actually parse standards-compliant XML!
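A minimal sketch of that caveat, assuming Python 3 and the standard library's xml.etree.ElementTree (the element names root, mytag and br are only for illustration). It shows an ordinary empty element being written as a pair while an HTML void element is left without an end tag, so the result is no longer well-formed XML:

```python
import xml.etree.ElementTree as ET

# One ordinary empty element and one HTML void element.
doc = ET.fromstring("<root><mytag/><br/></root>")

# method="html" expands <mytag/> but writes <br> with no end tag.
html_output = ET.tostring(doc, method="html")
print(html_output)

# The unclosed <br> means an XML parser rejects the result.
try:
    ET.fromstring(html_output)
    reparsed_ok = True
except ET.ParseError:
    reparsed_ok = False
print(reparsed_ok)
```

If the document can contain tags whose names collide with HTML void elements, this work-around is probably not safe to use.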
Upvotes: 19
Reputation: 1326
This was solved directly in Python 3.4. Since then, the write method of xml.etree.ElementTree.ElementTree has had the short_empty_elements parameter, which:
controls the formatting of elements that contain no content. If True (the default), they are emitted as a single self-closed tag, otherwise they are emitted as a pair of start/end tags.
More details in the xml.etree documentation.
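As a rough sketch of how this fits the question's SubElement case, assuming Python 3.4 or later (the names Root and MyTag, and the in-memory buffer standing in for a real file, are illustrative choices):

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("Root")
ET.SubElement(root, "MyTag")  # empty element: no text, no children

# Write to an in-memory buffer; a filename or an open binary file works the same way.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, short_empty_elements=False)

output = buffer.getvalue()
print(output)
```

Note that short_empty_elements is keyword-only, so it has to be passed by name.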
Upvotes: 6
Reputation: 7837
Paraphrasing the code, the version of ElementTree.py I use contains the following in a _write method:
write('<' + tagname)
...
if node.text or len(node):  # this line is literal
    write('>')
    ...
    write('</%s>' % tagname)
else:
    write(' />')
To steer the program counter I created the following:
class AlwaysTrueString(str):
    def __nonzero__(self):  # truth-test hook on Python 2
        return True
    __bool__ = __nonzero__  # the same hook under its Python 3 name

true_empty_string = AlwaysTrueString()
Then I set node.text = true_empty_string on those ElementTree nodes where I want an open-close tag rather than a self-closing one.
By "steering the program counter" I mean constructing a set of inputs—in this case an object with a somewhat curious truth test—to a library method such that the invocation of the library method traverses its control flow graph the way I want it to. This is ridiculously brittle: in a new version of the library, my hack might break—and you should probably treat "might" as "almost guaranteed". In general, don't break abstraction barriers. It just worked for me here.
Upvotes: 0
Reputation: 223
Adding an empty text
is another option:
etree.SubElement(parent, 'child_tag_name').text=''
But note that this will change not only the representation but also the structure of the document: child_el.text will be '' instead of None.
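A small sketch of that structural difference, assuming Python 3 and the standard library's xml.etree.ElementTree (the tag names are made up). One caveat worth checking for your own setup: the standard library serializer tests the text for truthiness, as the _write code quoted in another answer does, so an empty string still produces a self-closing tag there. The trick relies on a library such as lxml.etree that serializes '' differently from None.

```python
import xml.etree.ElementTree as ET

parent = ET.Element("parent")
untouched = ET.SubElement(parent, "untouched")  # .text stays None
blank = ET.SubElement(parent, "blank")
blank.text = ""                                  # .text is now an empty string

print(repr(untouched.text), repr(blank.text))

# With the standard library, '' is falsy, so both children are still self-closed.
stdlib_output = ET.tostring(parent)
print(stdlib_output)
```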
Oh, and like Martijn said, try to use better libraries.
Upvotes: 3
Reputation: 21
If you have sed available, you could pipe the output of your python script to
sed -e "s/<\([^>]*\) \/>/<\1><\/\1>/g"
This will find any occurrence of <Tag /> and replace it with <Tag></Tag>.
Upvotes: 2