tpg2114
tpg2114

Reputation: 15100

Python etree control empty tag format

When creating an XML file with Python's etree, if we write to the file an empty tag using SubElement, I get:

<MyTag />

Unfortunately, our XML parser library used in Fortran doesn't handle this even though it's a correct tag. It needs to see:

<MyTag></MyTag>

Is there a way to change the formatting rules or something in etree to make this work?

Upvotes: 18

Views: 20678

Answers (6)

little_birdie
little_birdie

Reputation: 5857

If you have python >=3.4, use the short_empty_elements=Falseoption as has been shown in other answers already, but:

  1. If you have the XML in string form already and can't touch the code where it's generated..
  2. If you're in a situation where you are stuck with python <3.4..
  3. If you're using a different XML library that insists on self-closing tags..

Then this works:

 xml = "<foo/><bar/>"
 xml = re.sub(r'<([^\/]+)\/\>', r'<\1></\1>', xml)

 print(xml)

 # output will be
 # <foo></foo><bar></bar>

Upvotes: 0

Martijn Pieters
Martijn Pieters

Reputation: 1121734

As of Python 3.4, you can use the short_empty_elements argument for both the tostring() function and the ElementTRee.write() method:

>>> from xml.etree import ElementTree as ET
>>> ET.tostring(ET.fromstring('<mytag/>'), short_empty_elements=False)
b'<mytag></mytag>'

In older Python versions, (2.7 through to 3.3), as a work-around you can use the html method to write out the document:

>>> from xml.etree import ElementTree as ET
>>> ET.tostring(ET.fromstring('<mytag/>'), method='html')
'<mytag></mytag>'

Both the ElementTree.write() method and the tostring() function support the method keyword argument.

On even earlier versions of Python (2.6 and before) you can install the external ElementTree library; version 1.3 supports that keyword.

Yes, it sounds a little weird, but the html output mostly outputs empty elements as a start and end tag. Some elements still end up as empty tag elements; specifically <link/>, <input/>, <br/> and such. Still, it's that or upgrade your Fortran XML parser to actually parse standards-compliant XML!

Upvotes: 19

khyox
khyox

Reputation: 1326

This was directly solved in Python 3.4. From then, the write method of xml.etree.ElementTree.ElementTree has the short_empty_elements parameter which:

controls the formatting of elements that contain no content. If True (the default), they are emitted as a single self-closed tag, otherwise they are emitted as a pair of start/end tags.

More details in the xml.etree documentation.

Upvotes: 6

Jonas K&#246;lker
Jonas K&#246;lker

Reputation: 7837

Paraphrasing the code, the version of ElementTree.py I use contains the following in a _write method:

write('<' + tagname)
...
if node.text or len(node): # this line is literal
    write('>')
    ...
    write('</%s>' % tagname)
else:
    write(' />')

To steer the program counter I created the following:

class AlwaysTrueString(str):
    def __nonzero__(self): return True
true_empty_string = AlwaysTrueString()

Then I set node.text = true_empty_string on those ElementTree nodes where I want an open-close tag rather than a self-closing one.

By "steering the program counter" I mean constructing a set of inputs—in this case an object with a somewhat curious truth test—to a library method such that the invocation of the library method traverses its control flow graph the way I want it to. This is ridiculously brittle: in a new version of the library, my hack might break—and you should probably treat "might" as "almost guaranteed". In general, don't break abstraction barriers. It just worked for me here.

Upvotes: 0

skarap
skarap

Reputation: 223

Adding an empty text is another option:

etree.SubElement(parent, 'child_tag_name').text=''

But note that this will change not only the representation but also the structure of the document: i.e. child_el.text will be '' instead of None.

Oh, and like Martijn said, try to use better libraries.

Upvotes: 3

Maxime
Maxime

Reputation: 21

If you have sed available, you could pipe the output of your python script to

sed -e "s/<\([^>]*\) \/>/<\1><\/\1>/g"

Which will find any occurence of <Tag /> and replace it by <Tag></Tag>

Upvotes: 2

Related Questions