markijbema

Reputation: 4055

How to prevent xml.etree.ElementTree fromstring from dropping comment nodes

I have the following code fragment:

    from xml.etree.ElementTree import fromstring, tostring

    def prefix_tags(source):
        mathml = fromstring(source)
        # iter() replaces getiterator(), which was removed in Python 3.9
        for elem in mathml.iter():
            elem.tag = 'm:' + elem.tag
        return tostring(mathml)

When I pass in the following input:

    <math>
      <a> 1 2 3 </a>  <b />
    <foo>Uitleg</foo>
    <!-- <bar> -->
    </math>

It results in:

    <m:math>
      <m:a> 1 2 3 </m:a>  <m:b />
    <m:foo>Uitleg</m:foo>

    </m:math>

How come? And how can I preserve the comment?

Edit: I don't care which XML library is used, but I do need to be able to make the tag change shown above. Unfortunately, lxml does not seem to allow this (and I cannot use proper namespace operations).

Upvotes: 15

Views: 7721

Answers (1)

Steven

Reputation: 28686

You cannot do this with xml.etree's default parser, because it discards comments (which, by the way, is acceptable behaviour for an XML parser). You can, however, with the (API-compatible) lxml library, which lets you configure parser options.

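    from lxml import etree

    # remove_comments=False is already lxml's default; it is spelled out here for clarity
    parser = etree.XMLParser(remove_comments=False)
    tree = etree.parse('input.xml', parser=parser)
    # for a string instead of a file: root = etree.fromstring(text, parser)
    # or alternatively set the parser as default:
    # etree.set_default_parser(parser)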

This is by far the easiest option. If you really have to use xml.etree, you could try hooking up your own parser, although in versions before Python 3.8 comments are not officially supported even then. Have a look at this example (from the author of xml.etree); it still seems to work in Python 2.7.
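On Python 3.8 or later, the standard library can do what the question asks on its own, because `TreeBuilder` accepts `insert_comments=True` and can be passed to `XMLParser` as its `target`. The following is a minimal sketch assuming Python 3.8+; `prefix_tags` and `SAMPLE` are hypothetical names chosen for illustration. Comment nodes are skipped during the rename because their `tag` is the `Comment` factory function rather than a string, so `'m:' + elem.tag` would raise a `TypeError` on them.

```python
from xml.etree.ElementTree import Comment, TreeBuilder, XMLParser, fromstring, tostring

# Hypothetical sample: the input from the question.
SAMPLE = """<math>
  <a> 1 2 3 </a>  <b />
<foo>Uitleg</foo>
<!-- <bar> -->
</math>"""


def prefix_tags(source, prefix="m:"):
    """Prefix every element tag while keeping comments (assumes Python 3.8+)."""
    # insert_comments=True makes the tree builder keep comments that appear
    # inside the root element instead of discarding them.
    parser = XMLParser(target=TreeBuilder(insert_comments=True))
    root = fromstring(source, parser=parser)
    for elem in root.iter():
        # Comment nodes use the Comment factory as their tag; leave them alone.
        if elem.tag is Comment:
            continue
        elem.tag = prefix + elem.tag
    return tostring(root, encoding="unicode")


# With the default parser the comment is dropped, as in the question.
print(tostring(fromstring(SAMPLE), encoding="unicode"))
# With the comment-preserving parser it survives the rename.
print(prefix_tags(SAMPLE))
```

Note that `insert_comments` only keeps comments inside the root element; a comment placed before or after `<math>` is still dropped.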

Upvotes: 17
