Reputation: 410
When using Python's xml.etree module, how can I escape XML-special characters like '>' and '<' to be used inside a tag? Must I do so manually? Does etree have a method or kwarg that I am missing?
Consider:
In [1]: from xml.etree.ElementTree import Element, SubElement, tostring
In [2]: root = Element('filter')
In [3]: root.set('type', 'test')
In [4]: for op in ['<', '>', '=']:
   ...:     sub_elem = SubElement(root, op)
   ...:     child = Element('a')
   ...:     child.text = 'b'
   ...:     sub_elem.append(child)
   ...:
In [5]: tostring(root)
Out[5]: '<filter type="test"><<><a>b</a></<><>><a>b</a></>><=><a>b</a></=></filter>'
Where I would like to see sections like:
<&lt><a>b</a></&lt>
Upvotes: 1
Views: 4274
Reputation: 50947
Where I would like to see sections like:
<&lt><a>b</a></&lt>
This is not well-formed XML. I guess that you forgot the semicolons, but adding them does not help. The following is also ill-formed:
<&lt;><a>b</a></&lt;>
In the code, you are trying to create elements called <, >, and =. That won't work. All of the following are forbidden in XML element names: <, >, =, &gt;, &lt;.
Unfortunately, ElementTree is a bit lax and allows you to create pseudo-XML, such as this (from the question):
<filter type="test"><<><a>b</a></<><>><a>b</a></>><=><a>b</a></=></filter>
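A minimal sketch of that laxness, assuming Python 3 and only the standard library: ElementTree serializes an element named < without complaint, but its own parser then rejects the result. The variable names here are illustrative only.

```python
from xml.etree.ElementTree import Element, ParseError, fromstring, tostring

# ElementTree does not validate tag names when building or serializing.
bad = Element('<')
serialized = tostring(bad, encoding='unicode')

# Parsing the serialized string back fails, because it is not well-formed XML.
try:
    fromstring(serialized)
    parsed_ok = True
except ParseError:
    parsed_ok = False

print(repr(serialized), parsed_ok)
```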
If you had used lxml.etree (see http://lxml.de) instead of xml.etree.ElementTree, you would have received an error message: "ValueError: Invalid tag name u'<'".
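One possible way around this, sketched under the assumption that the operator does not have to be the element name itself: give the element a legal name and store the operator in an attribute or in text. ElementTree escapes attribute values and text automatically on serialization. The names op, symbol and note below are hypothetical, chosen only for illustration, and the code assumes Python 3.

```python
from xml.etree.ElementTree import Element, SubElement, fromstring, tostring

root = Element('filter')
root.set('type', 'test')

for op in ['<', '>', '=']:
    # 'op' and 'symbol' are made-up names; any legal XML names would do.
    sub_elem = SubElement(root, 'op')
    sub_elem.set('symbol', op)      # attribute values are escaped on output
    child = SubElement(sub_elem, 'a')
    child.text = 'b'

# Text content is escaped on output as well.
note = SubElement(root, 'note')
note.text = 'a < b & c'

xml_text = tostring(root, encoding='unicode')
print(xml_text)

# The result is well-formed, so it parses back to the original values.
round_trip = [e.get('symbol') for e in fromstring(xml_text).findall('op')]
print(round_trip)
```

No manual escaping is needed in this layout; the special characters only ever appear in places where the serializer handles them.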
Upvotes: 1
Reputation: 2214
< and > are special characters in XML; in text content they should be replaced with the entities &lt; and &gt; respectively.
You can use a regular expression to replace the characters that are invalid:
import re
regexp = re.compile(r'<|>')  # match either the character '<' or '>'
replacement_map = {'<': '&lt;', '>': '&gt;'}  # map each character to its entity
regexp.sub(lambda match: replacement_map[match.group(0)], '<a>hello</a>')  # do the replacement
# output: '&lt;a&gt;hello&lt;/a&gt;'
Though the code is a little more involved, it is a very efficient way of doing the replacements.
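As an alternative to the hand-written regex, the standard library's xml.sax.saxutils module provides escape and unescape, which also handle the ampersand. This is a small sketch for escaping text by hand; it does not make <, > or = usable as element names.

```python
from xml.sax.saxutils import escape, unescape

raw = '<a>hello</a>'

# escape() replaces '&', '<' and '>' with their entities.
escaped = escape(raw)
print(escaped)

# unescape() reverses the substitution.
restored = unescape(escaped)
print(restored)
```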
Upvotes: 2