Reputation: 690
I have strings that contain empty XML elements, like this:
>>> s = """fizz buzz <pb n="44"/> bananas"""
These strings have been assigned as the text of XML elements created with etree.SubElement:
>>> from lxml import etree as et
>>> root = et.Element('root')
>>> txt = et.SubElement(root, 'text')
>>> txt.text = s
>>> et.dump(root)
<root>
<text>fizz buzz &lt;pb n="44"/&gt; bananas</text>
</root>
Fiddling about with re.split() and etree's text and tail, I can insert a subelement <pb n="44"/> where I want it in txt.text; however, sometimes I've got multiple occurrences of the <pb/> element in the string, which complicates matters:
>>> s1 = """foo bar <pb n="42"/> parrots like <pb n="43"/> eggs and spam"""
Is there a straightforward way to insert such elements where they belong in an existing element's text without fiddling around too much with text and tail?
Upvotes: 2
Views: 3267
Reputation: 50947
You could make your input string a well-formed XML document (with text as the root element) and parse that into an Element object using fromstring(). Then append it to the parent.
from lxml import etree as et
s1 = """foo bar <pb n="42"/> parrots like <pb n="43"/> eggs and spam"""
s2 = "<text>{0}</text>".format(s1)
text = et.fromstring(s2)
root = et.Element('root')
root.append(text)
et.dump(root)
Output:
<root>
<text>foo bar <pb n="42"/> parrots like <pb n="43"/> eggs and spam</text>
</root>
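If the text element already exists (as in the question), the same idea can probably be applied in place. The following is only a sketch: the helper name expand_markup_in_text and the throwaway wrapper tag are made up for illustration, and it assumes the element has no children yet and that its text is well-formed once wrapped. The stdlib fallback import is there only so the snippet runs without lxml installed.

```python
try:
    from lxml import etree as et
except ImportError:  # stdlib ElementTree offers the same calls used here
    import xml.etree.ElementTree as et


def expand_markup_in_text(elem):
    """Re-parse elem.text as an XML fragment and turn it into real nodes.

    Hypothetical helper. Assumes elem has no children yet and that
    elem.text is well-formed once wrapped in a dummy root element;
    otherwise the parser raises an error.
    """
    # Wrap the raw string so the parser sees exactly one root element.
    wrapper = et.fromstring("<wrapper>{0}</wrapper>".format(elem.text or ""))
    # Text before the first child stays as the element's text.
    elem.text = wrapper.text
    # Each child keeps its own tail, so the text in between is preserved.
    elem.extend(list(wrapper))
    return elem


root = et.Element("root")
txt = et.SubElement(root, "text")
txt.text = 'foo bar <pb n="42"/> parrots like <pb n="43"/> eggs and spam'

expand_markup_in_text(txt)
print(et.tostring(root, encoding="unicode"))
```

After the call, txt has two pb children, and the surrounding text lives in txt.text and in each child's tail, so there is no manual splitting. Note that a string containing a bare & or a stray < is not well-formed and will make fromstring() raise a parse error, so such input would need escaping first.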
Upvotes: 4