Reputation: 18823
Say, I have an element:
>>> el = etree.XML('<tag><![CDATA[content]]></tag>')
>>> el.text
'content'
What I'd like to get is <![CDATA[content]]>. How can I go about it?
Upvotes: 1
Views: 1286
Reputation: 52848
When you do el.text, that's always going to give you the plain text content.
To see the serialized element, try tostring() instead:
from lxml import etree
el = etree.XML('<tag><![CDATA[content]]></tag>')
print(etree.tostring(el).decode())
With the default parser, this will print:
<tag>content</tag>
To preserve the CDATA, you need to use an XMLParser() with strip_cdata=False:
parser = etree.XMLParser(strip_cdata=False)
el = etree.XML('<tag><![CDATA[content]]></tag>', parser=parser)
print(etree.tostring(el).decode())
This will print:
<tag><![CDATA[content]]></tag>
This should be sufficient to fulfill your "I want to make sure in a test that content is wrapped in CDATA" requirement.
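For the test itself, something along these lines might work. This is only a rough sketch: the helper name is_cdata_wrapped is made up for illustration, and it assumes the element holds nothing but a single text node, so it simply checks whether the serialized element contains that text wrapped in a CDATA section.

```python
from lxml import etree


def is_cdata_wrapped(xml_text):
    """Return True if the root element's text is serialized as a CDATA section.

    Hypothetical helper for illustration; assumes the root element
    contains only one text node and no child elements.
    """
    # strip_cdata=False keeps CDATA sections so they survive serialization.
    parser = etree.XMLParser(strip_cdata=False)
    el = etree.XML(xml_text, parser=parser)
    # encoding="unicode" makes tostring() return str instead of bytes.
    serialized = etree.tostring(el, encoding="unicode")
    return "<![CDATA[" + (el.text or "") + "]]>" in serialized
```

In a test you could then write assert is_cdata_wrapped(xml) for the document under test. Note that el.text still returns the plain 'content' even with strip_cdata=False, which is why the check goes through the serialized form.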
Upvotes: 3
Reputation: 4482
You might consider using BeautifulSoup and looking for CDATA instances:
import bs4
from bs4 import BeautifulSoup
data='''<tag><![CDATA[content]]></tag>'''
soup = BeautifulSoup(data, 'html.parser')
# string= is the current name for the older text= argument
cdata = soup.find(string=lambda x: isinstance(x, bs4.CData))
print("<![CDATA[{}]]>".format(cdata))
Output
<![CDATA[content]]>
Upvotes: 2