x-yuri

Reputation: 18823

Get element's text with CDATA

Say, I have an element:

>>> from lxml import etree
>>> el = etree.XML('<tag><![CDATA[content]]></tag>')
>>> el.text
'content'

What I'd like to get is <![CDATA[content]]>. How can I go about it?

Upvotes: 1

Views: 1286

Answers (2)

Daniel Haley

Reputation: 52848

el.text always gives you the plain text content, so the CDATA markers never show up there.
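As a quick check of that behaviour, here is a minimal sketch, assuming lxml is installed. The variable names (keep_cdata, plain, wrapped) are only for illustration. It parses the same content with and without a CDATA section and compares .text:

```python
from lxml import etree

# Parser that keeps CDATA sections instead of converting them to plain text.
keep_cdata = etree.XMLParser(strip_cdata=False)

plain = etree.XML('<tag>content</tag>')
wrapped = etree.XML('<tag><![CDATA[content]]></tag>', parser=keep_cdata)

# .text does not indicate whether the content was wrapped in CDATA.
print(plain.text == wrapped.text)
```

So .text alone cannot tell the two inputs apart, even when the parser keeps the CDATA section.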

To see the serialized element, use tostring() instead:

from lxml import etree

el = etree.XML('<tag><![CDATA[content]]></tag>')
print(etree.tostring(el).decode())

This will print the following, because the default parser replaces CDATA sections with plain text:

<tag>content</tag>

To preserve the CDATA, you need to use an XMLParser() with strip_cdata=False:

parser = etree.XMLParser(strip_cdata=False)

el = etree.XML('<tag><![CDATA[content]]></tag>', parser=parser)
print(etree.tostring(el).decode())

This will print:

<tag><![CDATA[content]]></tag>

This should be sufficient to fulfill your "I want to make sure in a test that content is wrapped in CDATA" requirement.
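For such a test, a minimal sketch could look like the following, assuming lxml is installed. The helper name serialize_keeping_cdata is hypothetical, not part of lxml:

```python
from lxml import etree


def serialize_keeping_cdata(xml_text):
    """Parse xml_text without stripping CDATA and return it serialized as str.

    Hypothetical helper for illustration only.
    """
    parser = etree.XMLParser(strip_cdata=False)
    el = etree.XML(xml_text, parser=parser)
    # encoding="unicode" makes tostring() return str instead of bytes.
    return etree.tostring(el, encoding="unicode")


wrapped = serialize_keeping_cdata('<tag><![CDATA[content]]></tag>')
plain = serialize_keeping_cdata('<tag>content</tag>')

# The CDATA section survives the round trip only when it was in the input.
assert '<![CDATA[content]]>' in wrapped
assert '<![CDATA[' not in plain
```

The substring check is deliberately simple. If the element can have children that carry their own CDATA sections, the assertion would need to be narrowed to the element you care about.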

Upvotes: 3

Sebastien D

Reputation: 4482

You might consider using BeautifulSoup and looking for CData instances:

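import bs4
from bs4 import BeautifulSoup

data = '''<tag><![CDATA[content]]></tag>'''
soup = BeautifulSoup(data, 'html.parser')
# Find the first string node that is a CData instance.
# string= is the current name of the older text= argument.
cdata = soup.find(string=lambda x: isinstance(x, bs4.CData))
# CData holds only the inner text, so add the markers back.
print("<![CDATA[{}]]>".format(cdata))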

Output

<![CDATA[content]]>

Upvotes: 2
