theta

Reputation: 25591

lxml XSLT removes CDATA while processing XML

Preserving CDATA with lxml means creating a parser with the appropriate option (strip_cdata=False), but what about XSLT? For example:

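from lxml import etree

# Keep CDATA sections as CDATA nodes when parsing the input
parser = etree.XMLParser(strip_cdata=False)
tree = etree.parse('sample_with_cdata.xml', parser)
# Compile the stylesheet and apply it to the parsed document
transform = etree.XSLT(etree.parse('dupe.xsl'))
xml_out = transform(tree)
xml_out.write('processed.xml')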

If I process an XML file containing CDATA through the lxml XSLT processor, all CDATA sections are stripped. How can I tell the XSLT processor to leave CDATA as is?

PS. FYI, adding the same parser to etree.XSLT doesn't change the outcome.

Upvotes: 0

Views: 839

Answers (2)

Michael Kay

Reputation: 163262

As far as XSLT is concerned, CDATA sections in XML are just noise. XSLT treats <![CDATA["]]> the same as &quot; which it treats the same as "; they are different ways for the document author to write the same thing.

If you are using CDATA sections in your input to convey information, that is, if <![CDATA[xxx]]> means something different from xxx, then you need to change your XML design.
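As a rough illustration of that equivalence, here is a small sketch using lxml's default parser. The element name v is made up for the example; the point is only that all three spellings arrive as the same text content.

```python
from lxml import etree

# Three spellings of the same one-character text content.
# The element name "v" is hypothetical, chosen only for this example.
docs = [
    b'<v><![CDATA["]]></v>',
    b'<v>&quot;</v>',
    b'<v>"</v>',
]

# With the default parser, each document yields the same text node.
texts = [etree.fromstring(d).text for d in docs]
print(texts)
```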

Upvotes: 1

theta

Reputation: 25591

This doesn't seem to be related to lxml. It was a gap in my own knowledge.

CDATA in XSLT output should be handled with the cdata-section-elements attribute of the xsl:output declaration. For example, if the description element in the XML file contains CDATA:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" cdata-section-elements='description' />
...
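For anyone who wants to try this from Python, here is a minimal end-to-end sketch with lxml. The identity template and the sample input are assumptions added for illustration, not the original dupe.xsl or sample_with_cdata.xml; only the xsl:output declaration comes from the snippet above.

```python
from lxml import etree

# Hypothetical stylesheet: an identity transform (an assumption for this
# example) plus the xsl:output declaration with cdata-section-elements.
XSL = b"""<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" cdata-section-elements="description"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""

# Hypothetical input document with a CDATA section inside <description>.
XML = b"<item><title>example</title><description><![CDATA[a < b]]></description></item>"

parser = etree.XMLParser(strip_cdata=False)
tree = etree.fromstring(XML, parser)
transform = etree.XSLT(etree.fromstring(XSL))
result = transform(tree)

# str() serializes the result according to the stylesheet's xsl:output.
out = str(result)
print(out)
```

In this sketch the text of description should come back wrapped in a CDATA section, while elements not listed in cdata-section-elements are written with ordinary escaping.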

Upvotes: 1
