user3621633

Reputation: 1721

How to wrap HTML content in CData (Java) for XSLT - XML to HTML

I'm struggling to wrap HTML content in a CDATA section using Java. The ultimate goal is to transform XML to HTML via XSLT, and CDATA is a requirement. I want the XSLT to treat the HTML as plain text rather than as markup, but I'm obviously doing something wrong, since the HTML is not preserved. Here is the XML:

<?xml version="1.0" encoding="utf-8" ?>

<content>
    <records>
        <record>
            <param1>1</param1>
            <param2>25</param2>
            <param3>34</param3>
            <param4>b</param4>
            <param5>
                <p>this is html that should be wrapped with CData including the p tags.</p>
            </param5>
        </record>
    </records>
</content>

Here is the code:

DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.parse("test.xml");

doc.getDocumentElement().normalize();

Element param5 = (Element)doc.getElementsByTagName("param5").item(0);
CDATASection cdata = doc.createCDATASection(param5.getTextContent());
param5.appendChild(cdata);

DOMResult domResult = new DOMResult();

transform.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "param5");
transform.transform(new DOMSource(doc) , domResult);

So, just before the transformation, param5 in the XML document looks like this:

<param5> 
    <![CDATA[
        this is html that should be wrapped with CData including the p tags.
    ]]>
</param5>

What I want instead is:

<param5> 
    <![CDATA[
        <p>this is html that should be wrapped with CData including the p tags.</p>
    ]]>
</param5>

I am lost as to what I'm doing wrong here.

Any help would be most appreciated. Thank you.

The XSL is very simple:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <html>
            <body>
                <h1><xsl:value-of select="content/records/record/param5"/></h1>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

Here is the sample HTML output that I need:

<html>
    <body>
        <h1>
            <p>this is html that should be wrapped with CData including the p tags.</p>
        </h1>
    </body>
</html>

I'm trying not to overcomplicate things. The basic problem is that I want the CDATA section to include both the HTML content and the HTML tags, but getTextContent() drops the p tags. If there were a method that could grab everything inside param5, including the markup, I'd be set.
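The symptom can be reproduced in isolation. The sketch below assumes an inline sample string instead of test.xml, and the class and method names (TextContentSketch, textOf) are hypothetical. It only shows that getTextContent() returns the concatenated text of the descendants, with no element tags:

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class TextContentSketch {

    // Parses the given XML string and returns getTextContent() of the
    // first element with the given tag name.
    static String textOf(String xml, String tagName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Element element = (Element) doc.getElementsByTagName(tagName).item(0);
        return element.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The <p> tags are part of the tree structure, not of the text content.
        System.out.println(textOf("<param5><p>hello</p></param5>", "param5"));
    }
}
```

So the tags are not lost by the CDATA section itself; they are already gone from the string passed to createCDATASection.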

Upvotes: 0

Views: 3021

Answers (1)

Martin Honnen

Reputation: 167516

If you want to create a CDATA section containing the markup of DOM nodes, you first need to serialize those nodes. In Java that can be done either with a default transformer or with the DOM Load/Save API. So I would create a document fragment node, appendChild all child nodes of the param element to the fragment, and then serialize the fragment to a string. After that you can use your own code to create a CDATA section from the string and appendChild it to the param element.

Here is a simple example. The imports needed are:

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.DocumentFragment;


import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

The code to read in the document and find the element is as you posted; a DocumentFragment then collects all child nodes removed from the element:

        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        docFactory.setNamespaceAware(true);

        DocumentBuilder docBuilder = docFactory.newDocumentBuilder();

        Document doc = docBuilder.parse("sample1.xml");

        DocumentFragment frag1 = doc.createDocumentFragment();

        Element param = (Element)doc.getElementsByTagName("param5").item(0);

        while (param.hasChildNodes())
        {
            frag1.appendChild(param.getFirstChild());
        }

The LSSerializer then has a writeToString method that serializes the fragment to a string, which is used as the content of the CDATA section:

        DOMImplementationLS lsImp = (DOMImplementationLS)doc.getImplementation();

        LSSerializer ser = lsImp.createLSSerializer();
        ser.getDomConfig().setParameter("xml-declaration", false);

        String xml = ser.writeToString(frag1);

        System.out.println(xml);

        param.appendChild(doc.createCDATASection(xml));

        System.out.println(ser.writeToString(doc));

The document then looks like this:

<content>
    <records>
        <record>
            <param1>1</param1>
            <param2>25</param2>
            <param3>34</param3>
            <param4>b</param4>
            <param5><![CDATA[
                <p>this is html that should be wrapped with CData including the p tags.</p>
            ]]></param5>
        </record>
    </records>
</content>
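The other option mentioned above, a default transformer, could look roughly like the following. This is only a sketch under some assumptions: it parses an inline sample string rather than sample1.xml, the names CdataWrapSketch and wrapChildrenInCdata are made up for illustration, and each child node is serialized on its own instead of going through a DocumentFragment.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CdataWrapSketch {

    // Serializes every child of the element to markup with an identity
    // transformer, removes the children, and appends a single CDATA
    // section holding the serialized markup. Returns that markup.
    static String wrapChildrenInCdata(Document doc, Element element) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter markup = new StringWriter();
        while (element.hasChildNodes()) {
            Node child = element.getFirstChild();
            identity.transform(new DOMSource(child), new StreamResult(markup));
            element.removeChild(child);
        }

        String xml = markup.toString();
        element.appendChild(doc.createCDATASection(xml));
        return xml;
    }

    // Hypothetical helper: parses an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
            "<content><param5><p>this is html <b>with</b> tags.</p></param5></content>");
        Element param = (Element) doc.getElementsByTagName("param5").item(0);
        System.out.println(wrapChildrenInCdata(doc, param));
    }
}
```

Serializing child by child sidesteps the question of how a given transformer implementation treats a DocumentFragment as a DOMSource; whether that matters in your environment is something to verify.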

Someone more at home in the Java world would need to tell you whether the cast in DOMImplementationLS lsImp = (DOMImplementationLS)doc.getImplementation(); is reliable or whether you need to use the registry, as shown in http://www.java2s.com/Tutorial/Java/0440__XML/GeneratesaDOMfromscratchWritestheDOMtoaStringusinganLSSerializer.htm.
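For reference, the registry route would look something like the sketch below. It uses the standard org.w3c.dom.bootstrap.DOMImplementationRegistry class; the wrapper names (LsRegistrySketch, newSerializer) are hypothetical, and this does not settle whether the direct cast is safe with every parser.

```java
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

class LsRegistrySketch {

    // Asks the registry for an implementation supporting the "LS" feature
    // instead of casting doc.getImplementation() directly.
    static LSSerializer newSerializer() throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS lsImpl = (DOMImplementationLS) registry.getDOMImplementation("LS");
        if (lsImpl == null) {
            throw new IllegalStateException("No DOM implementation with Load/Save support found");
        }
        LSSerializer ser = lsImpl.createLSSerializer();
        ser.getDomConfig().setParameter("xml-declaration", false);
        return ser;
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny document from scratch and serialize its root element.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("a");
        root.setTextContent("x");
        doc.appendChild(root);
        System.out.println(newSerializer().writeToString(root));
    }
}
```

The serializer obtained this way can replace the ser variable in the code above; the rest of the example stays the same.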

Upvotes: 4
