java xml Output of an object, special characters are not being escaped correctly

Question

So for a project I have to write a class that takes multiple "Page" objects, each with a nameSpaceID, an articleID, a title and a set of category strings, and writes them to an XML file. I tried to solve it with an XMLOutputFactory and an XMLStreamWriter that write the XML into a StringWriter. I then run the StringWriter's content through a Transformer (from a TransformerFactory) to get the right format (indentation and so on), and finally write the result to a .xml file. Everything works so far, but I need help with escaping special characters: if I put a > into a title, for example, it doesn't get escaped. I tried escaping it with StringEscapeUtils.escapeXml10(String), but that only makes my output worse.

import java.io.FileOutputStream;
import org.apache.commons.lang3.StringEscapeUtils;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashSet;
import java.util.Set;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

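/**
 * @author Paul
 *
 */
public class PageExport {
    /**
     * Writes the given pages as indented XML to the file filePath + fileName.
     *
     * @param pages    the pages to export
     * @param fileName name of the output file, including the .xml extension
     * @param filePath directory of the output file, ending with a path separator
     */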
    public void printPagestoXML(Page[] pages, String fileName, String filePath){
        try {
            StringWriter xmlRAW = new StringWriter();
            XMLOutputFactory xmlOutputFactory = XMLOutputFactory.newFactory();
            xmlOutputFactory.setProperty("escapeCharacters", false);
            XMLStreamWriter xmlStreamWriter = xmlOutputFactory.createXMLStreamWriter(xmlRAW);

            xmlStreamWriter.writeStartDocument("UTF-8", "1.0");

            xmlStreamWriter.writeStartElement("pages");

            for(int i = 0; i < pages.length; i++){
                xmlStreamWriter.writeStartElement("page");
                xmlStreamWriter.writeAttribute("pageID", pages[i].getArticleID() + "");
                xmlStreamWriter.writeAttribute("namespaceID", pages[i].getNamespaceID() + "");
                xmlStreamWriter.writeAttribute("title", StringEscapeUtils.escapeXml10(pages[i].getTitle()));

                if (pages[i].getCategories() != null){
                    xmlStreamWriter.writeStartElement("categories");

                    for (Object category : pages[i].getCategories()) {
                        xmlStreamWriter.writeEmptyElement("category");
                        xmlStreamWriter.writeAttribute("name", category.toString());
                    }

                    xmlStreamWriter.writeEndElement(); //end of categories
                }

                xmlStreamWriter.writeEndElement(); //end of page i
            }
            xmlStreamWriter.writeEndElement(); //end of pages

            xmlStreamWriter.writeEndDocument(); // end of document

            xmlStreamWriter.flush();
            xmlStreamWriter.close();

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
            // try-with-resources closes the file stream once the transformation is done
            try (FileOutputStream out = new FileOutputStream(filePath + fileName)) {
                transformer.transform(new StreamSource(new StringReader(xmlRAW.toString())), new StreamResult(out));
            }
        }
        catch (Exception e){
            System.out.println(e.getMessage());
        }
    }

    public static void main(String[] args) {
        String goodFilePath = System.getProperty("user.dir") + "/src/data/";
        String goodFileName = "test.xml";
        Set<String> testCategories = new HashSet<>();
        testCategories.add("this");
        testCategories.add("is");
        testCategories.add("sparta");
        Page[] testPages = {new Page(0, 1337, "l33t", testCategories), new Page(0, 1338, "l33t>", testCategories)};
        PageExport pe = new PageExport();
        pe.printPagestoXML(testPages, goodFileName, goodFilePath);
    }

}
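The "makes my output worse" part can be reproduced in isolation. This is a minimal sketch, assuming the JDK's built-in StAX implementation with its default settings; DoubleEscapeDemo, writeTitle and readTitle are illustrative names, not part of the question. XMLStreamWriter.writeAttribute already escapes the value it is given, so a title that StringEscapeUtils.escapeXml10 has pre-escaped to l33t&gt; gets its & escaped a second time. (The listing above additionally sets the "escapeCharacters" property to false; as far as I know that is a non-standard switch of the JDK's writer that turns its built-in escaping off, so the sketch leaves it at the default. The edited listing further down no longer sets that property or calls escapeXml10.)

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

public class DoubleEscapeDemo {

    // Writes <page title="..."></page> with the default XMLStreamWriter,
    // which escapes attribute values on its own.
    public static String writeTitle(String title) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newFactory().createXMLStreamWriter(out);
            w.writeStartElement("page");
            w.writeAttribute("title", title);
            w.writeEndElement();
            w.flush();
            w.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses the XML again and returns the title value as any consumer would see it.
    public static String readTitle(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT && "page".equals(r.getLocalName())) {
                    return r.getAttributeValue(null, "title");
                }
            }
            throw new IllegalStateException("no <page> element found");
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "l33t>";
        // The value StringEscapeUtils.escapeXml10("l33t>") returns, hard-coded to
        // keep this sketch free of the commons-lang3 dependency.
        String preEscaped = "l33t&gt;";

        System.out.println(writeTitle(raw));        // the writer escapes what it must by itself
        System.out.println(writeTitle(preEscaped)); // contains &amp;gt; because the & is escaped again
        System.out.println(readTitle(writeTitle(raw)));        // l33t>
        System.out.println(readTitle(writeTitle(preEscaped))); // l33t&gt;
    }
}
```

Reading the value back is the real check: the raw title round-trips to l33t>, whereas the pre-escaped one comes back as the literal text l33t&gt;, which is not the page's title.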

Output of my PageExport code (the second page's title is the important one):

Output without StringEscapeUtils.escapeXml10(title):

What I want:

EDIT: I fixed the issue by setting DOCTYPE_PUBLIC to "yes". New code:

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

import org.apache.log4j.Logger;

/**
 * @author Paul
 *
 */

public class PageExport {

    Logger log = Logger.getLogger(PageExport.class);

    /**
     * Converts a collection of Pages into an XML string and then into an XML file.
     * 
     * @param   pages The collection of Pages that shall be written into the file.
     * @param   filepath The full path of the XML file.
     * @see     #printPagestoXML(Page[], String, String)
     * @see     Page
     * 
     */

    public void printPagestoXML(Page[] pages, String filepath){
        //Converting a single input filepath into a filepath & filename and
        //then running the method with the arguments
        String newfilepath = "";
        String[] splitpath = filepath.split("/");
        for (int i = 0; i < splitpath.length - 1 ; i++){
            newfilepath += (splitpath[i] + "/");
        }
        printPagestoXML(pages, newfilepath, splitpath[splitpath.length - 1].split("\\.")[0]);
    }

    /**
     * Converts a collection of Pages into an XML string and then into an XML file.
     * 
     * @param   pages The collection of Pages that shall be written into the file.
     * @param   filepath The path of the XML file.
     * @param   filename Name of the .xml file (without .xml)
     * @see     #printPagestoXML(Page[], String)
     * @see     Page
     * 
     */

    public void printPagestoXML(Page[] pages, String filepath, String filename){

        try {
            //The method starts off by creating a new XMLOutputFactory whose writer prints to a StringWriter,
            //so that the XML string can still be transformed before it is written out.
            StringWriter rawXml = new StringWriter();
            XMLOutputFactory xmlOutputFactory = XMLOutputFactory.newFactory();
            XMLStreamWriter xmlStreamWriter = xmlOutputFactory.createXMLStreamWriter(rawXml);

            xmlStreamWriter.writeStartDocument("UTF-8", "1.0"); //start of the XML stream

            xmlStreamWriter.writeStartElement("pages"); //the first element "pages"

            for(int i = 0; i < pages.length; i++){  
                //loop to create elements for all pages in the collection
                log.info("Creating Page " + i + ": " + pages[i].getTitle());
                xmlStreamWriter.writeStartElement("page");
                xmlStreamWriter.writeAttribute("pageID", pages[i].getArticleID() + "");
                xmlStreamWriter.writeAttribute("namespaceID", pages[i].getNamespaceID() + "");
                xmlStreamWriter.writeAttribute("title", pages[i].getTitle());

                if (pages[i].getCategories() != null){  
                    xmlStreamWriter.writeStartElement("categories");

                    Object[] categories = pages[i].getCategories().toArray();
                    for(int j = 0; j < categories.length; j++) {
                        //loop to create all categories for the page currently being written
                        log.trace("Creating Category " + j + ": " + categories[j]);
                        xmlStreamWriter.writeEmptyElement("category");
                        xmlStreamWriter.writeAttribute("name", categories[j].toString());
                    }

                    xmlStreamWriter.writeEndElement(); //end of categories
                }
                else {
                    // in case a page doesn't have categories, the element won't be created and an info message is logged
                    log.info("Page " + (i + 1) + " does not have categories (" + pages[i].toString() + ")");
                }

                xmlStreamWriter.writeEndElement(); //end of page i
            }
            log.info("Last page written.");
            xmlStreamWriter.writeEndElement(); //end of pages
            xmlStreamWriter.writeEndDocument(); // end of document

            xmlStreamWriter.flush();
            xmlStreamWriter.close(); //close the streamwriter

            /*
             * The StringWriter variable rawXml now contains the whole XML string, but it still has to be
             * transformed, otherwise it would all be printed in one line.
             */
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            //Setting the output properties for the transformer
            transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");

            //initiation of the output stream with the filepath; try-with-resources closes the file afterwards
            try (FileOutputStream out = new FileOutputStream(filepath + filename + ".xml")) {
                //transformation / formatting of the xml string and output into the .xml file
                transformer.transform(new StreamSource(new StringReader(rawXml.toString())), new StreamResult(out));
            }

            log.info(filename + ".xml created.");
        } catch (Exception e){
            log.warn("Could not write " + filename + ".xml", e);
        }
    }
}
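Neither listing includes the Page class, so the code cannot be compiled as shown. Below is a minimal hypothetical version, inferred only from how the code uses it (the constructor call in main() and the getters called in printPagestoXML); the field names and types are assumptions and the real class may well differ.

```java
import java.util.Set;

// Hypothetical reconstruction of the question's Page class: constructor order and
// getters are inferred from main() and from the calls made in PageExport.
public class Page {
    private final int namespaceID;
    private final int articleID;
    private final String title;
    private final Set<String> categories; // may be null; PageExport checks for that

    public Page(int namespaceID, int articleID, String title, Set<String> categories) {
        this.namespaceID = namespaceID;
        this.articleID = articleID;
        this.title = title;
        this.categories = categories;
    }

    public int getNamespaceID() { return namespaceID; }

    public int getArticleID() { return articleID; }

    public String getTitle() { return title; }

    public Set<String> getCategories() { return categories; }

    // Used by the edited listing when it logs a page without categories.
    @Override
    public String toString() {
        return "Page[" + namespaceID + ", " + articleID + ", " + title + "]";
    }
}
```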

zyexal · Accepted Answer

Please read about Character Data and Markup (section 2.4 of the XML 1.0 specification):

The ampersand character & and the left angle bracket < may appear in their literal form only when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they must be escaped using either numeric character references or the strings &amp; and &lt; respectively.

The right angle bracket > may be represented using the string &gt;, and must, for compatibility, be escaped using &gt; or a character reference when it appears in the string ]]> in content, when that string is not marking the end of a CDATA section.

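Now it should be clear why it's not working the way you expected: a literal > is allowed almost everywhere (the only exception being the sequence ]]> in content), so a serializer is free to leave it unescaped.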
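The quoted rules can be checked directly against a parser. This is a minimal sketch, assuming the JDK's built-in StAX parser (javax.xml.stream); AttributeRulesDemo, titleOf and acceptsTitle are illustrative names. It shows that title="l33t>" and title="l33t&gt;" carry the same value, so whether a serializer writes > literally or as &gt; is only a stylistic choice, while a literal < inside an attribute value is rejected and has to be written as &lt;.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class AttributeRulesDemo {

    // Returns the title attribute of the first <page> element.
    // Throws IllegalArgumentException if the parser rejects the document.
    public static String titleOf(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT && "page".equals(r.getLocalName())) {
                    return r.getAttributeValue(null, "title");
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("rejected by the parser: " + e.getMessage(), e);
        }
        throw new IllegalArgumentException("no <page> element found");
    }

    // True if the parser accepts the <page> element and its title attribute.
    public static boolean acceptsTitle(String xml) {
        try {
            titleOf(xml);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A literal > inside an attribute value is legal ...
        System.out.println(titleOf("<page title=\"l33t>\"/>"));      // l33t>
        // ... and the escaped form denotes exactly the same value.
        System.out.println(titleOf("<page title=\"l33t&gt;\"/>"));   // l33t>
        // A literal < is not allowed there and must be written as &lt;.
        System.out.println(acceptsTitle("<page title=\"l33t<\"/>")); // false
        System.out.println(titleOf("<page title=\"l33t&lt;\"/>"));   // l33t<
    }
}
```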
