radai

Reputation: 24212

How to prevent the XML Transformer from changing line endings

I have a method that edits an XML file. The general outline of the method is:

public void process(Path anXmlFile) {
    try {
        anXmlFile = anXmlFile.normalize();
        log.debug("processing {}", anXmlFile);
        Document dom = buildDOM(anXmlFile.toFile());

        //do stuff with dom...
        //delete original file
        //and finally ...
        dom.normalize(); //so we get a more predictable order

        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING,"UTF-8");
        transformer.setOutputProperty(OutputKeys.INDENT,"yes");
        Source source = new DOMSource(dom);
        Result result = new StreamResult(anXmlFile.toFile());
        transformer.transform(source, result);
    } catch (Exception e) {
        throw new IllegalStateException(e);
    }
}

My problem is that if I have a multi-line comment in the XML that opens on one line and closes on a following line (note the line-break characters):

<!-- this is a long comment[cr][lf] 
     that spans 2 lines -->

then after I write out the modified DOM, the result is:

<!-- this is a long comment[cr] 
     that spans 2 lines -->

The problem is that [cr][lf] turned into [cr]. This is the only part of the XML affected this way; all other line endings are the same as in the original ([cr][lf]), even in the parts I've modified (my code doesn't change the comment nodes in the DOM).

Is there any configuration option I can give to the Transformer I create to avoid this? This is all done using JDK classes; no XML libraries are involved.

Upvotes: 7

Views: 3169

Answers (1)

forty-two

Reputation: 12817

The XML specification puts a requirement on XML processors (parsers) to replace \r\n or just \r with a single \n. So if you inspect your DOM text nodes, you will see that you only have \n as line endings.
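A minimal sketch that shows this normalization with the JDK's built-in parser: the comment in the input deliberately contains a CR LF pair, and the parsed comment node contains only LF. The class and method names (`EolNormalizationDemo`, `firstCommentData`) are hypothetical, chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class EolNormalizationDemo {
    // Parses the XML and returns the data of the first comment that is a direct child of the root element.
    static String firstCommentData(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.COMMENT_NODE) {
                return ((Comment) n).getData();
            }
        }
        throw new IllegalArgumentException("no comment found under the root element");
    }

    public static void main(String[] args) throws Exception {
        // The comment spans two lines separated by CR LF, as in the question.
        String xml = "<root>\r\n<!-- this is a long comment\r\n     that spans 2 lines -->\r\n</root>";
        String data = firstCommentData(xml);
        // The parser has already replaced CR LF with a single LF.
        System.out.println(data.contains("\r") ? "CR still present" : "only LF left"); // prints: only LF left
    }
}
```

So by the time the Transformer sees the comment, the original CR LF is already gone; whatever line ending appears in the output comes from the serializer.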

When serializing the DOM tree, most implementations write line breaks that occur in character data using the platform's default line separator, or they give you an option to set the end-of-line string explicitly. Comment text, however, is not character data: its characters are written as they are, without any further processing. At least, that is how most serializers behave.

If it is terribly important, you could switch to JDOM and extend AbstractXMLOutputProcessor to change the way comments are written.
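If staying with JDK classes matters more than precise control, one cruder workaround (not from the answer above, and only a sketch) is to serialize to a `String` first and then force every line break in the output to CR LF before writing the file. The class and method names (`CrlfWriter`, `serializeWithCrlf`, `toCrlf`) are hypothetical. The assumption is that you want CR LF everywhere; note that this rewrites every raw line break in the serialized text, including any inside CDATA sections.

```java
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class CrlfWriter {
    // Serializes the DOM with the same settings as in the question, then normalizes line endings.
    static String serializeWithCrlf(Document dom) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(dom), new StreamResult(out));
        return toCrlf(out.toString());
    }

    // Turns CR LF, lone CR and lone LF into CR LF.
    static String toCrlf(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n').replace("\n", "\r\n");
    }

    // Writes the result as UTF-8 bytes, matching the encoding named in the XML declaration.
    static void write(Document dom, Path file) throws Exception {
        Files.write(file, serializeWithCrlf(dom).getBytes(StandardCharsets.UTF_8));
    }
}
```

In the question's `process` method, the last three lines would then become a call like `CrlfWriter.write(dom, anXmlFile)`. The trade-off is that the whole document is held in memory as a `String` before it is written.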

Upvotes: 3
