EndlosSchleife

Reputation: 597

javax.xml.transform.Transformer line endings no longer respect system property "line.separator"

A comment on the question "How to control line endings that javax.xml.transform.Transformer creates?" suggests setting the system property "line.separator". This worked for me (and is acceptable for my task at hand) in Java 8 (Oracle JDK 1.8.0_171), but not in Java 11 (openjdk 11.0.1).

Based on ticket XALANJ-2137, I made a guess (an uneducated one, as I don't even know which javax.xml implementation I am using) and tried setOutputProperty("{http://xml.apache.org/xslt}line-separator", ..) and setOutputProperty("{http://xml.apache.org/xalan}line-separator", ..), but neither works.

How can I control the transformer's line breaks in Java 11?

Here's some demo code which prints "... #13 #10 ..." under Windows with Java 11, where it should print "... #10 ..." only.

package test.xml;

import java.io.StringReader;
import java.io.StringWriter;
import java.util.stream.Collectors;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;


public class TestXmlTransformerLineSeparator {
    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><foo/></root>";
        final String lineSep = "\n";

        String oldLineSep = System.setProperty("line.separator", lineSep);
        try {
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
            transformer.setOutputProperty("{http://xml.apache.org/xalan}line-separator", lineSep);
            transformer.setOutputProperty("{http://xml.apache.org/xslt}line-separator", lineSep);

            StreamSource source = new StreamSource(new StringReader(xml));
            StringWriter writer = new StringWriter();
            StreamResult target = new StreamResult(writer);

            transformer.transform(source, target);

            System.out.println(writer.toString().chars().mapToObj(c -> c <= ' ' ? "#" + c : "" + Character.valueOf((char) c))
                    .collect(Collectors.joining(" ")));
            System.out.println(writer);
        } finally {
            System.setProperty("line.separator", oldLineSep);
        }
    }
}

Upvotes: 3

Views: 3116

Answers (1)

Evan VanderZee

Reputation: 867

As far as I can tell, the only way to control the line separator that the default Java implementation of the Transformer interface uses in Java 11 is to set the line.separator property on the Java command line. For the simple example program here, you could do that by creating a text file named javaArgs containing

-Dline.separator="\n"

and executing the program with the command line

java @javaArgs TestXmlTransformerLineSeparator

The @ syntax, which was introduced in Java 9, is useful here because the @-file is parsed in a way that converts the "\n" into the LF line separator. It is possible to accomplish the same thing without an @-file, but the only ways I know of require more complicated, OS-dependent syntax: you define a variable that contains the line separator you want and have that variable expanded on the java command line.

If the line separator that you want is CRLF, then the javaArgs file would instead read

-Dline.separator="\r\n"

Within a larger program, changing the line.separator property for the entire application may well be unacceptable. To avoid that, it would be possible to launch a separate Java process with the command line just discussed, but the overhead of launching the process and of transferring the data that the Transformer is supposed to write to a stream would probably make that an undesirable solution.

So realistically, a better solution would probably be to implement a FilterWriter that filters the output stream to convert the line separator to the line separator that you want. This solution does not change the line separator used within the transformer itself and might be considered post-processing the result of the transformer, so in a sense it is not an answer to your specific question, but I think it does give the desired result without a lot of overhead. Here is an example that uses a FilterWriter to remove all CR characters (that is, carriage returns) from the output writer.

import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.stream.Collectors;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;


public class TransformWithFilter {

    private static class RemoveCRFilterWriter extends FilterWriter {

        RemoveCRFilterWriter(Writer wrappedWriter) {
            super(wrappedWriter);
        }

        @Override
        public void write(int c) throws IOException {
            if (c != (int)('\r')) {
                super.write(c);
            }
        }

        @Override
        public void write(char[] cbuf, int offset, int length) throws IOException {
            int localOffset = offset;
            for (int i = localOffset; i < offset + length; ++i) {
                if (cbuf[i] == '\r') {
                    if (i > localOffset) {
                        super.write(cbuf, localOffset, i - localOffset);
                    }
                    localOffset = i + 1;
                }
            }
            if (localOffset < offset + length) {
                super.write(cbuf, localOffset, offset + length - localOffset);
            }
        }

        @Override
        public void write(String str, int offset, int length) throws IOException {
            int localOffset = offset;
            for (int i = localOffset; i < offset + length; ++i) {
                if (str.charAt(i) == '\r') {
                    if (i > localOffset) {
                        super.write(str, localOffset, i - localOffset);
                    }
                    localOffset = i + 1;
                }
            }
            if (localOffset < offset + length) {
                super.write(str, localOffset, offset + length - localOffset);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><foo/></root>";
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");

        StreamSource source = new StreamSource(new StringReader(xml));
        StringWriter stringWriter = new StringWriter();
        FilterWriter writer = new RemoveCRFilterWriter(stringWriter);
        StreamResult target = new StreamResult(writer);

        transformer.transform(source, target);

        System.out.println(stringWriter.toString().chars().mapToObj(c -> c <= ' ' ? "#" + c : "" + Character.valueOf((char) c))
                .collect(Collectors.joining(" ")));
        System.out.println(stringWriter);
    }
}
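
If the separator you want is not plain LF (for example, CRLF output on a platform whose default is LF), the same filtering idea can be generalized. The following is only a minimal sketch under stated assumptions: the class and method names (LineSeparatorFilterDemo, LineSeparatorFilterWriter, convert) are made up for illustration, and it assumes the transformer never emits a lone CR as a line break, since every CR is dropped and every LF is replaced by the requested separator. It also forwards characters one at a time, which favors simplicity over throughput.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class LineSeparatorFilterDemo {

    // Hypothetical helper: drops every CR and replaces every LF with the requested separator.
    static class LineSeparatorFilterWriter extends FilterWriter {
        private final String lineSeparator;

        LineSeparatorFilterWriter(Writer wrappedWriter, String lineSeparator) {
            super(wrappedWriter);
            this.lineSeparator = lineSeparator;
        }

        @Override
        public void write(int c) throws IOException {
            if (c == '\r') {
                return;                     // CR is never passed through
            }
            if (c == '\n') {
                out.write(lineSeparator);   // LF becomes the requested separator
            } else {
                out.write(c);
            }
        }

        @Override
        public void write(char[] cbuf, int offset, int length) throws IOException {
            for (int i = offset; i < offset + length; ++i) {
                write(cbuf[i]);             // delegates to write(int)
            }
        }

        @Override
        public void write(String str, int offset, int length) throws IOException {
            for (int i = offset; i < offset + length; ++i) {
                write(str.charAt(i));       // delegates to write(int)
            }
        }
    }

    // Convenience method for trying the filter on a plain string.
    static String convert(String text, String lineSeparator) throws IOException {
        StringWriter sink = new StringWriter();
        try (Writer writer = new LineSeparatorFilterWriter(sink, lineSeparator)) {
            writer.write(text);
        }
        return sink.toString();
    }

    public static void main(String[] args) throws IOException {
        String converted = convert("<root>\r\n  <foo/>\n</root>", "\r\n");
        // Make CR and LF visible so the result can be inspected on the console.
        System.out.println(converted.replace("\r", "#13 ").replace("\n", "#10 "));
    }
}
```

To use it with the Transformer, you would wrap the StringWriter in a LineSeparatorFilterWriter and pass that to the StreamResult, in the same way as RemoveCRFilterWriter above.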

Another practical solution to the problem of serializing XML is to obtain a DOM representation of the XML, either by using the Transformer to get a DOMResult or by parsing directly into a DOM, and then to write out the DOM with an LSSerializer, which provides explicit support for setting the line separator. Since that moves away from using the Transformer and there are other examples of it on Stack Overflow, I will not discuss it further here.
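
For readers who want a starting point anyway, here is a minimal sketch of the LSSerializer route, using only the standard org.w3c.dom.ls API. The class name SerializeWithLSSerializer and the serialize helper are made up for illustration. Whether pretty-printing is available depends on the DOM implementation in use, so the sketch checks canSetParameter first. The exact indentation produced is implementation-specific and is not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class SerializeWithLSSerializer {

    // Hypothetical helper: parses the XML into a DOM and serializes it with the given line separator.
    static String serialize(String xml, String newLine) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(new InputSource(new StringReader(xml)));

        // Look up a DOM implementation that supports the Load and Save feature.
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS loadSave = (DOMImplementationLS) registry.getDOMImplementation("LS");

        LSSerializer serializer = loadSave.createLSSerializer();
        DOMConfiguration config = serializer.getDomConfig();
        if (config.canSetParameter("format-pretty-print", Boolean.TRUE)) {
            config.setParameter("format-pretty-print", Boolean.TRUE);
        }
        serializer.setNewLine(newLine);   // explicit control of the line separator

        StringWriter writer = new StringWriter();
        LSOutput output = loadSave.createLSOutput();
        output.setEncoding("UTF-8");
        output.setCharacterStream(writer);
        serializer.write(document, output);
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><foo/></root>";
        String result = serialize(xml, "\n");
        // Make CR and LF visible so the separators can be inspected on the console.
        System.out.println(result.replace("\r", "#13 ").replace("\n", "#10 "));
    }
}
```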

What might be useful, though, is reviewing what changed in Java 11 and why I think there isn't another way to control the line separator used by Java's default implementation of the Transformer. Java's default implementation of the Transformer interface uses the ToXMLStream class, which inherits from com.sun.org.apache.xml.internal.serializer.ToStream and is implemented in the same package. Reviewing the commit history of OpenJDK, I found that src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/ToStream.java was changed from reading the line.separator property as currently defined in the system properties to instead calling System.lineSeparator(), which returns the line separator as it was at initialization of the Java virtual machine. This commit was first released in Java 11, so in every release up to and including Java 10 the code in the question should behave the same as it did in Java 8.
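
The difference between the two ways of reading the separator can be observed directly. The following small sketch (the class name LineSeparatorSnapshot and the compare helper are made up for illustration) overrides the property at run time and then reads it both ways. The placeholder value "<SEP>" is arbitrary and is only assumed not to be the separator the JVM was started with.

```java
public class LineSeparatorSnapshot {

    // Overrides line.separator temporarily and returns
    // { System.lineSeparator(), System.getProperty("line.separator") }.
    static String[] compare(String override) {
        String original = System.getProperty("line.separator");
        System.setProperty("line.separator", override);
        try {
            return new String[] {
                System.lineSeparator(),               // value captured at JVM initialization
                System.getProperty("line.separator")  // current value of the property
            };
        } finally {
            System.setProperty("line.separator", original);  // restore the previous value
        }
    }

    public static void main(String[] args) {
        String[] result = compare("<SEP>");
        System.out.println("System.lineSeparator() sees override: " + result[0].equals("<SEP>"));
        System.out.println("System.getProperty(..) sees override: " + result[1].equals("<SEP>"));
    }
}
```

On a JVM that was not started with -Dline.separator="<SEP>", the first line should report false and the second true, which is why setting the property inside the program no longer affects the serializer in Java 11.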

If you spend some time reading ToStream.java as it existed after the commit that changed how the line separator is read, especially lines 135 to 140 and 508 to 514, you will notice that the serializer implementation does support using other line separators, and in fact, the output property identified as

{http://xml.apache.org/xalan}line-separator

is supposed to be a way to control which line separator is used.

Why doesn't the example in the question work, then? Answer: In the current Java default implementation of the Transformer interface, only a specific few of the properties that the user sets are transferred to the serializer. These are primarily the properties that are defined in the XSLT specification, but the special indent-amount property is also transferred. The line separator output property, though, is not one of the properties that is transferred to the serializer.

Output properties that are explicitly set on the Transformer itself using setOutputProperty are transferred to the serializer by the setOutputProperties method defined on lines 1029-1128 of com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl. If you instead define an explicit XSLT transform and use its <xsl:output> tag to set the output properties, the properties that are transferred to the serializer are filtered first by the parseContents method defined on lines 139-312 of com.sun.org.apache.xalan.internal.xsltc.compiler.Output and then filtered again by the transferOutputSettings method defined on lines 671-715 of com.sun.org.apache.xalan.internal.xsltc.runtime.AbstractTranslet.

So to summarize, it appears that there is no output property that you can set on the default Java implementation of the Transformer interface to control the line separators that it uses. There may well be other providers of Transformer implementations that do provide control of the line separator, but I have no experience with any implementation of the Transformer interface in Java 11 other than the default implementation that is provided with the OpenJDK release.

Upvotes: 5
