Reputation: 597
A comment on "How to control line endings that javax.xml.transform.Transformer creates?" suggests setting the system property line.separator. This worked for me (and is acceptable for my task at hand) in Java 8 (Oracle JDK 1.8.0_171), but not in Java 11 (openjdk 11.0.1).
Based on ticket XALANJ-2137, I made a guess (an uneducated one, as I don't even know which javax.xml implementation I am using) and tried setOutputProperty("{http://xml.apache.org/xslt}line-separator", ..) and also setOutputProperty("{http://xml.apache.org/xalan}line-separator", ..), but neither works.
How can I control the transformer's line breaks in Java 11?
Here's some demo code which prints "... #13 #10 ..." under Windows with Java 11, where it should print "... #10 ..." only.
package test.xml;

import java.io.StringReader;
import java.io.StringWriter;
import java.util.stream.Collectors;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TestXmlTransformerLineSeparator {

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><foo/></root>";
        final String lineSep = "\n";
        String oldLineSep = System.setProperty("line.separator", lineSep);
        try {
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
            transformer.setOutputProperty("{http://xml.apache.org/xalan}line-separator", lineSep);
            transformer.setOutputProperty("{http://xml.apache.org/xslt}line-separator", lineSep);
            StreamSource source = new StreamSource(new StringReader(xml));
            StringWriter writer = new StringWriter();
            StreamResult target = new StreamResult(writer);
            transformer.transform(source, target);
            System.out.println(writer.toString().chars().mapToObj(c -> c <= ' ' ? "#" + c : "" + Character.valueOf((char) c))
                    .collect(Collectors.joining(" ")));
            System.out.println(writer);
        } finally {
            System.setProperty("line.separator", oldLineSep);
        }
    }
}
Upvotes: 3
Views: 3116
Reputation: 867
As far as I can tell, the only way to control the line separator that the default Java implementation of the Transformer interface uses in Java 11 is to set the line.separator property on the Java command line. For the simple example program here, you could do that by creating a text file named javaArgs containing
-Dline.separator="\n"
and executing the program with the command line
java @javaArgs TestXmlTransformerLineSeparator
The @ syntax that was introduced in Java 9 is useful here because the @-file is parsed in a way that will convert the "\n" into the LF line separator. It's possible to accomplish the same thing without an @-file, but the only ways I know of require more complicated OS-dependent syntax to define a variable that contains the line separator you want and having the java command line expand the variable.
If the line separator that you want is CRLF, then the javaArgs file would instead read
-Dline.separator="\r\n"
Within a larger program, changing the line.separator property for the entire application may well be unacceptable. To avoid that, it would be possible to launch a separate Java process with the command line just discussed, but the overhead of launching the process and communicating with it to transfer the data that the Transformer is supposed to write to a stream would probably make that an undesirable solution.
So realistically, a better solution would probably be to implement a FilterWriter that filters the output stream to convert the line separator to the one that you want. This solution does not change the line separator used within the transformer itself and might be considered post-processing the result of the transformer, so in a sense it is not an answer to your specific question, but I think it does give the desired result without a lot of overhead. Here is an example that uses a FilterWriter to remove all CR characters (that is, carriage returns) from the output writer.
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.stream.Collectors;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TransformWithFilter {

    private static class RemoveCRFilterWriter extends FilterWriter {

        RemoveCRFilterWriter(Writer wrappedWriter) {
            super(wrappedWriter);
        }

        @Override
        public void write(int c) throws IOException {
            if (c != (int)('\r')) {
                super.write(c);
            }
        }

        @Override
        public void write(char[] cbuf, int offset, int length) throws IOException {
            int localOffset = offset;
            for (int i = localOffset; i < offset + length; ++i) {
                if (cbuf[i] == '\r') {
                    if (i > localOffset) {
                        super.write(cbuf, localOffset, i - localOffset);
                    }
                    localOffset = i + 1;
                }
            }
            if (localOffset < offset + length) {
                super.write(cbuf, localOffset, offset + length - localOffset);
            }
        }

        @Override
        public void write(String str, int offset, int length) throws IOException {
            int localOffset = offset;
            for (int i = localOffset; i < offset + length; ++i) {
                if (str.charAt(i) == '\r') {
                    if (i > localOffset) {
                        super.write(str, localOffset, i - localOffset);
                    }
                    localOffset = i + 1;
                }
            }
            if (localOffset < offset + length) {
                super.write(str, localOffset, offset + length - localOffset);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><foo/></root>";
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StreamSource source = new StreamSource(new StringReader(xml));
        StringWriter stringWriter = new StringWriter();
        FilterWriter writer = new RemoveCRFilterWriter(stringWriter);
        StreamResult target = new StreamResult(writer);
        transformer.transform(source, target);
        System.out.println(stringWriter.toString().chars().mapToObj(c -> c <= ' ' ? "#" + c : "" + Character.valueOf((char) c))
                .collect(Collectors.joining(" ")));
        System.out.println(stringWriter);
    }
}
Another practical solution to the problem of serializing XML is to obtain a DOM representation of the XML, either by using the Transformer to get a DOMResult or by directly parsing into a DOM, and then to write out the DOM with an LSSerializer, which provides explicit support for setting the line separator. Since that moves away from using the Transformer and there are other examples of it on Stack Overflow, I will not discuss it further here.
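For reference only, a minimal sketch of the DOM plus LSSerializer route might look like the following. The class name SerializeWithLSSerializer and the helper serialize are made up for illustration, and the sketch assumes that the DOM implementation bundled with the JDK can be cast to DOMImplementationLS (which is the usual case, but not something the DOM API guarantees). Pretty-printing is only requested if the serializer reports that it supports the format-pretty-print parameter.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
class SerializeWithLSSerializer {

    // Parses the XML into a DOM and writes it back out using the given line separator.
    static String serialize(String xml, String newLine) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Assumption: the JDK's bundled DOM implementation also implements DOMImplementationLS.
        DOMImplementationLS ls = (DOMImplementationLS) document.getImplementation();
        LSSerializer serializer = ls.createLSSerializer();

        // The explicit line separator setting that the Transformer does not expose.
        serializer.setNewLine(newLine);

        // Ask for indentation only if this serializer supports it.
        DOMConfiguration config = serializer.getDomConfig();
        if (config.canSetParameter("format-pretty-print", Boolean.TRUE)) {
            config.setParameter("format-pretty-print", Boolean.TRUE);
        }

        StringWriter writer = new StringWriter();
        LSOutput output = ls.createLSOutput();
        output.setEncoding("UTF-8");
        output.setCharacterStream(writer);
        serializer.write(document, output);
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><foo/></root>";
        System.out.println(serialize(xml, "\n"));
    }
}
```

The exact indentation and the placement of the XML declaration depend on the serializer implementation, so treat the output layout as something to check on your own JDK rather than a given.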
What might be useful, though, is reviewing what changed in Java 11 and why I think there isn't another way to control the line separator used by Java's default implementation of the Transformer. Java's default implementation of the Transformer interface uses the ToXMLStream class, which inherits from com.sun.org.apache.xml.internal.serializer.ToStream and is implemented in the same package. Reviewing the commit history of OpenJDK, I found a commit that changed src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/ToStream.java from reading the line.separator property as currently defined in the system properties to instead reading System.lineSeparator(), which corresponds to the line separator at initialization of the Java virtual machine. This commit was first released in Java 11, so in Java versions up to and including Java 10 the code in the question should behave the same as it did in Java 8.
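The difference between the two ways of reading the line separator can be seen in isolation with a small check like the one below. This is only an illustrative sketch: the class name LineSeparatorSnapshot and the method propertyChangeIsIgnored are invented for the example, and it relies on the documented behavior that System.lineSeparator() always returns the initial value of the line.separator property.

```java
// Hypothetical class name, used only for this sketch.
class LineSeparatorSnapshot {

    // Returns true if changing the line.separator property at run time
    // leaves System.lineSeparator() untouched.
    static boolean propertyChangeIsIgnored() {
        String initial = System.lineSeparator();
        String previous = System.getProperty("line.separator");
        // Pick a value that differs from the one the JVM started with.
        String replacement = "\n".equals(initial) ? "\r\n" : "\n";

        System.setProperty("line.separator", replacement);
        try {
            boolean propertyChanged = replacement.equals(System.getProperty("line.separator"));
            boolean snapshotUnchanged = initial.equals(System.lineSeparator());
            return propertyChanged && snapshotUnchanged;
        } finally {
            // Restore the property so the rest of the program is unaffected.
            System.setProperty("line.separator", previous);
        }
    }

    public static void main(String[] args) {
        System.out.println("Run-time change ignored by System.lineSeparator(): "
                + propertyChangeIsIgnored());
    }
}
```

Code that reads System.getProperty("line.separator") each time sees the new value, while code that reads System.lineSeparator() keeps the startup value, which is consistent with why only the command-line -Dline.separator setting still has an effect.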
If you spend some time reading ToStream.java as it existed after the commit that changed how the line separator is read, especially focusing on lines 135 to 140 and 508 to 514, you will notice that the serializer implementation does support using other line separators. In fact, the output property identified as {http://xml.apache.org/xalan}line-separator is supposed to be a way to control which line separator is used.
Why doesn't the example in the question work, then? Answer: In the current Java default implementation of the Transformer interface, only a specific few of the properties that the user sets are transferred to the serializer. These are primarily the properties that are defined in the XSLT specification, but the special indent-amount property is also transferred. The line separator output property, though, is not one of the properties that is transferred to the serializer.
Output properties that are explicitly set on the Transformer itself using setOutputProperty are transferred to the serializer by the setOutputProperties method defined on lines 1029-1128 of com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl. If you instead define an explicit XSLT transform and use its <xsl:output> tag to set the output properties, the properties that are transferred to the serializer are filtered first by the parseContents method defined on lines 139-312 of com.sun.org.apache.xalan.internal.xsltc.compiler.Output and then filtered again in the transferOutputSettings method defined on lines 671-715 of com.sun.org.apache.xalan.internal.xsltc.runtime.AbstractTranslet.
So to summarize, it appears that there is no output property that you can set on the default Java implementation of the Transformer interface to control the line separators that it uses. There may well be other providers of Transformer implementations that do provide control of the line separator, but I have no experience with any implementation of the Transformer interface in Java 11 other than the default implementation that is provided with the OpenJDK release.
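Since the question mentions not knowing which javax.xml implementation is in use, a quick way to find out is to print the class names of the factory and transformer that are actually loaded. The sketch below is an illustration only (ShowTransformerImplementation and implementationNames are made-up names). With the default OpenJDK provider I would expect both names to fall under the com.sun.org.apache.xalan.internal.xsltc.trax package discussed above, but that is an expectation, not something the code verifies.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;

// Hypothetical class name, used only for this sketch.
class ShowTransformerImplementation {

    // Returns the concrete class names of the factory and of the transformer it creates.
    static String[] implementationNames() throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer();
        return new String[] {
            factory.getClass().getName(),
            transformer.getClass().getName()
        };
    }

    public static void main(String[] args) throws Exception {
        String[] names = implementationNames();
        System.out.println("TransformerFactory: " + names[0]);
        System.out.println("Transformer:        " + names[1]);
    }
}
```

If a different provider (for example, one picked up from the classpath) shows up here, its handling of line separators and vendor-specific output properties may differ from what is described in this answer.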
Upvotes: 5