Eran Medan

Reputation: 45775

How to Preserve the Input's Declared Encoding in the Output of javax.xml.transform.Transformer.transform? (e.g. avoid UTF-16 changing to UTF-8)

Given this input XML:

<?xml version="1.0" encoding="UTF-16"?>
<test></test>

Running this code:

StreamSource source = new StreamSource(new StringReader(/* the above XML*/));
StringWriter stringWriter = new StringWriter();
StreamResult streamResult = new StreamResult(stringWriter);
TransformerFactory.newInstance().newTransformer().transform(source, streamResult);
return stringWriter.getBuffer().toString();

gives me this XML as output:

<?xml version="1.0" encoding="UTF-8"?>
<test></test>

(the declared encoding of UTF-16 is converted to the default UTF-8)

I know I can explicitly ask for UTF-16 output:

transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");

But the question is: how can I make the output encoding automatically match the input's declared encoding?

Upvotes: 6

Views: 9651

Answers (4)

tecfield

Reputation: 203

Try this:

// Needs javax.xml.stream.*, javax.xml.transform.*, javax.xml.transform.stax.StAXSource,
// javax.xml.transform.stream.StreamResult, java.io.StringReader and java.io.StringWriter.
// Create an XML stream reader over the input
XMLStreamReader xmlSR = XMLInputFactory.newInstance()
        .createXMLStreamReader(new StringReader(/* the above XML*/));
// Wrap the XML stream reader in a StAXSource
StAXSource source = new StAXSource(xmlSR);
// Create a StringWriter to collect the output
StringWriter stringWriter = new StringWriter();
// Create a StreamResult that writes to it
StreamResult streamResult = new StreamResult(stringWriter);
// Create an identity transformer
Transformer transformer = TransformerFactory.newInstance().newTransformer();
// Copy STANDALONE only if the source declaration set it
if (xmlSR.standaloneSet()) {
    transformer.setOutputProperty(OutputKeys.STANDALONE,
            xmlSR.isStandalone() ? "yes" : "no");
}
// Copy ENCODING from the source declaration (null if none was declared)
String encoding = xmlSR.getCharacterEncodingScheme();
if (encoding != null) {
    transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
}
// Copy VERSION from the source declaration (null if none was declared)
String version = xmlSR.getVersion();
if (version != null) {
    transformer.setOutputProperty(OutputKeys.VERSION, version);
}
// Transform the source stream to the output
transformer.transform(source, streamResult);
// Return the result
return stringWriter.getBuffer().toString();

Upvotes: 3

Michael Kay

Reputation: 163645

The XSLT processor doesn't actually know what the input encoding is (the XML parser doesn't tell it, because it doesn't need to know). You can set the output encoding using xsl:output, but to make this the same as the input encoding you're going to have to discover the input encoding first, for example by peeking at the source file before parsing it.
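For input that is already a String, the "peeking" step could look something like the sketch below. The class and method names (XmlDeclarationPeek, declaredEncoding) are made up for illustration, and the regular expression is a simplification that only looks at a leading XML declaration; it is not a full parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads the encoding pseudo-attribute from a leading XML declaration.
public class XmlDeclarationPeek {
    // The declaration must be the very first thing in the document.
    // [^>]*? keeps the match inside the declaration itself.
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml\\s[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Returns the declared encoding, or the given fallback if none is declared. */
    public static String declaredEncoding(String xml, String fallback) {
        Matcher m = ENCODING.matcher(xml);
        return m.find() ? m.group(1) : fallback;
    }
}
```

The returned name can then be passed to transformer.setOutputProperty(OutputKeys.ENCODING, ...) before calling transform.";
if (!"UTF-16".equals(XmlDeclarationPeek.declaredEncoding(withDecl, "UTF-8"))) throw new AssertionError("declared encoding not found");
String singleQuoted = "<?xml version='1.0' encoding='ISO-8859-1'?><test/>";
if (!"ISO-8859-1".equals(XmlDeclarationPeek.declaredEncoding(singleQuoted, "UTF-8"))) throw new AssertionError("single-quoted encoding not found");
String noEncoding = "<?xml version=\"1.0\"?><a encoding=\"x\"/>";
if (!"UTF-8".equals(XmlDeclarationPeek.declaredEncoding(noEncoding, "UTF-8"))) throw new AssertionError("fallback expected");
if (!"UTF-8".equals(XmlDeclarationPeek.declaredEncoding("<test/>", "UTF-8"))) throw new AssertionError("fallback expected without declaration");
</test>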

Upvotes: 1

Jochen Bedersdorfer

Reputation: 4122

You need to peek into the stream first. Appendix F of the XML specification ("Autodetection of Character Encodings") gives you an idea of how to auto-detect the encoding.
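As a rough illustration of the byte-level part of that auto-detection, here is a minimal sketch that inspects the first bytes of a document. EncodingSniffer and guess are hypothetical names, and only the UTF-8, UTF-16 and UTF-32 cases are covered (no EBCDIC or unusual byte orders). A null result means the bytes look ASCII-compatible or unknown, so the encoding declaration itself still has to be read.

```java
// Hypothetical helper: guesses an encoding family from the first bytes of an XML document.
public class EncodingSniffer {
    /** Returns an encoding name, or null if the XML declaration must be read to decide. */
    public static String guess(byte[] head) {
        // Byte order marks; UTF-32 is checked first because FF FE 00 00 also starts with FF FE.
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        // No byte order mark: look at how "<?" is laid out in memory.
        if (startsWith(head, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE";
        if (startsWith(head, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE";
        // ASCII-compatible (UTF-8, ISO 8859-x, ...) or unknown: caller reads the declaration.
        return null;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

In practice you would read the first four bytes through a BufferedInputStream with mark and reset, so the parser still sees the whole stream afterwards.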

Upvotes: 1

Michael Borgwardt

Reputation: 346536

To do this, you'll have to use something more sophisticated than a StreamSource. For example, a StAXSource takes an XMLStreamReader, which has a getCharacterEncodingScheme() method that tells you which encoding the input document declared. You can then set that as the output encoding.
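A minimal sketch of that idea for raw bytes rather than a String, so that the output is actually encoded in the chosen charset and not just labelled with it. EncodingPreservingCopy is a made-up name, and falling back to UTF-8 when no encoding is declared is my assumption, not something the API requires.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

// Hypothetical helper: identity transform that reuses the input's declared encoding.
public class EncodingPreservingCopy {
    public static byte[] copy(byte[] xmlBytes)
            throws XMLStreamException, TransformerException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(xmlBytes));
        // Null when the document has no encoding declaration.
        String declared = reader.getCharacterEncodingScheme();
        // Assumption: default to UTF-8 in that case.
        String encoding = (declared != null) ? declared : "UTF-8";

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);

        // Writing to an OutputStream lets the serializer encode the bytes itself.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new StAXSource(reader), new StreamResult(out));
        return out.toByteArray();
    }
}
```

With a StringWriter result the output is a sequence of chars, so the encoding property mainly changes what the XML declaration says; the bytes are only determined once you write that String out yourself.";
    byte[] out = EncodingPreservingCopy.copy(xml.getBytes(java.nio.charset.StandardCharsets.UTF_16));
    String decoded = new String(out, java.nio.charset.StandardCharsets.UTF_16);
    if (!decoded.contains("encoding=\"UTF-16\"")) throw new AssertionError("declaration lost: " + decoded);
    if (!decoded.contains("<test")) throw new AssertionError("root element lost: " + decoded);
} catch (Exception e) {
    throw new AssertionError(e);
}
</test>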

Upvotes: 5
