Reputation: 4591
I'm trying, on Windows 7, to capture the console output of a jar (written with System.out) and write it out as an XML file. This works, but I'm having encoding problems (e.g. with an "ë").
I have this code for reading the console output:
// Run the jar in a child JVM and collect its stdout line by line
final LinkedList<String> texOutput = new LinkedList<String>();
final Process p = Runtime.getRuntime().exec("java -jar " + absoluteNameOfJar, null, tmpDir);
String line;
// Cp1252 is hard-coded: decoding is only correct if the child really writes Cp1252
final BufferedReader output = new BufferedReader(new InputStreamReader(p.getInputStream(), "Cp1252"));
while ((line = output.readLine()) != null) {
    texOutput.add(line);
}
output.close();
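Whatever charset the child JVM ends up writing, the parent has to decode with that same charset, and a mismatch does not throw: it silently produces wrong characters. A minimal sketch (the class and method names are made up for illustration) that makes the charset an explicit parameter and uses an in-memory stream as a stand-in for p.getInputStream():

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ProcessOutputReader {
    /** Reads every line from the stream, decoding bytes with the given charset, then closes it. */
    static List<String> readLines(InputStream in, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for p.getInputStream(): a child that wrote "ë" as UTF-8 (bytes C3 AB)
        byte[] childOutput = "\u00EB\n".getBytes(StandardCharsets.UTF_8);

        // Matching charset: the list holds "ë"
        List<String> ok = readLines(new ByteArrayInputStream(childOutput), StandardCharsets.UTF_8);
        // Mismatched charset: no exception, the list holds "Ã«" instead of "ë"
        List<String> garbled = readLines(new ByteArrayInputStream(childOutput), Charset.forName("windows-1252"));

        System.out.println(ok.get(0).equals("\u00EB"));
        System.out.println(garbled.get(0).equals("\u00C3\u00AB"));
    }
}
```

With the real process you would pass p.getInputStream() together with whatever charset the child is known to write; the hard part, as the question says, is knowing that charset.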
And here's the code for writing the LinkedList to XML (using JDOM):
if (!texOutput.isEmpty()) {
    final Element xmlTeXOutput = new Element(XML_ELEMENT_KEY_TEX_OUTPUT);
    for (String line : texOutput) {
        // one element per captured output line
        final Element xmlLine = new Element(XML_ELEMENT_KEY_LINE);
        xmlLine.setText(line);
        xmlTeXOutput.addContent(xmlLine);
    }
    genOut.addContent(xmlTeXOutput);
}
With this I get encoding errors in the XML (from the wrongly converted "ë"): "Invalid byte 2 of 3-byte UTF-8 sequence".
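The error message itself points at the cause. In windows-1252 (Cp1252), "ë" is the single byte 0xEB, while in UTF-8 it is the two bytes C3 AB. A UTF-8 parser treats 0xEB (bit pattern 1110 1011) as the first byte of a three-byte sequence, and rejects the following byte when it is not a continuation byte (10xxxxxx). That suggests the file contains single-byte-encoded text under a UTF-8 declaration. A small sketch (class name made up for illustration) to check the bytes:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytes {
    /** Formats bytes as space-separated upper-case hex, e.g. "C3 AB". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** True if b is the lead byte of a 3-byte UTF-8 sequence (bit pattern 1110xxxx). */
    static boolean isThreeByteLead(int b) {
        return (b & 0xF0) == 0xE0;
    }

    public static void main(String[] args) {
        String e = "\u00EB"; // ë
        byte[] cp1252 = e.getBytes(Charset.forName("windows-1252"));
        byte[] utf8 = e.getBytes(StandardCharsets.UTF_8);
        System.out.println("windows-1252: " + hex(cp1252)); // EB
        System.out.println("UTF-8:        " + hex(utf8));   // C3 AB
        // A UTF-8 decoder reading 0xEB expects two more bytes of the form 10xxxxxx
        System.out.println("0xEB starts a 3-byte UTF-8 sequence: " + isThreeByteLead(cp1252[0] & 0xFF)); // true
    }
}
```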
I found these questions: "How to get console charset?" and "Java: How to determine the correct charset encoding of a stream". Neither gives me much hope: it seems I have to set the correct encoding for the InputStreamReader, but there seems to be no portable way to find the encoding actually used. Is there a way to fix this?
Oh, and if possible, the solution should be portable and work on macOS too. And I don't want to set the encoding of the XML to ISO-8859-1 (which seems to be the common workaround, according to Google): UTF-8 should work.
EDIT: I write the XML file thusly:
final XMLOutputter xmlOutputter = new XMLOutputter(Format.getPrettyFormat());
final String targetXMLFileName = FilenameUtils.concat(targetDirName, xmlID.getText() + "-out.xml");
final File targetXMLFile = new File(targetXMLFileName);
final FileWriter targetXMLFileWriter = new FileWriter(targetXMLFile);
xmlOutputter.output(xmlOutput, targetXMLFileWriter);
targetXMLFileWriter.close();
Upvotes: 1
Views: 950
Reputation: 108959
There are a number of potential problems here:
Verify that the data is being read correctly from the other process. If the default encoding is causing an issue, you may want to write a wrapper app with a main method that sets stdout to a Unicode-encoding stream and then invokes the other app's main method. Then decode within the above code using the same encoding.
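A minimal sketch of such a wrapper, assuming you know the name of the jar's main class (for example from the Main-Class entry in its manifest) and that the target code writes through System.out rather than opening its own stream. Utf8Launcher and com.example.TargetMain are hypothetical names, and the Charset-taking PrintStream constructor needs Java 10 or later:

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical launcher, run with both jars on the classpath, e.g.
 *   java -cp <classpath> Utf8Launcher com.example.TargetMain arg1 arg2
 */
public class Utf8Launcher {
    /** A PrintStream that encodes everything written to it as UTF-8 (Java 10+ constructor). */
    static PrintStream utf8PrintStream(OutputStream out) {
        return new PrintStream(out, true, StandardCharsets.UTF_8);
    }

    /** Calls public static void main(String[]) on the named class via reflection. */
    static void invokeMain(String className, String[] args) throws Exception {
        Method main = Class.forName(className).getMethod("main", String[].class);
        main.invoke(null, (Object) args);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: Utf8Launcher <main-class> [args...]");
            System.exit(2);
        }
        // Swap stdout before the target runs, so its System.out.println calls emit UTF-8 bytes
        System.setOut(utf8PrintStream(new FileOutputStream(FileDescriptor.out)));
        invokeMain(args[0], Arrays.copyOfRange(args, 1, args.length));
    }
}
```

The parent process then starts this launcher instead of "java -jar ..." and reads p.getInputStream() with StandardCharsets.UTF_8, which behaves the same on Windows and macOS.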
There is also a hack involving the file.encoding system property, but it may cause unintended side effects.
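For illustration, the hack applied to the child JVM only. As far as I know, on JDK 17 and earlier System.out follows the default charset (file.encoding) when its output is redirected to a pipe, as it is here, while JDK 19 and later take it from the stdout.encoding property instead, so this sketch passes both. Treat it as an assumption to verify on your JDK. Building the command as a list also avoids Runtime.exec(String) splitting a jar path that contains spaces. ChildJvmCommand is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChildJvmCommand {
    /**
     * Command line for running the jar in a child JVM whose stdout should be UTF-8.
     * file.encoding: default charset (used by System.out when redirected on older JDKs).
     * stdout.encoding: charset of System.out on JDK 19+ (older JDKs just ignore it).
     */
    static List<String> command(String javaExe, String jarPath) {
        return new ArrayList<>(Arrays.asList(
                javaExe,
                "-Dfile.encoding=UTF-8",
                "-Dstdout.encoding=UTF-8",
                "-jar",
                jarPath)); // one list element, so spaces in the path are preserved
    }

    public static void main(String[] args) {
        System.out.println(command("java", "C:\\my tools\\tool.jar"));
    }
}
```

The parent could then start it with new ProcessBuilder(command("java", absoluteNameOfJar)).directory(tmpDir).start() and decode p.getInputStream() as UTF-8.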
If the problem is with serializing the XML, it is likely that the data is being written with the wrong encoding even though the declaration says UTF-8. This commonly happens when serializing to a Writer, because the serializer does not control the output encoding as it would with an OutputStream.
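To see how a Writer can produce bytes that contradict the declaration: the serializer writes encoding="UTF-8" into the text, but only the Writer decides how characters become bytes. A self-contained sketch using just the JDK (no JDOM; the XML string stands in for the serializer's output, and the class name is made up) writes the same text through two writers and parses the result:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class DeclarationMismatch {
    // The declaration always claims UTF-8, no matter which Writer is used below
    static final String XML = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<line>\u00EB</line>\n";

    /** Writes the XML text through a Writer that encodes with the given charset. */
    static byte[] serialize(Charset writerCharset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, writerCharset)) {
            w.write(XML);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** True if the JDK's DOM parser accepts the bytes (it may also log a fatal error to stderr). */
    static boolean parses(byte[] xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 writer parses:        " + parses(serialize(StandardCharsets.UTF_8)));
        System.out.println("windows-1252 writer parses: " + parses(serialize(Charset.forName("windows-1252"))));
    }
}
```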
EDIT
The problem is here:
new FileWriter(targetXMLFile);
From the FileWriter documentation:
Convenience class for writing character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are acceptable.
If you always want UTF-8, construct a stream that writes UTF-8.
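One way to do that with only the JDK is an OutputStreamWriter with an explicit charset in place of the FileWriter; on Java 11+ new FileWriter(file, StandardCharsets.UTF_8) also exists. With JDOM you can alternatively pass an OutputStream such as a FileOutputStream to XMLOutputter.output, in which case JDOM does the encoding itself according to the Format's encoding (UTF-8 unless changed). A sketch of the Writer route (Utf8FileOutput and its helpers are hypothetical names; the JDOM call is shown only as a comment):

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8FileOutput {
    /** Wraps a byte stream in a Writer that always encodes UTF-8, whatever the platform default is. */
    static Writer utf8Writer(OutputStream out) {
        return new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
    }

    /** Returns the bytes utf8Writer produces for the given text (handy for checking). */
    static byte[] encodeAsWritten(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = utf8Writer(bytes)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        File targetXMLFile = File.createTempFile("tex", "-out.xml");
        // Replaces: new FileWriter(targetXMLFile)
        try (Writer targetXMLFileWriter = utf8Writer(new FileOutputStream(targetXMLFile))) {
            // With JDOM this line would be: xmlOutputter.output(xmlOutput, targetXMLFileWriter);
            targetXMLFileWriter.write("<line>\u00EB</line>");
        }
        System.out.println(targetXMLFile.length()); // 15: 13 ASCII bytes + 2 bytes for "ë"
        targetXMLFile.delete();
    }
}
```

try-with-resources also closes the file if serialization throws, which the original close() call does not.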
Upvotes: 1