Reputation: 63
I wrote an XML parser, and everything works fine except the text encoding. I did some research to fix this, but I'm still stuck.
I have a list of strings containing movie titles, and I add each one to the XML wrapped in a CDATA section, for example:
CDATA movieTitle= new CDATA(aMovie.getTitle());
movie.addContent(new Element("title").addContent(movieTitle));
I save it like this:
XMLOutputter xmlOutput = new XMLOutputter();
Format format = Format.getPrettyFormat();
format.setEncoding("UTF-8");
xmlOutput.setFormat(format);
xmlOutput.output(doc, new FileWriter(fileName+ ".xml"));
But the result is:
<title><![CDATA[LA LOI DU MARCHxC9]]></title>
It should be "LA LOI DU MARCHÉ".
What should I do to prevent this?
Upvotes: 1
Views: 1067
Reputation: 17707
This is a common problem with JDOM, and it stems from how Java handles OutputStreams and Writers. In essence, a Writer does not expose the encoding it uses, so JDOM cannot tell which encoding the file is actually written in. In your case the FileWriter is probably running with a non-UTF-8 platform default (an ASCII-based encoding, for example), so the Unicode É is not written the way the UTF-8 declaration promises.
See the notes in the XMLOutputter documentation.
The solution is to use a FileOutputStream instead of a FileWriter. Since UTF-8 is JDOM's default output encoding, you don't need to set it. Try it:
XMLOutputter xmlOutput = new XMLOutputter();
xmlOutput.setFormat(Format.getPrettyFormat());
try (OutputStream out = new FileOutputStream(fileName+ ".xml")) {
    xmlOutput.output(doc, out);
}
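To see why the Writer matters, here is a minimal sketch using only the JDK (no JDOM involved). The class name `EncodingMismatchDemo` and the helper `hex` are made up for illustration. It assumes the FileWriter was using a single-byte default such as ISO-8859-1, which would be consistent with the `xC9` in the question but is not confirmed by it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows the bytes produced for "É" in two charsets.
class EncodingMismatchDemo {

    /** Returns the bytes of text in the given charset as space-separated hex. */
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String eAcute = "\u00C9"; // É, written as an escape so the source file encoding does not matter

        // Assumption: a single-byte platform default such as ISO-8859-1.
        System.out.println("ISO-8859-1: " + hex(eAcute, StandardCharsets.ISO_8859_1)); // C9
        // What a file declared as UTF-8 is expected to contain.
        System.out.println("UTF-8:      " + hex(eAcute, StandardCharsets.UTF_8));      // C3 89

        // Decoding the single-byte form as UTF-8 does not give back the original character.
        String misread = new String(eAcute.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.UTF_8);
        System.out.println("round trip ok: " + misread.equals(eAcute)); // false
    }
}
```

When the XML declaration says UTF-8 but the Writer emits the single byte, any reader that trusts the declaration hits exactly this mismatch. Handing JDOM an OutputStream lets it do the encoding itself, so the declaration and the bytes agree.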
Upvotes: 1
Reputation: 109547
Since the XML already knows its encoding and states it in the <?xml ... encoding=... ?> declaration, I prefer @rolfl's solution, a binary OutputStream.
The error here is that FileWriter is a very old utility class that uses the platform default encoding, which is absolutely non-portable. If you do want a Writer, give it an explicit charset:
try (Writer out = Files.newBufferedWriter(
        Paths.get(fileName + ".xml"), StandardCharsets.UTF_8)) {
    xmlOutput.output(doc, out); // the writer is closed even if output fails
}
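A quick way to check the explicit-charset Writer is to write a title to a temporary file and look at the bytes that land on disk. This is only a sketch with the JDK alone. `ExplicitCharsetWriterDemo` and `writeUtf8` are hypothetical names, and JDOM is left out so the example stays self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class: verifies what a UTF-8 BufferedWriter puts on disk.
class ExplicitCharsetWriterDemo {

    /** Writes text to a temp file through a UTF-8 Writer and returns the bytes written. */
    static byte[] writeUtf8(String text) {
        try {
            Path tmp = Files.createTempFile("title", ".xml");
            try {
                try (Writer out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                    out.write(text);
                }
                return Files.readAllBytes(tmp);
            } finally {
                Files.deleteIfExists(tmp); // clean up the temp file in every case
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = writeUtf8("MARCH\u00C9"); // "MARCHÉ"
        // Five ASCII letters take one byte each, É takes two bytes in UTF-8.
        System.out.println(bytes.length); // 7
        System.out.printf("%02X %02X%n", bytes[5] & 0xFF, bytes[6] & 0xFF); // C3 89
    }
}
```

The result no longer depends on the platform default, which is the portability point made above.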
Upvotes: 2