Ivan Popovych

Reputation: 1

Saving Chinese characters using Java HtmlEditorKit

I'm trying to save an HTMLDocument (written out with UTF-8 encoding) that contains the Chinese character 𠜎, using HTMLEditorKit in the following way:

try (OutputStreamWriter f = new OutputStreamWriter(fileOutputStream, "UTF-8")) {
    htmlEditorKit.write(f, htmlDocument, 0, htmlDocument.getLength());
} catch (BadLocationException e) {
    logger.error("Could not save", e);
}

In the output HTML I get two numeric character references (&#55361;&#57102;), one for each half of the UTF-16 surrogate pair, instead of a single 4-byte character. Java can tell which symbol it is by combining the two, but HTML can't.
Any suggestions on how to save it so that the HTML page displays correctly?

Here is the output HTML:

<html>
<head>
<meta content="text/html" charset="utf-8">
</head>
<body>
<p>&#55361;&#57102;</p>
</body>
</html>
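
For reference, 55361 (0xD841) and 57102 (0xDF0E) are the high and low UTF-16 surrogates of U+2070E, which is 132878 in decimal. One possible workaround, sketched here under assumptions rather than as a confirmed HTMLEditorKit fix, is to post-process the generated markup and merge each adjacent surrogate pair of decimal references into a single code point reference. The class name `SurrogateRefFixer` and the method `mergeSurrogateRefs` are hypothetical names introduced for illustration, and the sketch assumes the writer only emits decimal references of the form `&#NNNNN;`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Swing): merges surrogate-pair references.
final class SurrogateRefFixer {
    // A single decimal character reference such as &#55361;
    private static final Pattern REF = Pattern.compile("&#(\\d{1,7});");

    private static boolean isHigh(int v) {
        return v >= Character.MIN_HIGH_SURROGATE && v <= Character.MAX_HIGH_SURROGATE;
    }

    private static boolean isLow(int v) {
        return v >= Character.MIN_LOW_SURROGATE && v <= Character.MAX_LOW_SURROGATE;
    }

    /** Replaces "&#high;&#low;" with one reference to the combined code point. */
    static String mergeSurrogateRefs(String html) {
        Matcher m = REF.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0; // end of the text already copied to 'out'
        while (m.find()) {
            int hi = Integer.parseInt(m.group(1));
            if (!isHigh(hi)) {
                continue; // ordinary reference, left untouched
            }
            // The low surrogate must follow immediately after the high one.
            Matcher next = REF.matcher(html).region(m.end(), html.length());
            if (!next.lookingAt()) {
                continue;
            }
            int lo = Integer.parseInt(next.group(1));
            if (!isLow(lo)) {
                continue;
            }
            int codePoint = Character.toCodePoint((char) hi, (char) lo);
            out.append(html, last, m.start())
               .append("&#").append(codePoint).append(';');
            last = next.end();
        }
        out.append(html, last, html.length());
        return out.toString();
    }
}
```

To use it with the code above, the document could be written to a `java.io.StringWriter` first, and `SurrogateRefFixer.mergeSurrogateRefs(stringWriter.toString())` then written to the UTF-8 `OutputStreamWriter`. With the sample output, `<p>&#55361;&#57102;</p>` becomes `<p>&#132878;</p>`.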

Upvotes: 0

Views: 84

Answers (0)
