How to translate Unicode characters to ISO-8859-15/Latin9 equivalents?

Question

I have a UTF-8 JSON document that contains escaped Unicode characters. For example:

{
    "description": "This is an ellipsis: \u2026"
}

The JSON is parsed with Jackson. At a later stage, the strings are converted into bytes for an ISO-8859-15/Latin9 platform:

final byte[] d = description.getBytes(Charset.forName("ISO-8859-15"));

Obviously, the ellipsis character (…) is not in the ISO-8859-15/Latin9 character set (see https://www.charset.org/charsets/iso-8859-15).
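To make the problem concrete, here is a minimal sketch of what the JDK does with such a character. The class and method names (Latin9Check, isEncodable, encode) are made up for illustration. String.getBytes(Charset) does not fail on an unmappable character; it silently substitutes the charset's replacement byte, which for ISO-8859-15 is a question mark.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
class Latin9Check {
    static final Charset LATIN9 = Charset.forName("ISO-8859-15");

    // True if every character of s can be represented in ISO-8859-15.
    static boolean isEncodable(String s) {
        return LATIN9.newEncoder().canEncode(s);
    }

    // Same call as in the question; unmappable characters are replaced.
    static byte[] encode(String s) {
        return s.getBytes(LATIN9);
    }

    public static void main(String[] args) {
        String ellipsis = "\u2026"; // U+2026 HORIZONTAL ELLIPSIS
        System.out.println(isEncodable(ellipsis));      // false
        System.out.println((char) encode(ellipsis)[0]); // ?
    }
}
```

So the information is lost without any exception or warning, which is why I need a substitution step before the encoding.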

I am looking for a way to convert unsupported Unicode characters to a sensible character, or sequence of characters, that ISO-8859-15/Latin9 does support. For the ellipsis, I would expect three dots (...).

Other characters that appear in the input, with the replacement I would expect for each:

\u2013 -> – -> -
\u2018 -> ‘ -> '
\u2019 -> ’ -> '
\u201c -> “ -> "
\u201d -> ” -> "
\u2022 -> • -> .

Ideally, this is done without me having to enumerate every possible input and its replacement, as I don't want to maintain a rather extensive mapping table myself.

Is there a JDK class or external library out there which can do the conversion?
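For reference, this is a rough sketch of the kind of conversion I have in mind, using only the JDK. It is an assumption-laden illustration, not a complete solution. The names Latin9Fallback, FALLBACKS and toLatin9Text are hypothetical. Characters that already fit in ISO-8859-15 are kept. For the rest, java.text.Normalizer with form NFKC handles those that have a Unicode compatibility decomposition (the ellipsis becomes three full stops). The quotes, the dash and the bullet have no such decomposition, so the sketch still needs a small hand-written table for them, which is exactly the part I would like a library to take over.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.text.Normalizer;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class Latin9Fallback {
    private static final Charset LATIN9 = Charset.forName("ISO-8859-15");

    // Hand-maintained replacements for code points that NFKC leaves unchanged.
    private static final Map<Integer, String> FALLBACKS = Map.of(
            0x2013, "-",   // en dash
            0x2018, "'",   // left single quotation mark
            0x2019, "'",   // right single quotation mark
            0x201C, "\"",  // left double quotation mark
            0x201D, "\"",  // right double quotation mark
            0x2022, ".");  // bullet

    static String toLatin9Text(String input) {
        CharsetEncoder encoder = LATIN9.newEncoder();
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                out.append(ch); // already representable, keep as-is
                return;
            }
            // Prefer the explicit table, otherwise try compatibility normalization.
            String replacement = FALLBACKS.get(cp);
            if (replacement == null) {
                replacement = Normalizer.normalize(ch, Normalizer.Form.NFKC);
            }
            // Anything still not representable becomes a question mark.
            replacement.codePoints().forEach(r -> {
                String rs = new String(Character.toChars(r));
                out.append(encoder.canEncode(rs) ? rs : "?");
            });
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String description = "This is an ellipsis: \u2026";
        byte[] d = toLatin9Text(description).getBytes(LATIN9);
        System.out.println(new String(d, LATIN9));
    }
}
```

The result of toLatin9Text can be passed to getBytes exactly as before. A transliteration library such as ICU4J may well cover far more characters than this table does, but I have not verified which of its transforms would produce the replacements listed above.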
