overexchange

Reputation: 1

Handling non-English characters using Eclipse

Below is the text that I would like to paste into a bundle.properties file using Eclipse.

Честит рожден ден

Instead, Eclipse displays these characters in Unicode escape notation, as shown below: \u0427\u0435\u0441\u0442\u0438\u0442 \u0440\u043E\u0436\u0434\u0435\u043D \u0434\u0435\u043D

How do I resolve this problem?

Upvotes: 3

Views: 592

Answers (1)

Holger

Reputation: 298103

This is intended behavior. PropertyResourceBundle relies on the Properties class, whose load(InputStream) method always assumes the file is encoded in ISO-8859-1 (ISO Latin-1)¹:

The input stream is in a simple line-oriented format as specified in load(Reader) and is assumed to use the ISO 8859-1 character encoding; that is each byte is one Latin1 character. Characters not in Latin1, and certain special characters, are represented in keys and elements using Unicode escapes as defined in section 3.3 of The Java™ Language Specification.

So converting your pasted characters to Unicode escape sequences is the right thing to do, as it ensures that they will be loaded properly. At runtime, the ResourceBundle will contain the correct characters.
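To illustrate that the escaped form loads back to the original text, here is a minimal sketch. The class name and the key "greeting" are made up for this example, and the file content is held in an in-memory string rather than a real bundle.properties file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class EscapedPropertiesDemo {
    // Hypothetical key "greeting"; the value is the escaped text from the question.
    // The doubled backslashes keep the escapes literal, as they appear in the file.
    static final String FILE_CONTENT = "greeting="
            + "\\u0427\\u0435\\u0441\\u0442\\u0438\\u0442 "
            + "\\u0440\\u043E\\u0436\\u0434\\u0435\\u043D "
            + "\\u0434\\u0435\\u043D\n";

    static String loadGreeting() {
        Properties props = new Properties();
        // The escaped content is pure ASCII, so ISO-8859-1 encodes it without loss.
        byte[] bytes = FILE_CONTENT.getBytes(StandardCharsets.ISO_8859_1);
        try {
            // load(InputStream) decodes the bytes as ISO-8859-1 and resolves the escapes.
            props.load(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("greeting");
    }

    public static void main(String[] args) {
        // Shows the Cyrillic text, provided the console encoding supports it.
        System.out.println(loadGreeting());
    }
}
```

The returned string holds the actual Cyrillic characters, so nothing needs to be decoded by hand at runtime.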

In Eclipse, source files usually inherit their charset setting from their parent, ending up at the project-wide or even system-wide setting. Eclipse also supports setting the charset encoding for individual files, and it conveniently sets it to ISO-8859-1 automatically for .properties files.

Note that starting with Java 9, you can use UTF-8 for properties resource bundles. This does not require additional configuration actions, as the charset encoding is determined by probing. As the documentation of the PropertyResourceBundle(InputStream) constructor states:

This constructor reads the property file in UTF-8 by default. If a MalformedInputException or an UnmappableCharacterException occurs on reading the input stream, then the PropertyResourceBundle instance resets to the state before the exception, re-reads the input stream in ISO-8859-1 and continues reading. If the system property java.util.PropertyResourceBundle.encoding is set to either "ISO-8859-1" or "UTF-8", the input stream is solely read in that encoding, and throws the exception if it encounters an invalid sequence.

This works because both encodings are identical for ASCII characters, while for non-ASCII content it practically never happens in real-life text that an ISO-8859-1 byte sequence also forms a valid UTF-8 sequence. This applies to PropertyResourceBundle, which handles the probing, not to the Properties class, whose load(InputStream) method still uses only ISO-8859-1.
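The difference between the two classes can be sketched as follows. This assumes Java 9 or newer and that the system property java.util.PropertyResourceBundle.encoding is not set to "ISO-8859-1". The class name and the key "greeting" are made up for this example, and the UTF-8 file is simulated with an in-memory byte array. The third method uses Properties.load(Reader), which is one way to read a UTF-8 file with the plain Properties class.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.PropertyResourceBundle;

class Utf8BundleDemo {
    // The Cyrillic text from the question, written with escapes so that this
    // source file compiles the same way regardless of its own encoding.
    static final String TEXT =
            "\u0427\u0435\u0441\u0442\u0438\u0442 \u0440\u043E\u0436\u0434\u0435\u043D \u0434\u0435\u043D";

    // Simulates a properties file saved as UTF-8 with unescaped characters.
    static byte[] utf8File() {
        return ("greeting=" + TEXT + "\n").getBytes(StandardCharsets.UTF_8);
    }

    // Java 9+: the bundle tries UTF-8 first and falls back to ISO-8859-1.
    static String viaBundle() {
        try {
            return new PropertyResourceBundle(new ByteArrayInputStream(utf8File()))
                    .getString("greeting");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Properties.load(InputStream) always decodes as ISO-8859-1, so the
    // UTF-8 bytes come out as mojibake (two characters per Cyrillic letter).
    static String viaPropertiesStream() {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(utf8File()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("greeting");
    }

    // Properties.load(Reader) leaves the decoding to the caller's Reader.
    static String viaPropertiesReader() {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8File()), StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("greeting");
    }

    public static void main(String[] args) {
        System.out.println(TEXT.equals(viaBundle()));
        System.out.println(TEXT.equals(viaPropertiesStream()));
        System.out.println(TEXT.equals(viaPropertiesReader()));
    }
}
```

Under the stated assumptions, only the InputStream variant of Properties fails to reproduce the text, which matches the distinction drawn above.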

¹ I kept the statement in this absolute form for simplicity, even though, as elaborated at the end of this answer, Java 9 has lifted this restriction for properties resource bundles.

Upvotes: 1
