Reputation: 17107
In the *_fr.properties file in my Java web app, I see French characters encoded like this (as an example):
t\u00e9l\u00e9graphique entrant
What kind of encoding is this? Is it UTF-8?
Also, if I have a French accented word, how do I find what to put in my properties file?
The link I am looking at does not show this kind of encoding:
http://tlt.its.psu.edu/suggestions/international/bylanguage/french.html
Upvotes: 2
Views: 1271
Reputation: 47233
It's UTF-16, with each 16-bit code unit written as a \uXXXX escape of four hexadecimal digits, so the file itself contains only plain ASCII characters. For all the characters you are likely to use, the number is simply the character's Unicode code point.
If you ever have to deal with a character from one of the 'astral planes', where code points are too big to fit in 16 bits, things are slightly more complicated: such a character is written as two escapes (a surrogate pair). We can talk about it then.
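For the curious, here is a minimal sketch of what happens with a code point above 16 bits. The class and method names (SurrogateDemo, escapeCodePoint) are made up for illustration; Character.toChars is the standard JDK call that splits a code point into its UTF-16 code units.

```java
public class SurrogateDemo {
    // Hypothetical helper: writes one escape per UTF-16 code unit of the code point.
    static String escapeCodePoint(int codePoint) {
        StringBuilder out = new StringBuilder();
        // Character.toChars returns one char for the basic plane, two for higher planes.
        for (char unit : Character.toChars(codePoint)) {
            out.append(String.format("\\u%04x", (int) unit));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeCodePoint(0x00E9));   // a single code unit
        System.out.println(escapeCodePoint(0x1F600));  // two code units, a surrogate pair
    }
}
```

The first call prints \u00e9, and the second prints \ud83d\ude00, which is how U+1F600 would be written in a properties file.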
This is the encoding that Java itself uses in the JVM, where all text is represented as a sequence of 16-bit numbers, and the \uXXXX escape format is the same one used in Java source code. That's why it's used in properties files.
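To see the escapes being decoded, here is a small sketch that feeds properties-format text through java.util.Properties. The class name, the read helper, and the "label" key are assumptions for illustration; a real app would load the *_fr.properties file rather than a string.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class EscapeLoadDemo {
    // Hypothetical helper: parses properties-format text and returns the value for a key.
    static String read(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // The doubled backslashes keep the escapes literal, as they appear in the file.
        String fileContent = "label=t\\u00e9l\\u00e9graphique entrant";
        String value = read(fileContent, "label");
        // Print the second character as a number to avoid console-encoding surprises.
        System.out.println(Integer.toHexString(value.charAt(1)));  // e9
    }
}
```

The loaded value is the ordinary string 'télégraphique entrant'; the escapes exist only in the file.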
To write a French (or other) character, you need to find out what its code point is, and write that as a four-digit hexadecimal number after \u. I could refer you to the Unicode standard, but to be honest, the easiest thing is just to look the character up on Wikipedia - their list has the code points already written in hex. Taking your example, looking up '00e9' reveals that it is 'é'.
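If you would rather not look characters up by hand, a few lines of Java can produce the escaped form for you. This is only a sketch under the assumption that every non-ASCII character should be escaped; PropertiesEscaper and escape are made-up names, not part of any library.

```java
public class PropertiesEscaper {
    // Hypothetical helper: leaves ASCII alone and escapes every other UTF-16 code unit.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                // Four lowercase hex digits, zero-padded, after a literal backslash and u.
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input is written with escapes so this source file stays pure ASCII.
        System.out.println(escape("t\u00e9l\u00e9graphique entrant"));
    }
}
```

Running it on the phrase from the question prints t\u00e9l\u00e9graphique entrant, which can be pasted straight into the properties file.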
Upvotes: 1