tiamat

String encoding charset issue

I receive text in ISO-8859-1 encoding, but some characters are not decoded correctly.

Here is the code I'm using:

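import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

InputStream plainIs = plainText.getIs();
StringBuilder stringBuilder = new StringBuilder();
// Decode the incoming bytes as ISO-8859-1
try (BufferedReader bufferedReader = new BufferedReader(
        new InputStreamReader(plainIs, StandardCharsets.ISO_8859_1))) {
    String line;
    while ((line = bufferedReader.readLine()) != null) {
        // readLine() drops the line terminator, so put it back
        stringBuilder.append(line).append('\n');
    }
}
body = stringBuilder.toString();
log.debug("Plain Text Body: " + body);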

As input, I have a sentence like this:

L=92objet est donc de proposer un outil simple =E9volutif

but the decoded output is:

L�objet est donc de proposer un outil simple évolutif

The sequence =E9 is correctly decoded to é, but =92 (in L=92objet) comes out as L�objet.

Any idea why only part of the text is converted correctly?

Answers (1)

Nexevis

It seems 0x92 is not defined as a printable character in ISO-8859-1 (none of the positions from 0x80 to 0x9F are), as you can see in the ISO-8859-1 code chart. The same chart shows é at 0xE9, which is why that character is output correctly. If you are trying to get an apostrophe (') character, try using =27 instead of =92.
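
A quick way to check this is to decode the individual bytes yourself. The sketch below (Latin1Check and its decodeByte helper are just illustrative names) decodes 0xE9, 0x92 and 0x27 with Java's ISO-8859-1 charset and prints the code point each one turns into:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin1Check {
    // Decode a single byte value with the given charset.
    static String decodeByte(int b, Charset charset) {
        return new String(new byte[] {(byte) b}, charset);
    }

    public static void main(String[] args) {
        int[] samples = {0xE9, 0x92, 0x27};
        for (int b : samples) {
            char c = decodeByte(b, StandardCharsets.ISO_8859_1).charAt(0);
            // Java's ISO-8859-1 decoder maps each byte to the code point with the same value,
            // so 0x92 becomes U+0092, a C1 control character with no visible glyph
            System.out.printf("0x%02X -> U+%04X (ISO control: %b)%n",
                    b, (int) c, Character.isISOControl(c));
        }
    }
}
```

It prints U+00E9 for 0xE9 and U+0027 for 0x27, but U+0092 (flagged as an ISO control character) for 0x92. An unprintable control character like that is most likely what your log displays as �.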

There is also Windows-1252, a superset of the printable characters of ISO-8859-1, which does define 0x92 as of its second version:

The second version, used in Microsoft Windows 2.0, positions D7, F7, 91, and 92 had been defined.
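
So a likely fix, assuming the sender's bytes are really Windows-1252 rather than strict ISO-8859-1, is to keep the same reading loop and only change the charset name to windows-1252 (Java also accepts the alias Cp1252). A minimal sketch follows; Cp1252Read and readAll are illustrative names, and the hard-coded byte array stands in for plainText.getIs():

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class Cp1252Read {
    // Same reading loop as in the question, but with the charset passed in.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // The question's "L=92objet ... =E9volutif" as raw bytes (0x92 and 0xE9 are the interesting ones)
        byte[] bytes = {'L', (byte) 0x92, 'o', 'b', 'j', 'e', 't', ' ',
                (byte) 0xE9, 'v', 'o', 'l', 'u', 't', 'i', 'f'};
        Charset cp1252 = Charset.forName("windows-1252");
        String text = readAll(new ByteArrayInputStream(bytes), cp1252);
        // Print code points rather than the text itself, so the console encoding does not matter
        System.out.printf("U+%04X U+%04X%n", (int) text.charAt(1), (int) text.charAt(8));
    }
}
```

It prints U+2019 U+00E9: in Windows-1252, 0x92 is the right single quotation mark (’) and 0xE9 is still é, because the two charsets only differ in the 0x80 to 0x9F range.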
