colonel R

Reputation: 33

Can't properly print non-English characters like ë in Windows console

For some weird reason I can't seem to print ë in Java.

public class Eindopdracht0002test
{
  public static void main(String[] args)
  {
    System.out.println("Belgi\u00EB");
  }
}  

It's supposed to print "België" (Dutch for Belgium), but instead it prints "Belgi├½".

Does anyone know how to resolve this?

Upvotes: 1

Views: 491

Answers (1)

Pshemo

Reputation: 124235

In UTF-8, ë is encoded as the two bytes 11000011 10101011 (source: https://unicode-table.com/en/00EB).
The Windows console uses code pages, which are 8-bit mappings from bytes to characters (you can check your console's code page with the chcp command). This means that when ë is sent to the output stream (console) as the bytes 11000011 10101011, the console sees them as two separate characters, which in code page 850 (based on your comments) are mapped to:
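To check that byte sequence yourself, a small sketch like the following prints each UTF-8 byte of ë in binary. The class and method names (`Utf8Bytes`, `utf8Bits`) are made up for illustration; only `String.getBytes` and `StandardCharsets.UTF_8` come from the JDK.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes
{
  // Hypothetical helper: returns the bits of every UTF-8 byte of s,
  // each padded to 8 digits and separated by a space.
  static String utf8Bits(String s)
  {
    StringBuilder sb = new StringBuilder();
    for (byte b : s.getBytes(StandardCharsets.UTF_8))
    {
      if (sb.length() > 0) sb.append(' ');
      // b & 0xFF turns the signed Java byte into a value from 0 to 255
      String bits = Integer.toBinaryString(b & 0xFF);
      sb.append("00000000".substring(bits.length())).append(bits);
    }
    return sb.toString();
  }

  public static void main(String[] args)
  {
    System.out.println(utf8Bits("\u00EB")); // prints 11000011 10101011
  }
}
```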

  • ├ - 11000011 (195 in decimal)
  • ½ - 10101011 (171 in decimal)
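The misreading can be reproduced without a console: encode the text as UTF-8, then decode the same bytes as cp850, which is roughly what happens between Java and a code page 850 console. This is a sketch; `MojibakeDemo` and `misread` are hypothetical names, and it assumes the JDK in use ships the cp850 charset (standard JDK builds do, although the Java spec only guarantees a few charsets such as UTF-8).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo
{
  // Encodes text with one charset and decodes the bytes with another,
  // imitating Java writing UTF-8 while the console reads cp850.
  static String misread(String text, Charset written, Charset read)
  {
    return new String(text.getBytes(written), read);
  }

  public static void main(String[] args)
  {
    // "cp850" is an alias of IBM850; it is not a charset every Java
    // platform must support, but standard JDK builds include it.
    Charset cp850 = Charset.forName("cp850");
    String shown = misread("Belgi\u00EB", StandardCharsets.UTF_8, cp850);
    // Print code points instead of the characters themselves, so the
    // output does not depend on the console's own code page.
    shown.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
    System.out.println();
  }
}
```

The last two code points, U+251C (├) and U+00BD (½), are exactly the "├½" seen in the question.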

If you don't want to use UTF-8, you can create a separate Writer with a different encoding, which will translate characters to bytes according to that encoding. To do so you can use the constructor

OutputStreamWriter(OutputStream out, String charsetName)

which in your case may look like

OutputStreamWriter osw = new OutputStreamWriter(System.out, "cp850");
//  needed encoding -----------------------------------------^^^^^

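since you want to send characters in the specified encoding to the standard output stream. Note that this constructor declares the checked UnsupportedEncodingException, so the calling method has to catch or declare it.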
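To see the translation an OutputStreamWriter performs, a sketch can point it at an in-memory ByteArrayOutputStream instead of System.out and inspect the bytes it produces for ë under UTF-8 and under cp850. `WriterEncodingDemo` and `encodeVia` are hypothetical names; the cp850 charset is assumed to be available, as it is in standard JDK builds.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class WriterEncodingDemo
{
  // Writes text through an OutputStreamWriter using the given charset name
  // and returns the raw bytes that reached the underlying stream.
  static byte[] encodeVia(String text, String charsetName) throws IOException
  {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (Writer w = new OutputStreamWriter(sink, charsetName))
    {
      w.write(text);
    } // closing the writer flushes its internal buffer into sink
    return sink.toByteArray();
  }

  public static void main(String[] args) throws IOException
  {
    for (String cs : new String[] {"UTF-8", "cp850"})
    {
      StringBuilder sb = new StringBuilder(cs + ":");
      for (byte b : encodeVia("\u00EB", cs))
      {
        sb.append(' ').append(b & 0xFF); // unsigned decimal value of each byte
      }
      System.out.println(sb);
    }
  }
}
```

With UTF-8 the writer emits two bytes (195 171); with cp850 it emits the single byte 137 that a code page 850 console displays as ë.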

To be able to use the println method, and to make sure the data is flushed automatically, you can wrap the created OutputStreamWriter in

PrintWriter(Writer out, boolean autoFlush)

like

PrintWriter out = new PrintWriter(osw, true);

You can also do both these things in one line:

PrintWriter out = new PrintWriter(new OutputStreamWriter(System.out, "cp850"), true);

Now if you call out.println("\u00EB"); the writer should recognize the ë character, use the cp850 encoding to look up its mapping (which is 137), and send the proper byte representation (here 10001001) to System.out (the console).
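Putting the pieces together, the program from the question might look like the sketch below. It assumes the console really uses code page 850 (as reported by chcp); for a console set to a different code page, the charset name would have to change accordingly. The `throws UnsupportedEncodingException` clause is needed because the OutputStreamWriter constructor that takes a charset name declares it.

```java
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;

public class Eindopdracht0002test
{
  public static void main(String[] args) throws UnsupportedEncodingException
  {
    // Assumption: the console uses code page 850 (check with chcp).
    // OutputStreamWriter converts characters to cp850 bytes;
    // PrintWriter adds println and, with autoFlush = true,
    // flushes after every println.
    PrintWriter out = new PrintWriter(new OutputStreamWriter(System.out, "cp850"), true);
    out.println("Belgi\u00EB");
    // out is deliberately not closed: closing it would also close System.out.
  }
}
```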

Upvotes: 2
