tofcoder

Reputation: 2382

Java, UTF-8, and Windows console

We are trying to use Java with UTF-8 on Windows. The application writes its logs to the console, and because those logs are internationalized we would like them to be in UTF-8.

The JVM can be configured to emit UTF-8 by passing -Dfile.encoding=UTF-8 as a JVM argument. That part works, but the output on a Windows console is garbled.
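
An alternative to changing the JVM-wide default is to wrap System.out in a PrintStream with an explicit charset. The following is only a minimal sketch: the class name Utf8Out and both helper methods are hypothetical, not part of our application. It controls which bytes Java emits; the console still has to interpret them as UTF-8, so it does not fix the garbled display on its own.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Out {
    // Wraps any stream in an auto-flushing PrintStream that always encodes text as UTF-8,
    // regardless of the file.encoding system property.
    static PrintStream utf8(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper for inspection: returns the bytes the wrapper writes for a string.
    static byte[] encodeViaPrintStream(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = utf8(buffer);
        out.print(text);
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Replace the default System.out so that all later prints are UTF-8 encoded.
        System.setOut(utf8(System.out));
        System.out.println("h\u00e9llo");
    }
}
```

With this in place, the character U+00E9 leaves the JVM as the two bytes C3 A9, whatever the platform default encoding is.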

We can then set the console code page to 65001 (chcp 65001), but in that case .bat files stop working: when we try to launch the application through our script (named start.bat), absolutely nothing happens. The command simply returns:

C:\Application> chcp 65001
Activated code page: 65001
C:\Application> start.bat

C:\Application>

Without chcp 65001 there is no problem, and the application launches normally.

Any hints about that?

Upvotes: 15

Views: 24264

Answers (5)

erickson

Reputation: 269797

Try chcp 65001 && start.bat

The chcp command changes the code page, and 65001 is the Win32 code page identifier for UTF-8 under Windows 7 and up. A code page, or character encoding, specifies how to convert a Unicode code point to a sequence of bytes or back again.
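
To illustrate why a mismatch between the encoding Java writes and the code page the console reads produces garbled text, here is a small sketch. The class name CodePageDemo and the hex helper are made up for illustration, and ISO-8859-1 stands in for a typical single-byte console code page because every JVM is guaranteed to ship it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CodePageDemo {
    // Encodes the text with the given charset and renders the bytes as hex pairs.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE

        // The same code point maps to different byte sequences per encoding.
        System.out.println(hex(text, StandardCharsets.UTF_8));      // C3 A9
        System.out.println(hex(text, StandardCharsets.ISO_8859_1)); // E9

        // Decoding UTF-8 bytes with the wrong charset yields two characters instead of one,
        // which is what a console on a non-UTF-8 code page displays.
        String garbled = new String(text.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length()); // 2
    }
}
```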

Upvotes: 12

YIN SHAN

Reputation: 89

Java on Windows does NOT support Unicode output on the console by default. I have written a workaround that calls the native API through the JNA library. The method calls WriteConsoleW to write Unicode text to the console.

import com.sun.jna.Native;
import com.sun.jna.Pointer;
import com.sun.jna.ptr.IntByReference;
import com.sun.jna.win32.StdCallLibrary;

/**
 * Unicode console output on the Windows platform.
 * Requires the JNA library on the classpath.
 *
 * @author Sandy_Yin
 */
public class Console {
    private static Kernel32 INSTANCE = null;

    public interface Kernel32 extends StdCallLibrary {
        public Pointer GetStdHandle(int nStdHandle);

        public boolean WriteConsoleW(Pointer hConsoleOutput, char[] lpBuffer,
                int nNumberOfCharsToWrite,
                IntByReference lpNumberOfCharsWritten, Pointer lpReserved);
    }

    static {
        String os = System.getProperty("os.name").toLowerCase();
        if (os.startsWith("win")) {
            INSTANCE = (Kernel32) Native
                    .loadLibrary("kernel32", Kernel32.class);
        }
    }

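    /** Prints a line, using WriteConsoleW on Windows and System.out elsewhere. */
    public static void println(String message) {
        boolean successful = false;
        if (INSTANCE != null) {
            // -11 is STD_OUTPUT_HANDLE
            Pointer handle = INSTANCE.GetStdHandle(-11);
            char[] buffer = message.toCharArray();
            IntByReference lpNumberOfCharsWritten = new IntByReference();
            // Returns false when the handle is not a real console (e.g. redirected output)
            successful = INSTANCE.WriteConsoleW(handle, buffer, buffer.length,
                    lpNumberOfCharsWritten, null);
            if (successful) {
                // WriteConsoleW wrote only the text; emit the line terminator
                System.out.println();
            }
        }
        if (!successful) {
            // Not on Windows, or not a console: fall back to standard output
            System.out.println(message);
        }
    }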
}

Upvotes: 8

Roger F. Gay

Reputation: 1971

Windows doesn't support the 65001 code page: http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/chcp.mspx?mfr=true

Upvotes: -1

Renato Soffiatto

Reputation: 165

We had similar problems on Linux. Our code was in ISO-8859-1 (mostly cp-1252 compatible) but the console was UTF-8, so the code did not compile. Simply switching the console to ISO-8859-1 broke the build script, which was in UTF-8. We found two options:
1- Define one standard encoding and stick to it. That was our choice: we kept everything in ISO-8859-1 and modified the build scripts.
2- Set the encoding before starting any task, even inside the build scripts, similar to what erickson suggested. On Linux it looked like this:

LANG=pt_BR.ISO-8859-1 /usr/local/xxxx

My Eclipse is still set up like this. Both approaches work well.

Upvotes: 0

sblundy

Reputation: 61424

Have you tried PowerShell rather than the old cmd.exe?

Upvotes: -5
