user547453

Reputation: 1045

saving Java file in UTF-8

When I run this program it gives me a '?' for the Unicode code point \u0508, because the default Windows character encoding, CP-1252, cannot map this code point.

But when I save this file in Eclipse with 'Text file encoding' set to UTF-8 and run the program, it gives me the correct output: AԈC.

Why does this work? The Java file is saved as UTF-8, but the underlying Windows OS encoding is still CP-1252. My question is similar to: when I try to read a text file in UTF-16 that was originally written in UTF-8, the output is weird, with various box symbols.
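The "box symbols" effect in that last scenario can be reproduced directly: encode a string with one charset, then decode the same bytes with another. A minimal sketch (the class name is mine):

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        String original = "A\u0508C";
        // Encode the string as UTF-8 bytes (A = 1 byte, U+0508 = 2 bytes, C = 1 byte)...
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        // ...then decode those same bytes as UTF-16: the byte stream is
        // reinterpreted in two-byte units, producing unrelated characters.
        String misread = new String(utf8Bytes, StandardCharsets.UTF_16);
        System.out.println("original = " + original);
        System.out.println("misread  = " + misread);
        System.out.println(original.equals(misread)); // false
    }
}
```

The four UTF-8 bytes become two UTF-16 code units that have nothing to do with the original three characters, which is why the result looks like random boxes.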

public class e {
    public static void main(String[] args) {
        System.out.println(System.getProperty("file.encoding"));
        String original = "A" + "\u0508" + "C";
        try {
            System.out.println("original = " + original);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Upvotes: 3

Views: 2739

Answers (2)

bmargulies

Reputation: 100050

The issue is the value of file.encoding when you run the program, and the destination of System.out. If System.out is an Eclipse console, it may well be configured as UTF-8. If it is a plain Windows command-prompt window, it uses the CP-1252 code page and will only display ? in this case.
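One way to make the program's output independent of the platform default is to wrap System.out in a PrintStream with an explicit charset. A minimal sketch (the class name is mine; the console itself must also be set to interpret UTF-8, e.g. `chcp 65001` in a Windows command prompt, for the character to render):

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Replace the default System.out (which encodes with the platform
        // charset, e.g. CP-1252 on Windows) with one that writes UTF-8 bytes.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println("A\u0508C");
    }
}
```

This only controls which bytes the JVM writes; whether they display correctly still depends on what encoding the receiving console assumes.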

Upvotes: 2

Martijn Courteaux

Reputation: 68847

Saving the Java source file as either UTF-8 or Windows-1252 shouldn't make any difference, because both encodings encode all ASCII code points the same way, and your source file uses only ASCII characters (the \u0508 escape is itself just six ASCII characters).

So you should look for the cause somewhere else. I suggest carefully redoing the steps you took and running the tests again.
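The claim that the two encodings agree on this source text can be checked directly: encode the literal's source characters with both charsets and compare the bytes. A minimal sketch (the class name is mine; it assumes the JDK ships the "windows-1252" charset, as standard Oracle/OpenJDK builds do):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiEscapeDemo {
    public static void main(String[] args) {
        // What sits in the .java file is the ASCII sequence
        // backslash, u, 0, 5, 0, 8 -- not the Cyrillic character itself.
        String sourceText = "\\u0508";
        byte[] utf8 = sourceText.getBytes(StandardCharsets.UTF_8);
        byte[] cp1252 = sourceText.getBytes(Charset.forName("windows-1252"));
        // Both encodings represent ASCII identically, so the compiler
        // reads the same bytes regardless of how the file was saved.
        System.out.println(Arrays.equals(utf8, cp1252)); // true
    }
}
```

This is why re-saving the file as UTF-8 cannot, by itself, explain the changed output: the compiler sees byte-for-byte identical input either way.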

Upvotes: 3
