Reputation: 155
I am testing my application's i18n compatibility. I have an English version of Windows 7, which means the system's display language is English, and I set the system locale for non-Unicode applications to Chinese.
My application has problems exporting HTML files that contain Chinese characters when it runs under JDK 1.6, but it works fine under JDK 1.7.
I debugged it and found that the direct cause is that Charset.defaultCharset() returns different values.
Under JDK 1.7, Charset.defaultCharset() returns GBK, which is a charset for Chinese.
Under JDK 1.6, Charset.defaultCharset() returns windows-1252, which is a charset for Latin-script (Western European) languages.
I know the problem can be solved by designating a charset, say UTF-8, explicitly in code.
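For reference, a minimal sketch of that workaround, written so it also compiles on JDK 1.6. The class name ExplicitCharsetExport and the method exportHtml are hypothetical and not taken from my application; the point is only that the writer is given UTF-8 directly, so Charset.defaultCharset() is never consulted.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper: the names are illustrative only.
public class ExplicitCharsetExport {

    // Charset.forName is available on JDK 1.6 (StandardCharsets arrived in 1.7).
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Writes a tiny HTML page, encoding the characters as UTF-8
    // regardless of the JVM's default charset.
    public static void exportHtml(File target, String body) throws IOException {
        Writer out = new OutputStreamWriter(new FileOutputStream(target), UTF8);
        try {
            // Declare the same encoding inside the page so browsers decode it correctly.
            out.write("<html><head><meta http-equiv=\"Content-Type\" "
                    + "content=\"text/html; charset=UTF-8\"></head><body>");
            out.write(body);
            out.write("</body></html>");
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File target = File.createTempFile("export", ".html");
        exportHtml(target, "\u4e2d\u6587"); // two Chinese characters
        System.out.println("Wrote " + target.length() + " bytes to " + target);
    }
}
```

The same idea applies when reading the file back: pass the charset to InputStreamReader instead of relying on the default.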
But I want to know why Charset.defaultCharset() returns different values under JDK 1.7 and JDK 1.6.
Upvotes: 8
Views: 5155
Reputation: 12447
The Java 7 technote says:
The supported encodings vary between different implementations of the Java Platform, Standard Edition 7 (Java SE 7).
The Charset doc says:
Every instance of the Java virtual machine has a default charset, which may or may not be one of the standard charsets. The default charset is determined during virtual-machine startup and typically depends upon the locale and charset being used by the underlying operating system.
Also, I've found a "bug" report about using -Dfile.encoding, with this final evaluation:
This is not a bug. The "file.encoding" property is not required by the J2SE platform specification; it's an internal detail of Sun's implementations and should not be examined or modified by user code. It's also intended to be read-only; it's technically impossible to support the setting of this property to arbitrary values on the command line or at any other time during program execution.
The preferred way to change the default encoding used by the VM and the runtime system is to change the locale of the underlying platform before starting your Java program.
Upvotes: 3
Reputation: 2120
Charset.defaultCharset() gives the default charset of the running JVM, so it is not always the same value. For example, if you run your programs from NetBeans, it will return UTF-8, since that is the default encoding for Java projects in NetBeans.
I have a setup similar to yours. My Windows is English (menus and dialogs are in English) and I'm using Turkish for non-Unicode applications. When I start the JVM without any flag or system parameter, both the Java 7 and Java 6 runtimes give "CP1254" when Charset.defaultCharset() is called. System.getProperty("file.encoding") and the default IO encoding are also the same. (The locale of the system differs between these two Java versions, but that's another story.)
So I guess your problem is either about how you start your JVM, or about how the JVM decides which default encoding to use. If you are sure the problem is not the former (you run the JVM without any encoding parameter and you do not attempt to change the default charset anywhere in your program), then the JVM is detecting the default encoding incorrectly, and most probably that is abnormal behaviour.
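To tell those two cases apart, something like the following could be run under each JDK and the outputs compared. This is only a sketch: the class name DefaultCharsetReport is made up, and sun.jnu.encoding is an implementation-specific property of Sun/Oracle JVMs that may be null elsewhere. The JVM arguments and the JAVA_TOOL_OPTIONS environment variable are included because a -Dfile.encoding setting can arrive through either of them without appearing in your own launch command.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.lang.management.ManagementFactory;
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical diagnostic class: prints what the JVM believes about its encoding.
public class DefaultCharsetReport {

    // Collects the encoding-related settings in a fixed, readable order.
    public static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<String, String>();
        report.put("java.version", System.getProperty("java.version"));
        report.put("Charset.defaultCharset()", Charset.defaultCharset().name());
        report.put("file.encoding", System.getProperty("file.encoding"));
        // Implementation detail of Sun/Oracle JVMs; may be null on others.
        report.put("sun.jnu.encoding", System.getProperty("sun.jnu.encoding"));
        // Encoding picked by a writer created without an explicit charset
        // (reported under its historical name, e.g. "Cp1254").
        report.put("default IO encoding",
                new OutputStreamWriter(new ByteArrayOutputStream()).getEncoding());
        report.put("Locale.getDefault()", Locale.getDefault().toString());
        // Shows flags such as -Dfile.encoding=... passed to this JVM.
        report.put("JVM arguments",
                ManagementFactory.getRuntimeMXBean().getInputArguments().toString());
        // Options injected through the environment rather than the command line.
        report.put("JAVA_TOOL_OPTIONS", System.getenv("JAVA_TOOL_OPTIONS"));
        return report;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : collect().entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

If the two runtimes report the same JVM arguments and environment but different default charsets, the difference lies in how each JVM detects the platform encoding rather than in how it was launched.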
Upvotes: 3