Reputation: 11835
Most of the encodings in use encode the ASCII characters identically. This means that if I know a String consists only of ASCII characters, I don't need to care which of these encodings is used: they all produce the same bytes. Further, if the system default encoding has this 'ASCII compatibility' property, I can safely call new String(bytes)
for bytes representing ASCII strings.
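For example, a quick sketch of what I mean (the class name AsciiCompat is mine, just for illustration):

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiCompat {
    public static void main(String[] args) {
        // For an ASCII-only string, ASCII-compatible encodings yield identical bytes
        byte[] ascii = "test".getBytes(StandardCharsets.US_ASCII);
        byte[] utf8  = "test".getBytes(StandardCharsets.UTF_8);
        byte[] latin = "test".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(ascii, utf8));  // true
        System.out.println(Arrays.equals(ascii, latin)); // true
    }
}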
But there are encodings (for example, EBCDIC) that are incompatible with ASCII.
Can such an encoding be set as the default in Java?
Upvotes: -1
Views: 196
Reputation: 11835
An experiment.
public class Test {
    public static void main(String[] args) {
        // Print the JVM's default charset, then the bytes "test" encodes to with it
        System.out.println(java.nio.charset.Charset.defaultCharset());
        byte[] bytes = "test".getBytes();
        for (int i = 0; i < bytes.length; i++) {
            System.out.println(bytes[i]);
        }
    }
}
java -Dfile.encoding=utf-8 Test
produces the following:
UTF-8
116
101
115
116
java -Dfile.encoding=cp1140 Test
(where cp1140 is a non-ASCII-compatible encoding, an EBCDIC variant) outputs something my terminal cannot decipher, but if I decode it with the same cp1140, it turns out to be
IBM01140
-93
-123
-94
-93
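A minimal sketch of that decoding step (the class name Decode1140 is mine; this assumes the JVM actually ships the IBM01140 charset, otherwise Charset.forName throws UnsupportedCharsetException):

import java.nio.charset.Charset;

public class Decode1140 {
    public static void main(String[] args) {
        // The bytes printed above, decoded with the same EBCDIC charset
        byte[] bytes = { -93, -123, -94, -93 };
        System.out.println(new String(bytes, Charset.forName("cp1140"))); // prints: test
    }
}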
So yes, a non-ASCII-compatible encoding may be set as the default, and the JVM starts happily with it.
One takeaway is that it's not safe to call the no-argument String#getBytes()
even when you are sure you are always dealing with ASCII strings; passing the charset explicitly avoids the problem, as sketched below.
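A minimal sketch of that safer variant (the class name SafeBytes is mine, just for illustration):

import java.nio.charset.StandardCharsets;

public class SafeBytes {
    public static void main(String[] args) {
        // An explicit charset makes the result independent of file.encoding
        byte[] bytes = "test".getBytes(StandardCharsets.US_ASCII);
        for (byte b : bytes) {
            System.out.println(b); // 116, 101, 115, 116 on any JVM
        }
    }
}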
Another takeaway from my experiments is that file.encoding
may be assigned any garbage, in which case the JVM will not complain but will silently ignore the property:
$ java -Dfile.encoding=aaa Test
UTF-8
116
101
115
116
This is how it works with Java 1.8.0_161.
Java 11.0.6 does not seem to support cp1140 as the default charset (it still uses UTF-8 even with -Dfile.encoding=cp1140).
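You can check whether a given JVM knows a charset name at all with Charset.isSupported; a minimal sketch (the class name CheckCharset is mine, just for illustration):

import java.nio.charset.Charset;

public class CheckCharset {
    public static void main(String[] args) {
        // true if the name resolves to a charset installed in this JVM
        System.out.println(Charset.isSupported("cp1140"));
        // "aaa" is a syntactically legal name, so this returns false
        // rather than throwing, consistent with the silent fallback above
        System.out.println(Charset.isSupported("aaa"));
    }
}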
Upvotes: 1