Reputation: 659
By default, Character and String use UTF-16. However, for all practical purposes, in North America and most English locales, UTF-8 is sufficient (since it can go up to 4 bytes). So, if I use an InputStreamReader(InputStream), does it give me the default UTF-16 char encoding? Using an InputStreamReader(InputStream, "UTF-8") would provide UTF-8 encoding, which would suffice for my purpose.
How can I automatically set my JVM's default encoding to UTF-8 while using an English locale? The intention is to improve performance for Character and String manipulation by using an 8-bit scheme instead of a 16-bit encoding, since most ASCII text is covered by an 8-bit encoding while still complying with the Unicode standard.
Any comments are appreciated. Thanks!
Upvotes: 5
Views: 1451
Reputation: 15418
So, if I use an InputStreamReader(InputStream), does it give me the default UTF-16 char encoding? Using an InputStreamReader(InputStream, "UTF-8") would provide UTF-8 encoding, which would suffice for my purpose.
How can I auto-set my JVM's default encoding to UTF-8 while using English locale?
From the InputStreamReader Javadoc:
The charset that InputStreamReader uses may be specified by name or may be given explicitly, or the platform's default charset may be accepted.
For example, when I print reader.getEncoding() on my platform, it prints UTF-8. Java determines the default character encoding from the file.encoding system property when the JVM starts up. If that property is not set explicitly, the default comes from the platform (UTF-8 on my machine; since JDK 18 the default is UTF-8 regardless of platform). To set the encoding for a JVM instance, pass -Dfile.encoding=UTF-8 on the command line. Calling System.setProperty("file.encoding", "UTF-8") at runtime is not reliable, because the default charset is read once at start-up.
Here is a useful article with more details.
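To make the difference between the default and an explicit charset concrete, here is a minimal sketch. The class name ReaderEncodingDemo and the helper decode are made up for illustration; the sample bytes are just the UTF-8 encoding of a short string containing one non-ASCII letter.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReaderEncodingDemo {

    // Hypothetical helper: decodes all bytes into a String using the given charset.
    static String decode(byte[] bytes, Charset charset) throws IOException {
        StringBuilder out = new StringBuilder();
        try (InputStreamReader reader =
                 new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            char[] buf = new char[64];
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // "h\u00e9llo" is "hello" with an e-acute; 6 bytes in UTF-8.
        byte[] utf8Bytes = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);

        // No charset argument: the reader uses whatever the platform default is.
        try (InputStreamReader byDefault =
                 new InputStreamReader(new ByteArrayInputStream(utf8Bytes))) {
            System.out.println("default charset: " + Charset.defaultCharset());
            System.out.println("reader encoding: " + byDefault.getEncoding());
        }

        // Explicit charset: the result no longer depends on the platform.
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8));
    }
}
```

Passing the charset explicitly is usually the safer choice, since the output of the first block depends on how the JVM was started (for example with -Dfile.encoding=UTF-8) while the second does not.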
Upvotes: 1
Reputation: 100151
The in-memory data types for text in Java (char, Character, and String) are UTF-16. Absolutely. Always. Unconditionally.
The only thing you can change is how Java converts from bytes-on-the-outside to chars-on-the-inside. There is no way to change the representation to UTF-8 to trade space for time.
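A small sketch of what that means in practice, assuming nothing beyond the standard library (the class name Utf16InMemoryDemo and the helper encodedLength are invented for this example): the char-level view of a String is always UTF-16 code units, and only the byte count on the outside changes with the charset you pick.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16InMemoryDemo {

    // Hypothetical helper: number of bytes the text takes once encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String ascii = "A";
        String emoji = "\uD83D\uDE00"; // U+1F600, written as a UTF-16 surrogate pair

        // A char is a 16-bit UTF-16 code unit, whatever the default charset is.
        System.out.println(Character.BYTES);                          // 2
        System.out.println(emoji.length());                           // 2 code units
        System.out.println(emoji.codePointCount(0, emoji.length()));  // 1 code point

        // Only the bytes-on-the-outside size depends on the chosen charset.
        System.out.println(encodedLength(ascii, StandardCharsets.UTF_8));    // 1
        System.out.println(encodedLength(ascii, StandardCharsets.UTF_16BE)); // 2
        System.out.println(encodedLength(emoji, StandardCharsets.UTF_8));    // 4
    }
}
```

So choosing UTF-8 for a reader or writer affects file and network sizes, not how char and String behave in your code.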
Upvotes: 4