Ashley

Reputation: 659

Default Encoding and changes

By default, Character and String use UTF-16; however, for all practical purposes, in North America and most English locales, UTF-8 is sufficient (since it can use up to 4 bytes). So, if I use an InputStreamReader(InputStream), does it give me the default UTF-16 char encoding? Using an InputStreamReader(InputStream, "UTF-8") would provide UTF-8 encoding, which would suffice for my purpose.

How can I auto-set my JVM's default encoding to UTF-8 while using an English locale? The intention is to improve performance for Character and String manipulation by using an 8-bit scheme instead of a 16-bit encoding, since ASCII text fits in 8 bits per character, while still complying with the Unicode standard.

Any comments are appreciated. Thanks!

Upvotes: 5

Views: 1451

Answers (2)

Sage

Reputation: 15418

So, if I use an InputStreamReader(InputStream), does it give me the default UTF-16 char encoding? Using an InputStreamReader(InputStream, "UTF-8") would provide UTF-8 encoding, which would suffice for my purpose.

How can I auto-set my JVM's default encoding to UTF-8 while using English locale?

From the InputStreamReader Javadoc:

The charset that InputStreamReader uses may be specified by name or may be given explicitly, or the platform's default charset may be accepted.
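The three options the Javadoc lists correspond to three constructors. The sketch below is only an illustration (the class and helper names are made up for the example); it decodes the same UTF-8 bytes by name, by explicit Charset object, and with the platform default, and also shows that picking the wrong charset silently produces mojibake rather than an error.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReaderCharsets {
    // Reads every char from a Reader into a String.
    static String drain(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Decodes bytes with an explicitly given Charset object (option 2 in the Javadoc).
    static String decode(byte[] bytes, Charset charset) {
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            return drain(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // "café" written with an escape so the source file's own encoding doesn't matter.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // 1. Charset by name; an unknown name throws UnsupportedEncodingException.
        try (Reader byName = new InputStreamReader(new ByteArrayInputStream(utf8), "UTF-8")) {
            System.out.println(drain(byName).length()); // 4
        }
        // 2. Charset given explicitly as a Charset object.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).length()); // 4
        // 3. Platform default; the result depends on how this JVM was started.
        try (Reader platform = new InputStreamReader(new ByteArrayInputStream(utf8))) {
            System.out.println(drain(platform).length());
        }
        // Wrong charset: the two UTF-8 bytes of the accented char become two Latin-1 chars.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).length()); // 5
    }
}
```

Passing a Charset object (option 2) is usually the safer habit: the name can't be misspelled, and the result no longer depends on the machine's defaults.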

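For example, when I print reader.getEncoding() on my platform, it reports UTF-8 (getEncoding() returns the historical name, "UTF8"). The JVM determines its default charset once, at startup, from the file.encoding system property, which is normally derived from the operating system's locale. If file.encoding is not set explicitly, that platform default is used; on my platform it is UTF-8, and since JDK 18 (JEP 400) the default is UTF-8 unless overridden. To set the encoding for a JVM instance, pass -Dfile.encoding=UTF-8 on the java command line. Calling System.setProperty("file.encoding", "UTF-16") after startup generally has no effect, because the default charset has already been determined and cached.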
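A quick way to check what your JVM actually picked is a small probe like the one below. This is only a sketch (the class and method names are hypothetical). It prints file.encoding, Charset.defaultCharset(), and what an InputStreamReader reports, and it demonstrates that changing file.encoding at runtime does not move the cached default.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetCheck {
    // Name an InputStreamReader reports for a charset (the historical name if one exists).
    // The reader is not closed first, because getEncoding() returns null after close().
    static String readerEncodingName(Charset charset) {
        return new InputStreamReader(new ByteArrayInputStream(new byte[0]), charset).getEncoding();
    }

    // Sets file.encoding at runtime, checks whether the default charset changed,
    // then restores the original property value.
    static boolean defaultIgnoresRuntimeChange(String newValue) {
        Charset before = Charset.defaultCharset();
        String old = System.getProperty("file.encoding");
        try {
            System.setProperty("file.encoding", newValue);
            return before.equals(Charset.defaultCharset());
        } finally {
            if (old == null) {
                System.clearProperty("file.encoding");
            } else {
                System.setProperty("file.encoding", old);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("file.encoding          = " + System.getProperty("file.encoding"));
        System.out.println("Charset.defaultCharset = " + Charset.defaultCharset());
        System.out.println("default reader reports = "
                + new InputStreamReader(new ByteArrayInputStream(new byte[0])).getEncoding());
        System.out.println("UTF-8 reader reports   = " + readerEncodingName(StandardCharsets.UTF_8)); // UTF8
        System.out.println("runtime change ignored = " + defaultIgnoresRuntimeChange("UTF-16")); // true
    }
}
```

Running it as `java -Dfile.encoding=UTF-8 DefaultCharsetCheck` should show UTF-8 as the default, which is the supported way to choose it for the whole JVM.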

Here is a useful article with more details.

Upvotes: 1

bmargulies
bmargulies

Reputation: 100151

The in-memory data types for text in Java (char, Character, and String) are UTF-16. Absolutely. Always. Unconditionally.

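The only thing you can change is how Java converts from bytes on the outside to chars on the inside. There is no way to change the representation to UTF-8 to trade space for time. (Since Java 9, "compact strings" may store a String whose chars all fit in Latin-1 using one byte per char internally, but that is an automatic, invisible optimization: the String API still presents UTF-16 code units.)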
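To make the bytes-outside versus chars-inside distinction concrete, here is a small illustrative sketch (class name hypothetical). The String's length is counted in UTF-16 code units no matter what, while the byte count only appears when you convert with a particular charset.

```java
import java.nio.charset.StandardCharsets;

public class Utf16InMemory {
    // "café " followed by U+1F600 (grinning face emoji), written with escapes;
    // the emoji needs a surrogate pair, i.e. two UTF-16 code units.
    static final String SAMPLE = "caf\u00e9 \uD83D\uDE00";

    public static void main(String[] args) {
        // String.length() counts UTF-16 code units, not characters or bytes.
        System.out.println(SAMPLE.length());                                   // 7
        System.out.println(SAMPLE.codePointCount(0, SAMPLE.length()));         // 6
        // The external size depends only on the charset used for the conversion.
        byte[] utf8 = SAMPLE.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);                                       // 10
        System.out.println(SAMPLE.getBytes(StandardCharsets.UTF_16BE).length); // 14
        // Decoding the UTF-8 bytes gives back the same UTF-16 String in memory.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(SAMPLE)); // true
    }
}
```

The charset choice changes only the 10 versus 14 bytes on the outside; the in-memory String is identical either way.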

Upvotes: 4
