Jeutnarg

Reputation: 1178

Is there any technical reason not to use StandardCharsets in Java?

As of Java 1.7, StandardCharsets is part of the standard library, but I work with a lot of legacy code that was written well before it was introduced. I have been replacing the older idioms with StandardCharsets whenever I run across them (primarily to make the code cleaner), but I am wary of making these changes in performance-critical sections or in areas that I can't easily debug.

Is there any technical reason for not using StandardCharsets? As in, are there 'gotchas' or inefficiencies that might arise from using StandardCharsets instead of Guava charsets or something like getBytes("UTF-8")? I know that "These charsets are guaranteed to be available on every implementation of the Java platform.", but I don't know if they're slower or have quirks that the older methods don't have.

To try and keep this on-topic, assume that there's no subjective force affecting this like the preference of other developers, resistance to change, etc.

Also, if it affects anything, UTF-8 is the encoding I really care about.

Upvotes: 3

Views: 4109

Answers (3)

Ingo

Reputation: 36339

You should use them, if only because you can't get an UnsupportedCharsetException, which is what happens if you use Charset.forName and misspell the name.

It is always a good idea to "move" the possibility of an error from runtime to compile time.
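As a rough illustration of that point, the sketch below contrasts a name-based lookup with the constant. The class name CharsetLookupDemo, the helper lookupFails and the misspelled name "UTF-88" are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical demo class, not taken from any library.
final class CharsetLookupDemo {

    // Returns true if looking up the charset by name fails at runtime.
    static boolean lookupFails(String name) {
        try {
            Charset.forName(name);
            return false;
        } catch (UnsupportedCharsetException e) {
            // A syntactically legal but unknown name is only detected here, at runtime.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupFails("UTF-8"));   // correctly spelled name
        System.out.println(lookupFails("UTF-88"));  // typo, only caught when this line runs

        // With the constant there is no name to misspell: a typo such as
        // StandardCharsets.UTF_88 would simply not compile.
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length);
    }
}
```

Note also that String.getBytes(String) declares the checked UnsupportedEncodingException, so callers need a try/catch or a throws clause that the Charset overload does not require.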

Upvotes: 2

Cyäegha

Reputation: 4251

As in, are there 'gotchas' or inefficiencies that might arise from using StandardCharsets instead of Guava charsets or something like getBytes("UTF-8")?


First of all, java.nio.charset.StandardCharsets.UTF_8 (as implemented in OpenJDK/Oracle JDK), com.google.common.base.Charsets.UTF_8 and org.apache.commons.io.Charsets.UTF_8 are all implemented identically:

public static final Charset UTF_8 = Charset.forName("UTF-8");

So, at least, you don't have to worry about differences with Guava Charsets or with Charset.forName("UTF-8").
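If you want to convince yourself on your own JRE, a quick check along these lines should do. This is only a sketch: the class name CharsetEqualityDemo, its helper methods and the sample string are invented for the illustration.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not taken from any library.
final class CharsetEqualityDemo {

    // Compares the charset obtained by name with the StandardCharsets constant.
    static boolean sameCharset() {
        return Charset.forName("UTF-8").equals(StandardCharsets.UTF_8);
    }

    // Encodes a string through both overloads and compares the resulting bytes.
    static boolean sameBytes(String s) {
        try {
            return Arrays.equals(s.getBytes("UTF-8"), s.getBytes(StandardCharsets.UTF_8));
        } catch (UnsupportedEncodingException e) {
            // Should not happen: UTF-8 is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameCharset());
        System.out.println(sameBytes("na\u00efve \u20ac"));
    }
}
```

For well-formed input both checks should report true; the comparison does not cover strings that cannot be encoded.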


As for String.getBytes(String) and String.getBytes(Charset), I do see a difference in the documentation:

  • For String.getBytes(Charset): "This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement byte array."
  • For String.getBytes(String): "The behavior of this method when this string cannot be encoded in the given charset is unspecified."

So, depending on which JRE you use, I expect there might be a difference in the handling of unmappable characters between someString.getBytes("UTF-8") and someString.getBytes(StandardCharsets.UTF_8).
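For UTF-8, about the only input that cannot be encoded is an unpaired surrogate. The sketch below shows the documented replacement behaviour of getBytes(Charset), plus one way to make the failure explicit with a CharsetEncoder if silent replacement is not what you want. The class and method names are invented for the example, and I deliberately leave getBytes("UTF-8") out since its behaviour is unspecified for this case.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from any library.
final class UnmappableDemo {

    // Lenient encoding: unencodable input is replaced with the charset's
    // default replacement bytes, as documented for getBytes(Charset).
    static byte[] lenient(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Strict encoding: returns true if the encoder reports an error
    // instead of silently replacing the bad input.
    static boolean strictRejects(String s) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            encoder.encode(CharBuffer.wrap(s));
            return false;
        } catch (CharacterCodingException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A lone high surrogate is not valid UTF-16, so it cannot be encoded.
        String broken = String.valueOf((char) 0xD800);

        System.out.println(lenient(broken).length);
        System.out.println(strictRejects(broken));
        System.out.println(strictRejects("plain text"));
    }
}
```

If this distinction matters in your legacy code, the strict encoder is the variant whose behaviour you control yourself rather than leaving it to the JRE.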

Upvotes: 4

Enusi

Reputation: 101

The best reason to not use StandardCharsets would probably be the use of special characters. Not every character has been available since Java 1 and therefore it's likely that although this is the best for legacy programs, it's not universally accessible and useful to everyone.

Then again, it's probably fine for most people, and I can't imagine any performance issues resulting from it.

Upvotes: 0
