Is specifying String encoding when parsing byte[] really necessary?

Question

Supposedly, it is "best practice" to specify the encoding when creating a String from a byte[]:

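import java.nio.charset.StandardCharsets;

byte[] b = "caf\u00e9".getBytes(StandardCharsets.UTF_8); // some UTF-8 encoded bytes
String a = new String(b, StandardCharsets.UTF_8);        // 100% safe
String c = new String(b);                                // safe enough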

If I know my installation has a default encoding of UTF-8, is it really necessary to specify the encoding for it to still be "best practice"?

Stephen C · Accepted Answer

If I know my installation has a default encoding of UTF-8, is it really necessary to specify the encoding for it to still be "best practice"?

But do you know for sure that your installation will always have a default encoding of UTF-8? (Or at least, for as long as your code is used ...)

And do you know for sure that your code is never going to be used in a different installation that has a different default encoding?
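To make the risk concrete, here is a small sketch (the class name DefaultCharsetDemo is just for illustration). It encodes "café" as UTF-8 and then decodes the same bytes three ways. `new String(bytes)` uses whatever `Charset.defaultCharset()` returns on the JVM that runs it. Decoding with ISO-8859-1 is only a simulation of an installation whose default happens to be ISO-8859-1, not a real platform switch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    public static void main(String[] args) {
        // The e-acute is encoded as two bytes in UTF-8 (0xC3 0xA9), so this array has 5 bytes
        byte[] bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // new String(bytes) silently depends on the JVM's default charset
        System.out.println("default charset: " + Charset.defaultCharset());
        String implicit = new String(bytes);

        // An explicit charset gives the same result on every installation
        String explicit = new String(bytes, StandardCharsets.UTF_8);

        // Simulated ISO-8859-1 default: each byte becomes one char, so the two-byte
        // e-acute turns into two characters (U+00C3 followed by U+00A9)
        String wrongDefault = new String(bytes, StandardCharsets.ISO_8859_1);

        System.out.println("implicit matches explicit: " + implicit.equals(explicit));
        System.out.println("simulated ISO-8859-1 length: " + wrongDefault.length()); // 5 chars, not 4
    }
}
```

Only the explicit UTF-8 decode is guaranteed to give back "café" no matter where the code runs; the implicit one is correct only while the default charset stays UTF-8.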

If the answer to either of those is "No" (and unless you are prescient, it probably has to be "No"), then I think that you should follow best practice ... and specify the encoding if that is what your application's semantics require:

If the requirement is to always encode (or decode) in UTF-8, then use "UTF-8".
If the requirement is to always encode (or decode) using the platform default, then do that.
If the requirement is to support multiple encodings (or the requirement might change), then make the encoding name a configuration (or command line) parameter, resolve it to a Charset object, and use that.
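The three cases above could look roughly like this in Java. This is a sketch, not the answerer's code: the class and method names are hypothetical, and the convention that the first command-line argument names the encoding is an assumption made for illustration. `StandardCharsets.UTF_8`, `Charset.defaultCharset()` and `Charset.forName(String)` are standard JDK APIs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodingPolicies {
    // Case 1: always UTF-8, regardless of the platform
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Case 2: deliberately follow the platform default.
    // Passing Charset.defaultCharset() explicitly documents that this is intentional.
    static String decodePlatformDefault(byte[] bytes) {
        return new String(bytes, Charset.defaultCharset());
    }

    // Case 3: the encoding name comes from configuration.
    // Charset.forName throws IllegalCharsetNameException or UnsupportedCharsetException
    // for a bad name, so a misconfiguration fails loudly instead of decoding garbage.
    static Charset resolveCharset(String configuredName) {
        return Charset.forName(configuredName);
    }

    static String decodeConfigured(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Hypothetical convention: first argument names the encoding, defaulting to UTF-8
        Charset configured = resolveCharset(args.length > 0 ? args[0] : "UTF-8");
        byte[] bytes = "na\u00efve".getBytes(StandardCharsets.UTF_8);

        System.out.println(decodeUtf8(bytes));
        System.out.println(decodePlatformDefault(bytes));
        System.out.println(decodeConfigured(bytes, configured));
    }
}
```

Resolving the configured name to a Charset once, at startup, means a typo in the configuration is reported immediately rather than on the first decode deep inside the application.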

The point of this "best practice" recommendation is to avoid a foreseeable problem that will arise if your platform's characteristics change. You don't think that is likely, but you probably can't be completely sure about it. But at the end of the day, it is your decision.
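If you do decide to rely on the default, one possible compromise is to make that assumption explicit and check it at startup, so that a platform change fails fast instead of corrupting text. The sketch below is only an illustration; CharsetGuard and requireDefaultCharset are hypothetical names, and only `Charset.defaultCharset()` comes from the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetGuard {
    // Hypothetical startup check for code that chooses to call new String(bytes)
    // without a charset: throws if the JVM's default is not the one the code assumes.
    static void requireDefaultCharset(Charset expected) {
        Charset actual = Charset.defaultCharset();
        if (!actual.equals(expected)) {
            throw new IllegalStateException(
                "Expected default charset " + expected + " but this JVM uses " + actual);
        }
    }

    public static void main(String[] args) {
        requireDefaultCharset(StandardCharsets.UTF_8);
        System.out.println("Default charset is UTF-8; the assumption holds on this JVM.");
    }
}
```

This does not make the implicit decode portable; it only turns a silent encoding bug into an obvious startup error.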

(The fact that you are actually thinking about whether "best practice" is appropriate to your situation is a GOOD THING ... in my opinion.)
