Bohemian

Reputation: 425003

Is specifying String encoding when parsing byte[] really necessary?

Supposedly, it is "best practice" to specify the encoding when creating a String from a byte[]:

byte[] bytes = {72, 105}; // the bytes for "Hi"
String a = new String(bytes, "UTF-8"); // 100% safe
String b = new String(bytes); // safe enough

If I know my installation has default encoding of utf8, is it really necessary to specify the encoding to still be "best practice"?

Upvotes: 1

Views: 115

Answers (2)

Stephen C

Reputation: 718788

If I know my installation has default encoding of utf8, is it really necessary to specify the encoding to still be "best practice"?

But do you know for sure that your installation will always have a default encoding of UTF-8? (Or at least, for as long as your code is used ...)

And do you know for sure that your code is never going to be used in a different installation that has a different default encoding?

If the answer to either of those is "No" (and unless you are prescient, it probably has to be "No") then I think that you should follow best practice ... and specify the encoding if that is what your application semantics require:

  • If the requirement is to always encode (or decode) in UTF-8, then use "UTF-8".

  • If the requirement is to always encode (or decode) using the platform default, then do that.

  • If the requirement is to support multiple encodings (or the requirement might change) then make the encoding name a configuration (or command line) parameter, resolve to a Charset object and use that.
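The three cases above could be sketched roughly as follows. This is only an illustration: the class name `DecodeExamples`, its method names, and the idea of passing the configured charset name in as a plain string are assumptions made for the example, not something from the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; all names here are illustrative only.
final class DecodeExamples {

    // Case 1: the requirement is "always UTF-8".
    static String decodeUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    // Case 2: the requirement is "whatever the platform default is".
    // Naming the default explicitly documents that this is intentional.
    static String decodePlatformDefault(byte[] data) {
        return new String(data, Charset.defaultCharset());
    }

    // Case 3: the encoding name comes from configuration or the command line.
    // Charset.forName throws an unchecked exception for an illegal or
    // unsupported name, so a bad setting fails fast.
    static String decodeConfigured(byte[] data, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        return new String(data, charset);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xC3, (byte) 0xA9}; // U+00E9 encoded as UTF-8
        // Assumption for this sketch: the charset name is the first argument.
        String configured = args.length > 0 ? args[0] : "UTF-8";
        System.out.println(decodeUtf8(data).length());
        System.out.println(decodeConfigured(data, configured).length());
    }
}
```

Using a `Charset` object rather than the string "UTF-8" also avoids the checked `UnsupportedEncodingException` that `new String(byte[], String)` declares.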

The point of this "best practice" recommendation is to avoid a foreseeable problem that will arise if your platform's characteristics change. You don't think that is likely, but you probably can't be completely sure about it. But at the end of the day, it is your decision.

(The fact that you are actually thinking about whether "best practice" is appropriate to your situation is a GOOD THING ... in my opinion.)

Upvotes: 1

Henry

Reputation: 43728

Different use cases have to be distinguished here: If you get the bytes from an external source via some protocol with a specified encoding then always use the first form (with explicit encoding).

If the source of the bytes is the local machine, for example a local text file, the second form (without explicit encoding) is better, since such files are typically written in the platform's default encoding.

Always keep in mind that your program may be used on a different machine with a different platform encoding. It should work there without any changes.
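To see what goes wrong when bytes are decoded with a different encoding than the one that produced them, here is a small sketch. The class `CharsetMismatchDemo` and the method `roundTrip` are made up for illustration, and ISO-8859-1 merely stands in for "some other platform default".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
final class CharsetMismatchDemo {

    // Encode text with one charset, then decode the bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an acute accent on the e

        // Same charset on both sides: the text survives.
        String same = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(same.equals(original)); // prints "true"

        // UTF-8 bytes decoded as ISO-8859-1: the two bytes of the accented
        // character become two separate characters.
        String mixed = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(mixed.length()); // prints "5"
    }
}
```

Pure ASCII text would pass through both paths unchanged, which is why this kind of bug often goes unnoticed until non-ASCII data shows up.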

Upvotes: 3
