David Klempfner

Unicode vs. UTF-8

I believe Windows currently defaults to UTF-16 for “Unicode”, but this may not be the case in the future.

For this reason, would it be better to use

[System.Text.Encoding]::UTF8.GetString($someByteArray)

instead of the following?

[System.Text.Encoding]::Unicode.GetString($someByteArray)

Answers (2)

First, yes, Windows defaults to UTF-16. Personally, I would use UTF-8, because most of the applications I write have to communicate with Linux applications or over HTTP, where UTF-8 is far more likely.

Besides, even if all your code runs on Microsoft systems, it's easy to convert to UTF-8, and a simple regular-expression search-and-replace could switch everything over to Unicode (UTF-16) if .NET ever started requiring it.

bobince

“this may not be the case in the future.”

Unicode isn't a potentially variable encoding; it's just Microsoft's (sadly misleading) name for UTF-16LE.
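A quick way to check this for yourself, as a small PowerShell sketch (the sample character "A" is arbitrary): `Encoding.Unicode` writes the low byte first, whereas `Encoding.BigEndianUnicode` writes the high byte first, and the byte-order mark it reports is the little-endian one.

```powershell
# Encoding.Unicode is UTF-16 little-endian: "A" (U+0041) encodes as 41 00,
# low byte first. BigEndianUnicode is the big-endian variant, for comparison.
$le = [System.Text.Encoding]::Unicode.GetBytes("A")
$be = [System.Text.Encoding]::BigEndianUnicode.GetBytes("A")

# Format each byte array as space-separated hex pairs
($le | ForEach-Object { '{0:X2}' -f $_ }) -join ' '   # 41 00
($be | ForEach-Object { '{0:X2}' -f $_ }) -join ' '   # 00 41

# The byte-order mark reported by Encoding.Unicode is the little-endian one
([System.Text.Encoding]::Unicode.GetPreamble() | ForEach-Object { '{0:X2}' -f $_ }) -join ' '   # FF FE
```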

It isn't going to change. Even if Microsoft moved towards implementing Windows APIs natively in UTF-8 or UTF-32 (and there's no sign of that ever happening), System.Text.Encoding.Unicode would have to remain UTF-16LE, because that is how the .NET specification defines it.

“would it be better to use UTF8 instead of Unicode?”

Use UTF8 if the byte array contains UTF-8-encoded bytes, and use Unicode if they are in UTF-16LE.
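To illustrate why the choice is dictated by the bytes rather than by preference, here is a minimal PowerShell sketch (the sample string "héllo" is arbitrary and is built from a code point so the script file's own encoding doesn't matter). Decoding with the matching encoding round-trips; decoding UTF-8 bytes as `Unicode` does not throw, it just silently returns the wrong characters.

```powershell
# "héllo", built from a code point so the script file's encoding is irrelevant
$text = "h$([char]0xE9)llo"

$utf8Bytes  = [System.Text.Encoding]::UTF8.GetBytes($text)     # 6 bytes: é needs two
$utf16Bytes = [System.Text.Encoding]::Unicode.GetBytes($text)  # 10 bytes: two per char

# Matching decoder: the original string comes back
$ok8  = [System.Text.Encoding]::UTF8.GetString($utf8Bytes)
$ok16 = [System.Text.Encoding]::Unicode.GetString($utf16Bytes)

# Mismatched decoder: no exception, just the wrong characters (mojibake)
$bad  = [System.Text.Encoding]::Unicode.GetString($utf8Bytes)

$ok8  -ceq $text   # True
$ok16 -ceq $text   # True
$bad  -ceq $text   # False
```

GetString has no way to detect which encoding the bytes are in, so the caller has to know.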

If you get to choose what encoding is used to store data at rest, UTF-8 is usually the better choice for space efficiency: ASCII characters take one byte in UTF-8 but two in UTF-16.
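The size difference can be measured with `GetByteCount`, sketched here in PowerShell with three arbitrary sample strings. The saving holds for ASCII and most Latin-script text; for text dominated by CJK characters (three bytes each in UTF-8, two in UTF-16) the comparison flips, which is why "usually" is the right word.

```powershell
# Sample strings built from code points: plain ASCII, one accented Latin letter,
# and two CJK characters
$samples = @(
    "hello world",
    "h$([char]0xE9)llo",
    "$([char]0x4E2D)$([char]0x6587)"
)

# GetByteCount reports the encoded size without a byte-order mark
foreach ($s in $samples) {
    [pscustomobject]@{
        Text  = $s
        UTF8  = [System.Text.Encoding]::UTF8.GetByteCount($s)
        UTF16 = [System.Text.Encoding]::Unicode.GetByteCount($s)
    }
}
# hello world : UTF8 11, UTF16 22
# héllo       : UTF8  6, UTF16 10
# 中文        : UTF8  6, UTF16  4
```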
