Blankman

Reputation: 267380

Memory-wise, is storing a string as a byte array cheaper than its UTF equivalent?

If I store a string as bytes, does it use less memory than if it were stored as a regular (UTF) string?

e.g.

string text = "Hello, World!";

Versus encoding it into a byte array?

Upvotes: 3

Views: 1242

Answers (4)

KeithS

Reputation: 71591

Strings are sequences of characters, which in .NET are UTF-16 encoded. Each char is a 16-bit code unit (the size of an Int16), so it takes twice the space of a one-byte ASCII character. Characters outside the Basic Multilingual Plane (code points above U+FFFF) need a second char, forming a surrogate pair, to hold the extra two bytes.
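The surrogate-pair point can be checked with a minimal C# sketch. It assumes C# 9 top-level statements (.NET 5 or later), and the emoji is just an arbitrary example of a character outside the BMP. `Length` counts 16-bit `char` code units, not user-visible characters.

```csharp
using System;

// A .NET char is one 16-bit UTF-16 code unit.
Console.WriteLine(sizeof(char)); // 2

string ascii = "A";            // U+0041, inside the BMP
string emoji = "\U0001F600";   // U+1F600 (grinning face), outside the BMP

// Length counts UTF-16 code units.
Console.WriteLine(ascii.Length); // 1
Console.WriteLine(emoji.Length); // 2: stored as a surrogate pair

// The two chars of the emoji are a high surrogate followed by a low surrogate.
Console.WriteLine(char.IsSurrogatePair(emoji[0], emoji[1])); // True
```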

If you're only dealing with ASCII, then yes, you can put a string in a byte array that takes half the space of a char array without losing information. However, as Jon said, that's not a very convenient way to work with strings. A single object in .NET is limited to 2 gigabytes, so as bytes you'd get roughly 2 billion ASCII characters, while as a string you still get roughly 1 billion characters. If you really need more than that in a single string, I worry about what you think you need it for.
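A rough sketch of the "half the space, no information lost" claim for ASCII, again assuming C# 9 top-level statements. The byte counts cover character data only (object headers and length fields are ignored), and the "café" sample is an illustrative addition showing what happens once the text leaves ASCII.

```csharp
using System;
using System.Text;

string text = "Hello, World!";

// Character data only: 13 chars x 2 bytes each in a UTF-16 string.
int utf16Bytes = text.Length * sizeof(char);       // 26
byte[] asciiBytes = Encoding.ASCII.GetBytes(text);  // 13 bytes

Console.WriteLine($"{utf16Bytes} vs {asciiBytes.Length}"); // 26 vs 13

// For pure ASCII text, decoding gives back an identical string.
string roundTrip = Encoding.ASCII.GetString(asciiBytes);
Console.WriteLine(roundTrip == text); // True

// Outside ASCII, Encoding.ASCII substitutes '?', so information is lost.
string lossy = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes("caf\u00E9"));
Console.WriteLine(lossy); // caf?
```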

Upvotes: 0

BrokenGlass

Reputation: 161012

In the example you gave, UTF-8 encoding would save you some bytes since you only use ASCII characters, but it does depend on the input string: some UTF-8 encoded strings might actually be larger than the corresponding UTF-16 version.

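using System.Text;

// A .NET string is UTF-16 in memory: 13 chars x 2 bytes = 26 bytes of character data
string text = "Hello, World!";

// UTF-8 length will be 13 bytes (only ASCII chars used)
var bytesUTF8 = Encoding.UTF8.GetBytes(text);

// Encoding.Unicode is UTF-16 (little-endian), so 26 bytes
var bytesUTF16 = Encoding.Unicode.GetBytes(text);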
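To illustrate the "it depends on the input" caveat, here is a small sketch (C# 9 top-level statements assumed) comparing encoded sizes for three sample strings. The helper name `Sizes` and the "café" and "日本語" samples are hypothetical additions; non-ASCII text is written with `\u` escapes to avoid source-file encoding issues.

```csharp
using System;
using System.Text;

// Hypothetical helper: encoded size of s in UTF-8 and in UTF-16 (Encoding.Unicode), in bytes.
static (int Utf8, int Utf16) Sizes(string s) =>
    (Encoding.UTF8.GetByteCount(s), Encoding.Unicode.GetByteCount(s));

var samples = new (string Label, string Text)[]
{
    ("ASCII", "Hello, World!"),
    ("Latin-1", "caf\u00E9"),          // "café"
    ("CJK", "\u65E5\u672C\u8A9E"),     // "日本語", 3 bytes per char in UTF-8
};

foreach (var (label, text) in samples)
{
    var (utf8, utf16) = Sizes(text);
    Console.WriteLine($"{label}: UTF-8 {utf8} bytes, UTF-16 {utf16} bytes");
}
// ASCII: UTF-8 13 bytes, UTF-16 26 bytes
// Latin-1: UTF-8 5 bytes, UTF-16 8 bytes
// CJK: UTF-8 9 bytes, UTF-16 6 bytes
```

For the CJK sample, UTF-8 is the larger encoding.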

Upvotes: 1

jishi

Reputation: 24634

UTF-8 only uses 1 byte per char if you stick to 7-bit ASCII.

But internally .NET stores strings as UTF-16 (often loosely called UCS-2), which uses 2 bytes per char. So yes, if you store the text as UTF-8 it will use no more memory than a string, and less whenever it contains ASCII characters, assuming you are storing Western European languages (i.e. Latin-1), where every character takes 1 or 2 bytes in UTF-8.
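The Latin-1 claim can be checked exhaustively with a short sketch (C# 9 top-level statements assumed): encode each code point from U+0000 to U+00FF on its own and count how many UTF-8 bytes it needs. None needs more than the 2 bytes a UTF-16 char uses.

```csharp
using System;
using System.Text;

// Every Latin-1 code point (U+0000..U+00FF): UTF-8 needs 1 byte for
// U+0000..U+007F (7-bit ASCII) and 2 bytes for U+0080..U+00FF.
int oneByte = 0, twoBytes = 0;
for (int cp = 0; cp <= 0xFF; cp++)
{
    int n = Encoding.UTF8.GetByteCount(((char)cp).ToString());
    if (n == 1) oneByte++;
    else if (n == 2) twoBytes++;
}

Console.WriteLine($"1 byte: {oneByte}, 2 bytes: {twoBytes}"); // 1 byte: 128, 2 bytes: 128
```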

Upvotes: 3

Jon Skeet

Reputation: 1504122

If you stored that in a byte array it would be more efficient than in a string, yes, because all of that text is ASCII, which would be encoded as a single byte per character. However, it's not universally true for all strings (some characters would take 2 bytes, some would take 3, and non-BMP characters take 4), and it's also a darned sight less convenient to work with in binary form...
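A sketch of those per-character costs (C# 9 top-level statements assumed), using one arbitrarily chosen sample from each UTF-8 length class and printing the UTF-16 size alongside for comparison.

```csharp
using System;
using System.Text;

// One sample per UTF-8 length class; escapes avoid source-file encoding issues.
string[] samples =
{
    "A",           // U+0041, ASCII
    "\u00E9",      // U+00E9, é (Latin-1)
    "\u20AC",      // U+20AC, € (elsewhere in the BMP)
    "\U0001F600",  // U+1F600, grinning face (outside the BMP)
};

foreach (string s in samples)
{
    int codePoint = char.ConvertToUtf32(s, 0);  // combines a surrogate pair if present
    int utf8 = Encoding.UTF8.GetByteCount(s);
    int utf16 = Encoding.Unicode.GetByteCount(s);
    Console.WriteLine($"U+{codePoint:X4}  UTF-8: {utf8}  UTF-16: {utf16}");
}
// U+0041  UTF-8: 1  UTF-16: 2
// U+00E9  UTF-8: 2  UTF-16: 2
// U+20AC  UTF-8: 3  UTF-16: 2
// U+1F600  UTF-8: 4  UTF-16: 4
```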

I would stick with strings unless you had a really really good reason to keep them in memory as byte arrays.
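To give a rough idea of the inconvenience, here is a sketch (C# 9 top-level statements assumed) of keeping text as UTF-8 bytes. The ordinary string methods need a `string`, so the bytes have to be decoded first. Slicing the raw bytes can also split a multi-byte character; the "café" sample is an illustrative choice.

```csharp
using System;
using System.Text;

// Text kept in memory as UTF-8 bytes to save space.
byte[] stored = Encoding.UTF8.GetBytes("Hello, World!");

// String methods such as Contains and ToUpperInvariant need a string, so the
// bytes are decoded first, allocating a new UTF-16 string while it is in use.
string decoded = Encoding.UTF8.GetString(stored);
Console.WriteLine(decoded.Contains("World"));  // True
Console.WriteLine(decoded.ToUpperInvariant()); // HELLO, WORLD!

// Cutting the bytes in the wrong place breaks non-ASCII text: "é" is 2 bytes in
// UTF-8, so dropping the last byte leaves an invalid sequence decoded as U+FFFD.
byte[] cafe = Encoding.UTF8.GetBytes("caf\u00E9");                 // 5 bytes
string broken = Encoding.UTF8.GetString(cafe, 0, cafe.Length - 1); // "caf\uFFFD"
```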

Upvotes: 3
