Reputation: 4929
I'm trying to understand how an array of bytes can have a smaller size than the string made from it. I know each character of a string is something like 2 bytes. But even with that, the math isn't adding up. Can someone shed some light on this for me, please?
The following:
byte[] myBytes = Encoding.ASCII.GetBytes("12345");
string myString = Convert.ToBase64String(myBytes);
Debug.WriteLine("Size of byte array: " + myBytes.Length);
Debug.WriteLine("Size of string: " + myString.Length);
Returns:
Size of byte array: 5
Size of string: 8
Upvotes: 4
Views: 4793
Reputation: 61339
The sizes/lengths do match, but only if you use a 1:1 encoding.
First, you seem to be a bit confused as to what an encoding is. Remember that bytes are just numbers (in the range 0-255) and are the only thing a computer can store. Those numbers don't mean anything to humans other than their numeric value. Because we wanted to be able to store the idea of text, we had to come up with ways to map those numbers to readable (and some not-so-readable) characters. These mappings are called encodings.
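For example (a minimal sketch just to illustrate the mapping, not part of your original code), ASCII maps the character '1' to the number 49:
byte[] bytes = Encoding.ASCII.GetBytes("12345");
// Each character maps to one byte value: 49, 50, 51, 52, 53
Console.WriteLine(string.Join(", ", bytes));
// Decoding with the same encoding maps the numbers back to text: "12345"
Console.WriteLine(Encoding.ASCII.GetString(bytes));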
You encoded your bytes with Base64, which has overhead: it emits 4 output characters for every 3 input bytes, padding the final group with '=' so the output length is always a multiple of 4 (roughly 1 extra character per 3 bytes of input). That overhead is causing your difference: ceil(5 / 3) * 4 = 8.
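As a quick sketch of that arithmetic (the formula below is just the standard Base64 length calculation applied to your 5-byte input):
byte[] myBytes = Encoding.ASCII.GetBytes("12345");
// 4 output characters per 3 input bytes, rounded up to a whole group
int predicted = ((myBytes.Length + 2) / 3) * 4;     // (5 + 2) / 3 * 4 = 8
string myString = Convert.ToBase64String(myBytes);  // "MTIzNDU="
Console.WriteLine(predicted);        // 8
Console.WriteLine(myString.Length);  // 8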
If you used Encoding.ASCII instead:
byte[] myBytes = Encoding.ASCII.GetBytes("12345");
string myString = Encoding.ASCII.GetString(myBytes);
Console.WriteLine("Size of byte array: " + myBytes.Length);
Console.WriteLine("Size of string: " + myString.Length);
You get as expected:
Size of byte array: 5
Size of string: 5
The reason to use Base64 (even with its overhead) is that it can encode any byte array into printable characters (which is required when you need to send them, say, via a URL), whereas interpreting arbitrary bytes as ASCII will result in unprintable characters for quite a few values.
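A minimal sketch of that difference (the byte values below are arbitrary, picked for illustration):
// Arbitrary bytes, including values with no printable ASCII character
byte[] raw = { 0x00, 0x07, 0x1B, 0xFF };
// Base64 always produces printable characters: "AAcb/w=="
Console.WriteLine(Convert.ToBase64String(raw));
// Reading the same bytes as ASCII yields control characters,
// and 0xFF isn't valid ASCII at all (it decodes as '?')
Console.WriteLine(Encoding.ASCII.GetString(raw));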
Also note that a character is two bytes only in .NET's in-memory UTF-16 representation; string.Length counts characters, not bytes, which is why your number isn't double like you expected in the question.
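If you want to see the two-bytes-per-character part directly (a quick check using the same "12345" string; Encoding.Unicode is .NET's UTF-16 encoding):
string myString = "12345";
// string.Length counts UTF-16 characters, not bytes
Console.WriteLine(myString.Length);                            // 5
// Encoded as UTF-16, each of these characters takes 2 bytes
Console.WriteLine(Encoding.Unicode.GetBytes(myString).Length); // 10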
Upvotes: 19