Nick

Reputation: 10499

UTF8Encoding string to byte[] conversion unexpected behavior

I have this piece of code:

byte[] bytes = ...

// Here my bytes.Length is 181 (for example)

var str = UTF8Encoding.UTF8.GetString(bytes);
bytes = UTF8Encoding.UTF8.GetBytes(str);

// Here my bytes.Length is 189

Why?
How can I correctly convert the string to byte[]?

Edit: An example

public class Person 
{
    public string Name { get; set; }
    public uint Age { get; set; }
}

...

Person p = new Person { Name = "Mary", Age = 24 };

string str;
byte[] b1, b2;

using (var stream = new MemoryStream())
{
    new BinaryFormatter().Serialize(stream, p);
    b1 = stream.ToArray();
    str = UTF8Encoding.UTF8.GetString(b1);
}

b2 = UTF8Encoding.UTF8.GetBytes(str);

Upvotes: 1

Views: 1185

Answers (3)

Henk Holterman

Reputation: 273169

// Here my bytes.Length is 181 (for example)
// Here my bytes.Length is 189

That can happen.

How can I correctly convert the string to byte[]?

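A difference in size does not mean the conversion is invalid. The initial byte sequence might not have been valid UTF-8, though.

If you want to preserve the size, use ASCII encoding, which maps each byte to exactly one character. Note that bytes above 127 still do not survive that round trip.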
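
To see where extra bytes can come from, here is a minimal sketch. The three-byte input and the names RoundTripDemo, ViaUtf8 and ViaAscii are made up for illustration; they are not the asker's data or code. With the default Encoding.UTF8, a byte that is not valid UTF-8 is decoded as the replacement character U+FFFD, which takes three bytes when encoded again. Encoding.ASCII decodes any byte above 127 as '?', so the length is kept but the value is not.

```csharp
using System;
using System.Text;

static class RoundTripDemo
{
    // Decode bytes as UTF-8 text, then encode the text back to bytes.
    public static byte[] ViaUtf8(byte[] data) =>
        Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(data));

    // Same round trip through ASCII.
    public static byte[] ViaAscii(byte[] data) =>
        Encoding.ASCII.GetBytes(Encoding.ASCII.GetString(data));

    static void Main()
    {
        // Hypothetical input: 0xFF can never occur in well-formed UTF-8.
        byte[] original = { 0x41, 0xFF, 0x42 };

        // The invalid byte becomes U+FFFD, re-encoded as EF BF BD.
        Console.WriteLine(BitConverter.ToString(ViaUtf8(original)));  // 41-EF-BF-BD-42

        // The length is preserved, but 0xFF has turned into '?' (0x3F).
        Console.WriteLine(BitConverter.ToString(ViaAscii(original))); // 41-3F-42
    }
}
```

Each invalid byte grows by two bytes in the UTF-8 round trip, which is the kind of size change described in the question.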


After the expanding edit:

new BinaryFormatter().Serialize(stream, p);
b1 = stream.ToArray();
str = UTF8Encoding.UTF8.GetString(b1);
b2 = UTF8Encoding.UTF8.GetBytes(str);

You are assuming that BinaryFormatter applies UTF8 encoding to strings.
It probably does not, and it also adds extra data (markers and size fields) to the stream.

So your two conversions (Serialize and GetString) are simply not compatible.

Aside from a difference in size, when you display the result it will probably contain some 'strange' characters.
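
One way to check whether a given byte[] is safe to treat as UTF-8 text is to decode it with a strict encoder. The sketch below is an illustration only: StrictUtf8 and IsValidUtf8 are hypothetical names, and the sample bytes are invented rather than real BinaryFormatter output. It relies on the UTF8Encoding constructor's throwOnInvalidBytes flag, which makes decoding throw a DecoderFallbackException instead of silently substituting U+FFFD.

```csharp
using System;
using System.Text;

static class StrictUtf8
{
    // No BOM on encode; throw on invalid input instead of replacing it.
    static readonly UTF8Encoding Strict =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Returns true only if every byte sequence in data is well-formed UTF-8.
    public static bool IsValidUtf8(byte[] data)
    {
        try
        {
            Strict.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    static void Main()
    {
        // The ASCII bytes of "Mary" are valid UTF-8.
        Console.WriteLine(IsValidUtf8(new byte[] { 0x4D, 0x61, 0x72, 0x79 })); // True

        // Arbitrary binary data (invented sample) usually is not.
        Console.WriteLine(IsValidUtf8(new byte[] { 0x00, 0x01, 0xFF, 0xFF })); // False
    }
}
```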


Second Edit:

When I deserialize the new byte array (b2) it throws an exception

Right. What you actually need is Convert.ToBase64String(), not UTF8.GetString().

Base64 strings can be stored and transported as strings and then converted back to byte[] again.
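
A minimal sketch of that round trip, assuming the byte[] could be any binary payload such as the serialized stream from the question. The class name Base64RoundTrip, the helpers ToText and FromText, and the sample bytes are hypothetical; only Convert.ToBase64String and Convert.FromBase64String are the real API.

```csharp
using System;
using System.Linq;

static class Base64RoundTrip
{
    // Binary data to a plain-ASCII string that is safe to store or transmit.
    public static string ToText(byte[] data) => Convert.ToBase64String(data);

    // The exact inverse: recovers the original bytes.
    public static byte[] FromText(string text) => Convert.FromBase64String(text);

    static void Main()
    {
        // Invented sample containing bytes that are not valid UTF-8.
        byte[] b1 = { 0x00, 0x01, 0xFF, 0xFE, 0x80, 0x41 };

        string str = ToText(b1);
        byte[] b2 = FromText(str);

        Console.WriteLine(b1.Length == b2.Length); // True
        Console.WriteLine(b1.SequenceEqual(b2));   // True
    }
}
```

The Base64 string is about a third longer than the input, which is the price of a lossless text representation.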

Upvotes: 2

L.B

Reputation: 116098

Don't try to convert binary data to a string with UTF8.GetString (or any other encoding). Use Convert.ToBase64String and Convert.FromBase64String instead.

Upvotes: 1

Rawling

Reputation: 50104

If you want to serialize an arbitrary byte[] to and from a string, don't use UTF8 encoding; use Base64.

Upvotes: 1
