mcmillab

Reputation: 2804

Why doesn't byte[] to string and back work as expected?

I have this code:

Int32 i1 = 14000000;
byte[] b = BitConverter.GetBytes(i1);
string s = System.Text.Encoding.UTF8.GetString(b);
byte[] b2 = System.Text.Encoding.UTF8.GetBytes(s);
Int32 i2 = BitConverter.ToInt32(b2, 0);

i2 is equal to -272777233. Why isn't it the input value (14000000)?

EDIT: What I am trying to do is append it to another string, which I'm then writing to a file using WriteAllText.

Upvotes: 11

Views: 4274

Answers (5)

TFleschenberg

Reputation: 51

To make a long story short:

You need an encoding that maps each byte value to a unique char and vice versa. A UTF-8 character can be from 1 to 4 bytes long, so you won't achieve that mapping with it; you need a more basic encoding like ASCII. Unfortunately, the original ASCII doesn't do that either: it is just a 7-bit encoding that only defines the lower 128 codes, and the upper half (the extended codes) is code page specific. To get the full-range translation, you need a complete 8-bit encoding like code page 437 or 850 or whatever:

Int32 i1 = 14000000;
byte[] b = BitConverter.GetBytes(i1);
string s = System.Text.Encoding.GetEncoding(437).GetString(b);
byte[] b2 = System.Text.Encoding.GetEncoding(437).GetBytes(s);
Int32 i2 = BitConverter.ToInt32(b2,0);
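
As a rough check of the one-to-one mapping idea, the sketch below round-trips all 256 byte values through a single-byte encoding. It swaps in ISO-8859-1 for code page 437, on the assumption that ISO-8859-1 is available without registering extra encoding providers on newer .NET runtimes. The class and method names are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Latin1RoundTrip
{
    // Hypothetical helper: decode bytes as ISO-8859-1, then encode the string back.
    public static byte[] RoundTrip(byte[] input)
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        string s = latin1.GetString(input);
        return latin1.GetBytes(s);
    }

    public static void Main()
    {
        // Check that every possible byte value survives the round trip.
        byte[] all = Enumerable.Range(0, 256).Select(v => (byte)v).ToArray();
        Console.WriteLine(all.SequenceEqual(RoundTrip(all)));

        // Same check for the integer from the question.
        int i1 = 14000000;
        int i2 = BitConverter.ToInt32(RoundTrip(BitConverter.GetBytes(i1)), 0);
        Console.WriteLine(i2 == i1);
    }
}
```

Note that the intermediate string can contain control characters and characters above U+007F. Writing it with File.WriteAllText, which uses UTF-8 unless told otherwise, would change the bytes that end up in the file, so the file would have to be written and read with the same single-byte encoding.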

Upvotes: 1

Jon Skeet

Reputation: 1499770

You shouldn't use Encoding.GetString to convert arbitrary binary data into a string. That method is only intended for text that has been encoded to binary data using a specific encoding.

Instead, you want to use a text representation which is capable of representing arbitrary binary data reversibly. The two most common ways of doing that are base64 and hex. Base64 is the simplest in .NET:

string base64 = Convert.ToBase64String(originalBytes);
...
byte[] recoveredBytes = Convert.FromBase64String(base64);

A few caveats to this:

  • If you want to use this string as a URL parameter, you should use a web-safe version of base64; I don't know of direct support for that in .NET, but you can probably find solutions easily enough
  • You should only do this at all if you really need the data in string format. If you're just trying to write it to a file or similar, it's simplest to keep it as binary data
  • Base64 isn't very human-readable; use hex if you want humans to be able to read the data in its text form without extra tooling. (There are various questions specifically about converting binary data to hex and back.)
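
For the hex option mentioned in the last caveat, a minimal sketch could look like the following. HexText, ToHex and FromHex are hypothetical names, not framework APIs, and FromHex assumes its input has an even number of valid hex digits.

```csharp
using System;

public static class HexText
{
    // Bytes to hex text, e.g. { 0x80, 0x9F } becomes "809F".
    public static string ToHex(byte[] data)
    {
        // BitConverter.ToString yields "80-9F-..."; strip the dashes.
        return BitConverter.ToString(data).Replace("-", "");
    }

    // Hex text back to bytes; assumes an even number of valid hex digits.
    public static byte[] FromHex(string hex)
    {
        byte[] result = new byte[hex.Length / 2];
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        }
        return result;
    }

    public static void Main()
    {
        byte[] original = BitConverter.GetBytes(14000000);
        string hex = ToHex(original);          // byte order depends on the machine
        byte[] recovered = FromHex(hex);
        Console.WriteLine(hex);
        Console.WriteLine(BitConverter.ToInt32(recovered, 0));
    }
}
```

Hex takes two characters per byte, so it is longer than base64, but each byte stays visible in the text.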

Upvotes: 12

Alvin Wong

Reputation: 12410

Because an Encoding class does not work on arbitrary data. If a "character" (possibly a few bytes in the case of UTF-8) is not valid in that particular character set (in your case UTF-8), the encoding will, by default, substitute a replacement character.

a single QUESTION MARK (U+003F)

(Source: http://msdn.microsoft.com/en-us/library/ms404377.aspx#FallbackStrategy)

In some cases it is just a ?, for example in ASCII, CP437, or ISO 8859-1 (the default UTF-8 decoder substitutes U+FFFD instead), but you can choose what to do with invalid data. (See the link above.)

For example, if you try to decode the byte 128 as ASCII:

string s = System.Text.Encoding.ASCII.GetString(new byte[] { 48, 128 }); // s = "0?"

Then convert it back:

byte[] b = System.Text.Encoding.ASCII.GetBytes(s); // b = new byte[] { 48, 63 }

You will not get the original byte array.

This can be a reference: Check if character exists in encoding


I can't imagine why you would need to convert a byte array to a string. It obviously doesn't make any sense. If you're going to write to a stream, you can just write the byte[] directly. If you need a text representation, it makes perfect sense to convert the integer to a string with yourIntegerVar.ToString() and use int.TryParse to get it back.
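
A minimal sketch of that ToString / int.TryParse approach, under the assumption that the number is appended to some other text and later cut back out of it. The "value=" prefix and the helper names are invented for illustration; the invariant culture is used so that the text does not depend on the machine's regional settings.

```csharp
using System;
using System.Globalization;

public static class IntAsText
{
    // Format the integer as plain decimal digits.
    public static string ToText(int value)
    {
        return value.ToString(CultureInfo.InvariantCulture);
    }

    // Parse the digits back; returns false instead of throwing on bad input.
    public static bool TryFromText(string text, out int value)
    {
        return int.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out value);
    }

    public static void Main()
    {
        // Append the number to another string, as described in the question's edit.
        string line = "value=" + ToText(14000000);

        // Later: cut the digits back out and parse them.
        string digits = line.Substring("value=".Length);
        int parsed;
        if (TryFromText(digits, out parsed))
        {
            Console.WriteLine(parsed);
        }
        else
        {
            Console.WriteLine("not a valid integer");
        }
    }
}
```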


Edit:

You can write a byte array to a file, but you should not "concatenate" the byte array to a string and use the lazy method File.WriteAllText, because it performs an encoding conversion and you will probably end up with question marks (?) all over your file. Instead, open a FileStream and use FileStream.Write to write the byte array directly. Alternatively, you can use a BinaryWriter to write an integer in its binary form (and also a string), and use its counterpart BinaryReader to read it back.

Example:

FileStream fs;

fs = File.OpenWrite(@"C:\blah.dat");
BinaryWriter bw = new BinaryWriter(fs, Encoding.UTF8);
bw.Write((int)12345678);
bw.Write("This is a string in UTF-8 :)"); // Note that BinaryWriter also prefixes the string with its length...
bw.Close();

fs = File.OpenRead(@"C:\blah.dat");
BinaryReader br = new BinaryReader(fs, Encoding.UTF8);
int myInt = br.ReadInt32();
string blah = br.ReadString(); // ...so that it can read it back.
br.Close();

This example code will result in a file which matches the following hexdump:

00  4e 61 bc 00 1c 54 68 69 73 20 69 73 20 61 20 73  Na¼..This is a s  
10  74 72 69 6e 67 20 69 6e 20 55 54 46 2d 38 20 3a  tring in UTF-8 :  
20  29                                               )   

Note that BinaryWriter.Write(string) also prefixes the string with its length, and BinaryReader depends on that prefix when reading it back, so it is not appropriate to edit the resulting file in a text editor. (Well, you are writing an integer in its binary form, so I expect this is acceptable?)

Upvotes: 15

Guffa

Reputation: 700152

It's not working because you are using encoding backwards.

Encoding is used to turn text into bytes, and then back into text again. You can't take arbitrary bytes and turn them into text. Every character has a corresponding byte pattern, but not every byte pattern translates into a character.
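
To see this with the value from the question, the sketch below prints each stage of the round trip. It assumes a little-endian machine, where 14000000 is stored as the bytes 80 9F D5 00. In UTF-8, 80 and 9F are continuation bytes with no lead byte before them, and D5 is a lead byte with no continuation after it. The default decoder replaces each of them with U+FFFD, which encodes back to EF BF BD, so the first four bytes of the re-encoded array no longer match the original. The class and method names are illustrative.

```csharp
using System;
using System.Text;

public static class Utf8Mismatch
{
    // Repeats the steps from the question for any integer.
    public static int RoundTripThroughUtf8(int value)
    {
        byte[] b = BitConverter.GetBytes(value);
        string s = Encoding.UTF8.GetString(b);
        byte[] b2 = Encoding.UTF8.GetBytes(s);
        return BitConverter.ToInt32(b2, 0);
    }

    public static void Main()
    {
        byte[] b = BitConverter.GetBytes(14000000);
        Console.WriteLine(BitConverter.ToString(b));   // original bytes

        string s = Encoding.UTF8.GetString(b);
        foreach (char c in s)
        {
            Console.WriteLine("U+{0:X4}", (int)c);     // decoded code units
        }

        byte[] b2 = Encoding.UTF8.GetBytes(s);
        Console.WriteLine(BitConverter.ToString(b2));  // re-encoded bytes
        Console.WriteLine(BitConverter.ToInt32(b2, 0));
    }
}
```

Values whose bytes happen to form valid UTF-8 (for example 65, stored as 41 00 00 00) do survive the round trip, which is why the problem can go unnoticed with small test numbers.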

If you want a compact way to represent bytes as text, use base-64 encoding:

Int32 i1 = 14000000;
byte[] b = BitConverter.GetBytes(i1);
string s = Convert.ToBase64String(b);

byte[] b2 = Convert.FromBase64String(s);
Int32 i2 = BitConverter.ToInt32(b2, 0);

Upvotes: 5

James

Reputation: 82096

If your goal here is to store an integer as a string and then turn it back into an integer, then unless I am missing something, wouldn't the following suffice?

Int32 i1 = 14000000;
string s = i1.ToString();
Int32 i2 = Int32.Parse(s);

Upvotes: 3
