roast_soul

Reputation: 3650

A weird thing in C# Encoding

I convert a byte array to a string, and then convert that string back to a byte array. The two byte arrays are different.

As below:

byte[] tmp = Encoding.ASCII.GetBytes(Encoding.ASCII.GetString(b));

Suppose b is a byte array.

b[0]=3, b[1]=188, b[2]=2 //decimal system

Result:

tmp[0]=3, tmp[1]=63, tmp[2]=2

So that's my problem. What's wrong with it?

Upvotes: 4

Views: 2295

Answers (6)

Damien_The_Unbeliever

Reputation: 239764

Not every sequence of bytes is necessarily a valid sequence of encoded values for a particular encoding.

So the result of Encoding.ASCII.GetString(b) on an arbitrary byte array b is not guaranteed to round-trip: bytes that are not valid in the encoding are replaced during decoding. (The same can happen with other encodings.)

If you need to take an arbitrary byte array and obtain a sequence of characters, you might want to look into the Convert class's ToBase64String and FromBase64String methods. If that's not what you're trying to do, maybe explain the original problem to us.
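As a minimal sketch of the Base64 suggestion, using the three bytes from the question (the variable names here are only illustrative):

```csharp
using System;
using System.Linq;

// Arbitrary bytes, including 188, which is not a valid ASCII value.
byte[] b = { 3, 188, 2 };

// Base64 turns any byte sequence into plain ASCII text...
string text = Convert.ToBase64String(b);

// ...and decoding that text gives back exactly the original bytes.
byte[] roundTripped = Convert.FromBase64String(text);

Console.WriteLine(text);                          // A7wC
Console.WriteLine(b.SequenceEqual(roundTripped)); // True
```

The resulting string is not human-readable text, but it survives being stored or sent anywhere that only accepts characters.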

Upvotes: 1

Omar

Reputation: 16623

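188 isn't in the range of ASCII (7-bit). You can use Encoding.Default to get the ANSI encoding instead (on .NET Framework this is the system's ANSI code page; on .NET Core and later, Encoding.Default is UTF-8, where a lone byte 188 is not valid either):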

byte[] b = new byte[3]{ 3, 188, 2 };
byte[] tmp = Encoding.Default.GetBytes(Encoding.Default.GetString(b));

Upvotes: 0

Alvin Wong

Reputation: 12430

ASCII is 7-bit only, so any byte value above 127 is invalid. By default, invalid bytes are replaced, and that's why you end up with a ? (byte 63).

For 8-bit character sets, you should be looking at either Extended ASCII (which was later standardized as "ISO 8859-1") or code page 437 (which is often confused with Extended ASCII, but is in fact a different character set).
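To make the replacement visible, here is a small sketch that reproduces the question's result and then builds a strict ASCII encoding that throws instead of substituting. The names strictAscii and threw are my own; Encoding.GetEncoding with explicit fallbacks is a standard overload.

```csharp
using System;
using System.Text;

byte[] b = { 3, 188, 2 };

// Default ASCII round trip: the invalid byte 188 is replaced and comes back as 63 ('?').
byte[] tmp = Encoding.ASCII.GetBytes(Encoding.ASCII.GetString(b));
Console.WriteLine(string.Join(", ", tmp)); // 3, 63, 2

// A strict ASCII instance that raises an exception rather than silently replacing data.
Encoding strictAscii = Encoding.GetEncoding(
    "us-ascii",
    EncoderFallback.ExceptionFallback,
    DecoderFallback.ExceptionFallback);

bool threw = false;
try
{
    strictAscii.GetString(b);
}
catch (DecoderFallbackException)
{
    // Thrown because byte 188 has no ASCII character.
    threw = true;
}
Console.WriteLine(threw); // True
```

Using the exception fallback is one way to catch this kind of silent data loss early, if the input is supposed to be pure ASCII.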

You can use the following code:

Encoding enc = Encoding.GetEncoding("iso-8859-1");
// For CP437, use Encoding.GetEncoding(437)
byte[] tmp = enc.GetBytes(enc.GetString(b));
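As a quick check of why ISO 8859-1 works here, this sketch round-trips every possible byte value. It assumes only the standard "iso-8859-1" encoding name; the variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text;

// ISO 8859-1 maps each byte value 0..255 to the Unicode code point with the same number.
Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

// Every possible byte value, in order.
byte[] allBytes = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();

string s = latin1.GetString(allBytes);
byte[] back = latin1.GetBytes(s);

Console.WriteLine(allBytes.SequenceEqual(back)); // True
Console.WriteLine((int)s[188]);                  // 188, i.e. U+00BC
```

Because no byte value is invalid in this encoding, nothing gets replaced along the way.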

Upvotes: 4

GvS

Reputation: 52528

The ASCII character set covers the range 0 to 127. As you can see, 188 is not in this range, so it is converted to ? (ASCII code 63).

Upvotes: 1

O. R. Mapper

Reputation: 20760

The value 188 is not defined in ASCII. Instead, you're getting 63, which is a question mark.

Upvotes: 1

Rowland Shaw

Reputation: 38130

188 is out of range for ASCII. Values that are not in the corresponding character set are replaced with '?' by design (would you prefer it be replaced with "1/4"?)

Upvotes: 5
