Hamed

Reputation: 137

ASCII Code Of Characters

In C# I need to get the ASCII code of some characters, so I convert the char to byte or int and then print the result.

string sample = "A";
int AsciiInt = sample[0];
byte AsciiByte = (byte)sample[0];

For characters with an ASCII code below 128, I get the right answer.
But for characters above 128 I get irrelevant answers!

I am sure all characters are less than 0xFF.

I have also tested System.Text.Encoding and got the same results.

For example, I get 172 for a char with an actual byte value of 129!

Actually, ASCII characters like ƒ, ‡, ‹, “, ¥, ©, Ï, ³, ·, ½, », Á each take 1 byte and go up to more than 193. I guess there is a Unicode equivalent for them, and .NET returns that because it interprets strings as Unicode!
What if someone needs to access the actual value of a byte, whether it is a valid known ASCII character or not?

Upvotes: 1

Views: 1869

Answers (2)

Luaan

Reputation: 63732

You can't just ignore the issue of encoding. There is no inherent mapping between bytes and characters - that's defined by the encoding.

If I use your example of 131, on my system, this produces â. However, since you're obviously on an Arabic system, you most likely have the Windows-1256 encoding, which produces ƒ for 131.
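To make the point concrete, here is a minimal sketch that decodes the same byte, 131, under two different encodings. It assumes .NET Core 3.0 or later (or .NET 5+), where the Windows code pages only become available after registering `CodePagesEncodingProvider`; the class and the `Decode` helper are hypothetical names used for illustration only.

```csharp
using System;
using System.Text;

static class ByteMeaningDemo
{
    // Hypothetical helper: decodes one byte under the named encoding.
    public static string Decode(byte value, string encodingName)
    {
        return Encoding.GetEncoding(encodingName).GetString(new[] { value });
    }

    static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+, where code pages such as
        // windows-1256 must be registered before GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        foreach (var name in new[] { "windows-1256", "iso-8859-1" })
        {
            char c = Decode(131, name)[0];
            // windows-1256 yields U+0192 (ƒ); iso-8859-1 yields U+0083 (a control character).
            Console.WriteLine($"{name}: U+{(int)c:X4}");
        }
    }
}
```

The byte is identical in both calls; only the encoding decides which character comes out.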

In other words, you need to use the correct encoding when converting characters to bytes and vice versa. In your case:

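using System.Text;
var sample = "ƒ";
// Encode with the Windows-1256 code page; the string holds one character, so take the first byte.
var byteValue = Encoding.GetEncoding("windows-1256").GetBytes(sample)[0];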

This produces 131, as you seem to expect. Most importantly, this will work on all computers. If you want the result to depend on the system locale instead, Encoding.Default can also work for you.

The only reason your method seems to work for bytes under 128 is that in Unicode (and so in the UTF-16 code units that .NET strings use), the first 128 characters correspond to the ASCII standard mapping. However, you're misusing the term ASCII - it really only refers to these 7-bit characters. What you're calling ASCII is actually an extended 8-bit charset - all characters with the 8th bit set are charset-dependent.

We're no longer in a world where you can assume your application will only run on computers with the same locale you have - .NET is designed for this, which is why all strings are Unicode. At the very least, read http://www.joelonsoftware.com/articles/Unicode.html for an explanation of how encodings work, and to get rid of some of the serious and dangerous misconceptions you seem to have.

Upvotes: 1

Jon Skeet

Reputation: 1500525

"But for characters greater than 128 I get irrelevant answers"

No you don't. You get the bottom 8 bits of the UTF-16 code unit corresponding to the char.
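As a small illustration of what the casts actually return, the sketch below uses ƒ (U+0192) as the sample character. The class and method names are hypothetical, and `unchecked` is spelled out on the assumption that the project might otherwise compile with overflow checking enabled.

```csharp
using System;

static class CastDemo
{
    // The implicit char-to-int conversion gives the whole 16-bit UTF-16 code unit.
    public static int CodeUnit(char c) => c;

    // The byte cast discards the high 8 bits and keeps the low 8 bits.
    public static byte LowByte(char c) => unchecked((byte)c);

    static void Main()
    {
        char c = '\u0192'; // ƒ, LATIN SMALL LETTER F WITH HOOK
        Console.WriteLine($"{CodeUnit(c)} {LowByte(c)}"); // prints "402 146"
    }
}
```

Neither 402 nor 146 is the 131 that Windows-1256 assigns to ƒ, because no code page is involved in either cast.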

Now if your text were all ASCII, that would be fine - because ASCII only goes up to 127 anyway. It sounds like you're actually expecting the representation in some other encoding - so you need to work out which encoding that is, at which point you can use:

Encoding encoding = ...;
byte[] bytes = encoding.GetBytes(sample);
// Now extract the bytes you want. Note that a character may be represented by more than
// one byte.
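The note about multiple bytes can be seen with UTF-8, used here only as an example of an encoding that is not one byte per character. This is a minimal sketch; `MultiByteDemo` and `Utf8Bytes` are hypothetical names.

```csharp
using System;
using System.Text;

static class MultiByteDemo
{
    // Encodes the text as UTF-8; characters outside ASCII take two or more bytes.
    public static byte[] Utf8Bytes(string text) => Encoding.UTF8.GetBytes(text);

    static void Main()
    {
        byte[] bytes = Utf8Bytes("\u0192"); // ƒ
        Console.WriteLine(BitConverter.ToString(bytes)); // prints "C6-92"
    }
}
```

Taking only element `[0]` of such an array would silently drop part of the character, so it is worth checking the array length first.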

If you're essentially looking for an encoding which treats bytes 0 to 255 as U+0000 to U+00FF respectively, you should use ISO-8859-1, which you can access using Encoding.GetEncoding(28591).
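A quick way to convince yourself of the ISO-8859-1 behaviour is to round-trip every byte value through it. The sketch below is illustrative only; `Latin1Demo` and `RoundTripsAllBytes` are hypothetical names.

```csharp
using System;
using System.Text;

static class Latin1Demo
{
    // Returns true if every byte 0..255 decodes to the char with the same
    // numeric value and encodes back to the same byte.
    public static bool RoundTripsAllBytes()
    {
        Encoding latin1 = Encoding.GetEncoding(28591); // ISO-8859-1

        var allBytes = new byte[256];
        for (int i = 0; i < 256; i++) allBytes[i] = (byte)i;

        string text = latin1.GetString(allBytes);
        byte[] back = latin1.GetBytes(text);

        for (int i = 0; i < 256; i++)
        {
            if (text[i] != (char)i || back[i] != allBytes[i]) return false;
        }
        return true;
    }

    static void Main()
    {
        Console.WriteLine(RoundTripsAllBytes());
    }
}
```

With this encoding the raw byte value and the char value coincide, which is what the question was hoping the plain cast would do. It says nothing about which glyph a given code page would show for that byte.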

Upvotes: 4
