Reputation: 137
In C# I need to get the ASCII code of some characters.
So I convert the char to byte or int, then print the result.
String sample="A";
int AsciiInt = sample[0];
byte AsciiByte = (byte)sample[0];
For characters with ASCII code 128 and less, I get the right answer.
But for characters greater than 128 I get irrelevant answers!
I am sure all characters are less than 0xFF.
I have also tested System.Text.Encoding and got the same results.
For example, I get 172 for a char with an actual byte value of 129!
I mean characters like ƒ, ‡, ‹, “, ¥, ©, Ï, ³, ·, ½, », Á.
Each character takes 1 byte, and the values go up past 193.
I guess there is a Unicode equivalent for them, and .NET returns that because it interprets strings as Unicode.
What if someone needs to access the actual value of a byte, whether or not it is a valid, known ASCII character?
Upvotes: 1
Views: 1869
Reputation: 63732
You can't just ignore the issue of encoding. There is no inherent mapping between bytes and characters - that's defined by the encoding.
If I use your example of 131, on my system, this produces â. However, since you're obviously on an Arabic system, you most likely have Windows-1256 encoding, which produces ƒ for 131.
In other words, you need to use the correct encoding when converting characters to bytes and vice versa. In your case:
var sample = "ƒ";
var byteValue = Encoding.GetEncoding("windows-1256").GetBytes(sample)[0];
This produces 131, as you seem to expect. Most importantly, this will work on all computers. If you want this to be system locale-specific, Encoding.Default can also work for you.
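As a rough, self-contained sketch of the conversion above: the class and method names here (`LegacyCodePageDemo`, `ToWindows1256Byte`) are made up for illustration, and the `Encoding.RegisterProvider` call assumes you are on .NET Core 3.0 or later / .NET 5+, where legacy code pages such as windows-1256 are not available until the code pages provider is registered.

```csharp
using System;
using System.Text;

public static class LegacyCodePageDemo
{
    // Hypothetical helper: encodes a single char with Windows-1256
    // and returns the first (and, for this code page, only) byte.
    public static byte ToWindows1256Byte(char c)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code
        // pages must be registered before GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding windows1256 = Encoding.GetEncoding("windows-1256");
        return windows1256.GetBytes(new[] { c })[0];
    }

    public static void Main()
    {
        char sample = '\u0192'; // ƒ, written as an escape to avoid source-file encoding issues
        Console.WriteLine(ToWindows1256Byte(sample)); // 131
    }
}
```

Note that on .NET Core and later, `Encoding.Default` is UTF-8 regardless of the system locale, so the locale-specific variant only behaves as described on .NET Framework.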
The only reason your method seems to work for values under 128 is that the first 128 Unicode code points (and therefore the UTF-16 code units .NET uses for char) match the ASCII standard mapping. However, you're misusing the term ASCII - it really only refers to these 7-bit characters. What you're calling ASCII is actually an extended 8-bit charset - all characters with the eighth bit set are charset-dependent.
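To make the 7-bit point concrete, here is a small sketch (the `AsciiRangeDemo` class and its helpers are illustrative names, not part of any library). It relies on the default behaviour of `Encoding.ASCII`, which replaces any character outside the ASCII range with `?`.

```csharp
using System;
using System.Text;

public static class AsciiRangeDemo
{
    // True only for the 128 characters that ASCII actually defines.
    public static bool IsAscii(char c) => c <= 0x7F;

    // Encodes one char with the real ASCII encoding; anything outside
    // the 7-bit range falls back to '?' (byte 63) by default.
    public static byte EncodeAscii(char c) => Encoding.ASCII.GetBytes(new[] { c })[0];

    public static void Main()
    {
        char latinA = 'A';
        char florin = '\u0192'; // ƒ

        Console.WriteLine($"{IsAscii(latinA)} {EncodeAscii(latinA)}"); // True 65
        Console.WriteLine($"{IsAscii(florin)} {EncodeAscii(florin)}"); // False 63
    }
}
```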
We're no longer in a world where you can assume your application will only run on computers with the same locale you have - .NET is designed for this, which is why all strings are Unicode. At the very least, read this http://www.joelonsoftware.com/articles/Unicode.html for an explanation of how encodings work, and to get rid of some of the serious and dangerous misconceptions you seem to have.
Upvotes: 1
Reputation: 1500525
But For Characters Upper Than 128 I get Irrelevant answers
No you don't. You get the bottom 8 bits of the UTF-16 code unit corresponding to the char.
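A minimal sketch of what the two casts from the question actually return, using ƒ (U+0192) as the sample character. The `CastDemo` class and its method names are hypothetical, chosen only for this illustration.

```csharp
using System;

public static class CastDemo
{
    // The implicit char-to-int conversion yields the full UTF-16 code unit.
    public static int CodeUnit(char c) => c;

    // The byte cast keeps only the low 8 bits of that code unit.
    public static byte LowByte(char c) => unchecked((byte)c);

    public static void Main()
    {
        char sample = '\u0192'; // ƒ

        Console.WriteLine(CodeUnit(sample)); // 402 (0x0192)
        Console.WriteLine(LowByte(sample));  // 146 (0x92)
    }
}
```

Neither value is the 131 that Windows-1256 assigns to ƒ, which is why the results look unrelated to the expected byte.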
Now if your text were all ASCII, that would be fine - because ASCII only goes up to 127 anyway. It sounds like you're actually expecting the representation in some other encoding - so you need to work out which encoding that is, at which point you can use:
Encoding encoding = ...;
byte[] bytes = encoding.GetBytes(sample);
// Now extract the bytes you want. Note that a character may be represented by more than
// one byte.
If you're essentially looking for an encoding which treats bytes 0 to 255 as U+0000 to U+00FF respectively, you should use ISO-8859-1, which you can access using Encoding.GetEncoding(28591).
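As a quick check of that byte-to-code-point property, the following sketch decodes all 256 byte values with code page 28591 and encodes them back. The `Latin1RoundTripDemo` class and `RoundTripsAllBytes` method are illustrative names only.

```csharp
using System;
using System.Text;

public static class Latin1RoundTripDemo
{
    // Returns true if every byte value b decodes to the char U+00bb
    // with the same numeric value, and encodes back to the same byte.
    public static bool RoundTripsAllBytes()
    {
        Encoding latin1 = Encoding.GetEncoding(28591); // ISO-8859-1

        byte[] allBytes = new byte[256];
        for (int i = 0; i < allBytes.Length; i++)
        {
            allBytes[i] = (byte)i;
        }

        string decoded = latin1.GetString(allBytes);
        byte[] encoded = latin1.GetBytes(decoded);

        if (decoded.Length != 256 || encoded.Length != 256)
        {
            return false;
        }

        for (int i = 0; i < 256; i++)
        {
            if (decoded[i] != (char)i || encoded[i] != allBytes[i])
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(RoundTripsAllBytes()); // True
    }
}
```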
Upvotes: 4