Reputation: 568
I am trying to decode this special character: "ß", if I use "ord()", I get "C3"
echo "ord hex--> " . dechex(ord('ß'));
...but that doesn't look good; so i tried "bin2hex()", now I get "C39F" (what?).
echo "bin2hex --> " . bin2hex('ß');
By using an Extended ASCII Table from the Internet, i know that the correct hexadecimal value is "DF", so i now tried "hex2bin()", but that give me some unknown character like this: "�".
echo "hex2bin --> " . hex2bin('DF');
Is it possible to get the "DF" output?
Upvotes: 0
Views: 1016
Reputation: 7434
ASCII goes from 0x00 to 0x7F. This is not enough to represent all the characters needed so historically old Windows OSes used the available space in a byte (from 0x80 to 0xFF) to represent different characters depending on the localization. This is what codepages are: an arbitrary mapping of non-ASCII values to non-ASCII characters. What you call "extended ASCII" is IMO an inappropriate name for a codepage.
The assumption 1 byte - 1 character is dead and (if not) must die.
So actually what you are seeing is the UTF-8 representation of ß
. If you want to see the UNICODE code point value of ß
(or any other character) just show its UTF-32 representation that AFAIK is mapped 1:1.
// Print 000000df
echo bin2hex(iconv('UTF-8', 'UTF-32BE', 'ß')));
Upvotes: 1
Reputation: 522109
You're on the right path with bin2hex
, what you're confused about is merely the encoding. Currently you're seeing the hex value of ß for the UTF-8 encoding, because your string is encoded in UTF-8. What you want is the hex value for that string in some other encoding. Let's assume "Extended ASCII" refers to ISO-8859-1, as it colloquially often does (but doesn't have to):
echo bin2hex(iconv('UTF-8', 'ISO-8859-1', 'ß'));
Now, having said that, I have no idea what you'd use that information for. There are many valid "hex values" for the character ß in various different encodings; "Extended ASCII" is just one possible answer, and it's a vague answer to be sure, since "Extended ASCII" has very little practical meaning with hundreds of different "Extended ASCII" charsets available.
Upvotes: 1
Reputation: 146460
bin2hex()
should be fine, as long as you know what encoding you are using.
The C3
output you get appears to be the first byte of the two-byte representation of the character in UTF-8 (what incidentally means that you've configured your editor to save files in such encoding, which is a good idea in 2017).
The ord()
function does not accept arbitrary encodings, let alone Unicode-compatible ones such as UTF-8:
Returns the ASCII value of the first character of
string
.
ASCII (a fairly small 7-bit charset) does not have any encoding for the ß
character (aka U+00DF LATIN SMALL LETTER SHARP S). Seriously. ASCII does not even have a DF
position (it goes up to 7E
).
Upvotes: 0