cronos

Reputation: 568

How to display the (extended) ASCII representation of a special character in PHP 5.6?

I am trying to decode this special character: "ß". If I use "ord()", I get "C3".

echo "ord hex--> "  . dechex(ord('ß'));

...but that doesn't look right, so I tried "bin2hex()". Now I get "C39F" (what?).

echo "bin2hex --> " . bin2hex('ß');

From an Extended ASCII table on the Internet, I know that the correct hexadecimal value is "DF", so I then tried "hex2bin()", but that gives me an unknown character like this: "�".

echo "hex2bin --> " . hex2bin('DF');

Is it possible to get the "DF" output?

Upvotes: 0

Views: 1016

Answers (3)

ntd

Reputation: 7434

ASCII goes from 0x00 to 0x7F. That is not enough to represent all the characters needed, so historically old Windows OSes used the remaining space in a byte (from 0x80 to 0xFF) to represent different characters depending on the localization. This is what codepages are: an arbitrary mapping of the byte values 0x80 to 0xFF to non-ASCII characters. What you call "extended ASCII" is IMO an inappropriate name for a codepage.
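To make the codepage point concrete, here is a minimal sketch that decodes the same byte, 0xDF, under two different single-byte charsets. The choice of ISO-8859-1 and ISO-8859-5 is only an illustration, and it assumes your iconv build supports both.

```php
<?php
// The same byte value means different characters in different charsets.
$byte = "\xDF";

$results = array();
foreach (array('ISO-8859-1', 'ISO-8859-5') as $charset) {
    // Interpret the byte in the given charset and re-encode it as UTF-8.
    $utf8 = iconv($charset, 'UTF-8', $byte);
    $results[$charset] = bin2hex($utf8);
    echo $charset, ' -> ', $utf8, ' (UTF-8 bytes: ', $results[$charset], ")\n";
}
// ISO-8859-1 -> ß (UTF-8 bytes: c39f)
// ISO-8859-5 -> п (UTF-8 bytes: d0bf)
```

So "DF" is only the right answer for ß once you have fixed which charset you are talking about.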

The assumption "1 byte = 1 character" is dead and, where it is not, it must die.

So what you are actually seeing is the UTF-8 representation of ß. If you want to see the Unicode code point of ß (or any other character), just show its UTF-32 representation, which AFAIK maps 1:1 to the code point.

// Prints 000000df (assuming this file is saved as UTF-8)
echo bin2hex(iconv('UTF-8', 'UTF-32BE', 'ß'));
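If you want the usual "U+XXXX" notation instead of eight hex digits, the same idea can be wrapped in a small function. This is only a sketch: codepoint_label() is a hypothetical helper name, not something PHP provides, and it only looks at the first character of the string.

```php
<?php
// Hypothetical helper (not part of PHP): returns the code point of the
// first character of a UTF-8 string in "U+XXXX" notation.
function codepoint_label($utf8)
{
    // UTF-32BE stores every code point as one 4-byte big-endian integer.
    $utf32 = iconv('UTF-8', 'UTF-32BE', $utf8);
    // 'N' unpacks an unsigned 32-bit big-endian integer; the result is
    // an array whose first element has index 1.
    $parts = unpack('N', substr($utf32, 0, 4));
    return sprintf('U+%04X', $parts[1]);
}

// "\xC3\x9F" is ß spelled out as UTF-8 bytes, so the result does not
// depend on the encoding this source file is saved in.
echo codepoint_label("\xC3\x9F"), "\n"; // U+00DF
```

Writing the input as explicit bytes is a deliberate choice here; with a literal 'ß' the output depends on how your editor saves the file.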

Upvotes: 1

deceze

Reputation: 522109

You're on the right path with bin2hex; what you're confused about is merely the encoding. Currently you're seeing the hex value of ß in the UTF-8 encoding, because your string is encoded in UTF-8. What you want is the hex value for that string in some other encoding. Let's assume "Extended ASCII" refers to ISO-8859-1, as it colloquially often does (but doesn't have to):

echo bin2hex(iconv('UTF-8', 'ISO-8859-1', 'ß'));

Now, having said that, I have no idea what you'd use that information for. There are many valid "hex values" for the character ß in various different encodings; "Extended ASCII" is just one possible answer, and it's a vague answer to be sure, since "Extended ASCII" has very little practical meaning with hundreds of different "Extended ASCII" charsets available.
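For completeness, here is a rough round trip in both directions, still under the assumption that "Extended ASCII" means ISO-8859-1. It also shows why hex2bin('DF') on its own prints "�": the result is a single byte that is not valid UTF-8 until it is converted.

```php
<?php
// Assumption: "Extended ASCII" means ISO-8859-1 here.
$utf8   = "\xC3\x9F";                           // ß as explicit UTF-8 bytes
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);  // the single byte 0xDF
echo strtoupper(bin2hex($latin1)), "\n";        // DF

// hex2bin('DF') produces that same single byte. On its own it is not a
// valid UTF-8 sequence, so a UTF-8 terminal or browser shows "�".
$byte = hex2bin('DF');

// Converting it from ISO-8859-1 to UTF-8 gives a displayable ß again.
$back = iconv('ISO-8859-1', 'UTF-8', $byte);
echo bin2hex($back), "\n";                      // c39f
```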

Upvotes: 1

Álvaro González

Reputation: 146460

bin2hex() should be fine, as long as you know what encoding you are using.

The C3 output you get appears to be the first byte of the two-byte representation of the character in UTF-8 (which incidentally means that you've configured your editor to save files in that encoding, which is a good idea in 2017).

The ord() function does not accept arbitrary encodings, let alone Unicode-compatible ones such as UTF-8:

"Returns the ASCII value of the first character of string."

ASCII (a fairly small 7-bit charset) does not have any encoding for the ß character (aka U+00DF LATIN SMALL LETTER SHARP S). Seriously. ASCII does not even have a DF position (it goes up to 7F).
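A quick way to see this for yourself is to walk over the string byte by byte. This is just an illustrative sketch; the ß is written as explicit UTF-8 bytes so the result does not depend on your editor settings.

```php
<?php
$s = "\xC3\x9F"; // ß as explicit UTF-8 bytes

// strlen() counts bytes, not characters.
echo strlen($s), "\n"; // 2

// ord() looks at one byte at a time, so collect every byte of the string.
$hex = array();
for ($i = 0; $i < strlen($s); $i++) {
    $hex[] = dechex(ord($s[$i]));
}
echo implode(' ', $hex), "\n"; // c3 9f

// Called on the whole string, ord() only ever sees the first byte.
echo dechex(ord($s)), "\n"; // c3
```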

Upvotes: 0
