Reputation: 1294
Is it possible to input a character and get its Unicode value back? For example, I can put &#12103; in HTML to output "⽇". Is it possible to pass that character as an argument to a function and get the number back as output, without building a Unicode table?
$val = someFunction("⽇");//returns 12103
or the reverse?
$val2 = someOtherFunction(12103);//returns "⽇"
I would like to output the actual characters to the page, not the codes, and I would also like to get the code from the character if possible. The closest I got to what I want is php.net/manual/en/function.mb-decode-numericentity.php, but I can't get it working. Is this the function I need, or am I on the wrong track?
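mb_decode_numericentity() can do this once it is given a conversion map, which is a flat array of [start, end, offset, mask] quadruples. Below is a minimal sketch, assuming the mbstring extension is installed and everything is UTF-8. The helper names code_point_of() and char_of() are hypothetical. Reading the number back out of the entity with sscanf() is just one possible approach.

```php
<?php
// Conversion map for mb_*_numericentity(): one [start, end, offset, mask]
// quadruple covering every Unicode code point, with no offset applied.
const ALL_CODE_POINTS = [0x0, 0x10FFFF, 0, 0xFFFFFF];

// Hypothetical helper: character -> code point.
function code_point_of(string $char): int
{
    // "⽇" -> "&#12103;" (decimal entity for the first character only)
    $entity = mb_encode_numericentity(mb_substr($char, 0, 1, 'UTF-8'), ALL_CODE_POINTS, 'UTF-8');
    // Pull the decimal number back out of "&#12103;"
    sscanf($entity, '&#%d;', $codePoint);
    return $codePoint;
}

// Hypothetical helper: code point -> character.
function char_of(int $codePoint): string
{
    return mb_decode_numericentity('&#' . $codePoint . ';', ALL_CODE_POINTS, 'UTF-8');
}

echo code_point_of("⽇"), "\n"; // 12103
echo char_of(12103), "\n";       // ⽇
```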
Upvotes: 36
Views: 36338
Reputation: 9413
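You can use the following functions, which are deprecated as of PHP 8.2. Note that they convert strings between ISO-8859-1 and UTF-8; they do not map characters to code points.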
For encoding
string utf8_encode ( string $data )
http://php.net/manual/en/function.utf8-encode.php
For decoding
string utf8_decode ( string $data )
http://php.net/manual/en/function.utf8-decode.php
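Also check
http://php.net/manual/en/function.html-entity-decode.php
(htmlspecialchars_decode() only decodes &amp;, &quot;, &#039;, &lt; and &gt;, so it leaves numeric entities like &#12103; untouched.)
<?php
echo html_entity_decode("&#12103;", ENT_QUOTES | ENT_HTML5, 'UTF-8'); // will print ⽇
?>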
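For context, utf8_encode() and utf8_decode() convert bytes between ISO-8859-1 (Latin-1) and UTF-8. They do not convert between characters and numbers, and they cannot handle ⽇ (U+2F47) at all, because that character has no ISO-8859-1 representation. The small illustration below assumes mbstring is available. It uses mb_convert_encoding() as a non-deprecated equivalent instead of calling the deprecated functions themselves.

```php
<?php
// "\xE9" is "é" in ISO-8859-1: a single byte whose value (233) equals its code point.
$latin1 = "\xE9";

// Same conversion utf8_encode() performs: ISO-8859-1 -> UTF-8.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8), "\n"; // c3a9: two UTF-8 bytes, still a string, not a number

// Same conversion utf8_decode() performs: UTF-8 -> ISO-8859-1.
$back = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
echo bin2hex($back), "\n"; // e9
```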
Upvotes: 3
Reputation: 4298
If you're using PHP 7.2 or later, you don't need to define your own function: the Multibyte String (mbstring) extension provides two functions for exactly this purpose.
To get the code point of a character (i.e. its Unicode value), use mb_ord(); to get the character for a given code point, use mb_chr().
E.g.:
mb_chr(12103, "UTF-8"); // ⽇
mb_ord("⽇", "UTF-8"); // 12103
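mb_ord() and mb_chr() each work on a single character. To map a whole string, you can combine them with a character-wise split. The sketch below assumes UTF-8 input and PHP 7.4+ (for arrow functions). The names to_code_points() and from_code_points() are hypothetical. Splitting with preg_split() and the u modifier is one choice among several; mb_str_split() would also work.

```php
<?php
// Hypothetical helper: UTF-8 string -> list of code points.
function to_code_points(string $s): array
{
    // The "u" modifier makes the empty pattern split on character boundaries, not bytes.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    return array_map(fn(string $ch): int => mb_ord($ch, 'UTF-8'), $chars);
}

// Hypothetical helper: list of code points -> UTF-8 string.
function from_code_points(array $codePoints): string
{
    return implode('', array_map(fn(int $cp): string => mb_chr($cp, 'UTF-8'), $codePoints));
}

echo implode(', ', to_code_points("a⽇")), "\n"; // 97, 12103
echo from_code_points([97, 12103]), "\n";        // a⽇
```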
Upvotes: 26
Reputation: 212522
function _uniord($c) {
    // 1 byte (0xxxxxxx): plain ASCII
    if (ord($c[0]) >= 0 && ord($c[0]) <= 127)
        return ord($c[0]);
    // 2 bytes (110xxxxx 10xxxxxx)
    if (ord($c[0]) >= 192 && ord($c[0]) <= 223)
        return (ord($c[0])-192)*64 + (ord($c[1])-128);
    // 3 bytes (1110xxxx 10xxxxxx 10xxxxxx)
    if (ord($c[0]) >= 224 && ord($c[0]) <= 239)
        return (ord($c[0])-224)*4096 + (ord($c[1])-128)*64 + (ord($c[2])-128);
    // 4 bytes (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx)
    if (ord($c[0]) >= 240 && ord($c[0]) <= 247)
        return (ord($c[0])-240)*262144 + (ord($c[1])-128)*4096 + (ord($c[2])-128)*64 + (ord($c[3])-128);
    // 5- and 6-byte forms are obsolete: RFC 3629 limits UTF-8 to 4 bytes
    if (ord($c[0]) >= 248 && ord($c[0]) <= 251)
        return (ord($c[0])-248)*16777216 + (ord($c[1])-128)*262144 + (ord($c[2])-128)*4096 + (ord($c[3])-128)*64 + (ord($c[4])-128);
    if (ord($c[0]) >= 252 && ord($c[0]) <= 253)
        return (ord($c[0])-252)*1073741824 + (ord($c[1])-128)*16777216 + (ord($c[2])-128)*262144 + (ord($c[3])-128)*4096 + (ord($c[4])-128)*64 + (ord($c[5])-128);
    if (ord($c[0]) >= 254 && ord($c[0]) <= 255) // error: never valid in UTF-8
        return FALSE;
    return 0; // first byte was a continuation byte (128-191)
} // function _uniord()
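To see what _uniord() computes, take ⽇ (U+2F47). Its UTF-8 encoding is the three bytes E2 BD 87, whose ord() values are 226, 189 and 135. The first byte falls in the 224 to 239 branch, so each continuation byte contributes 6 bits (hence the factors 64 = 2^6 and 4096 = 2^12):

```latex
\begin{aligned}
\text{code point} &= (226 - 224)\cdot 4096 + (189 - 128)\cdot 64 + (135 - 128) \\
                  &= 2 \cdot 4096 + 61 \cdot 64 + 7 \\
                  &= 8192 + 3904 + 7 = 12103
\end{aligned}
```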
and
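function _unichr($o) {
    if (function_exists('mb_convert_encoding')) {
        // Note: the 'HTML-ENTITIES' encoding is deprecated as of PHP 8.2
        return mb_convert_encoding('&#'.intval($o).';', 'UTF-8', 'HTML-ENTITIES');
    } else {
        // chr() only yields single bytes (0-255), so decode a numeric entity instead
        return html_entity_decode('&#'.intval($o).';', ENT_QUOTES | ENT_HTML5, 'UTF-8');
    }
} // function _unichr()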
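If neither mbstring nor iconv is available, you can write the inverse of _uniord() with bit operations and chr(). The sketch below covers the up-to-4-byte UTF-8 forms defined by RFC 3629. The name utf8_chr() is hypothetical, and the function does not reject surrogate code points (U+D800 to U+DFFF).

```php
<?php
// Hypothetical helper: code point -> UTF-8 string, without mbstring or iconv.
function utf8_chr(int $cp): string
{
    if ($cp < 0 || $cp > 0x10FFFF) {
        throw new InvalidArgumentException("Not a Unicode code point: $cp");
    }
    if ($cp < 0x80) {    // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {   // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) { // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(utf8_chr(12103)), "\n"; // e2bd87
```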
Upvotes: 39
Reputation: 536775
Here's a more compact implementation of unichr/uniord based on pack() and iconv():
// code point to UTF-8 string
function unichr($i) {
    return iconv('UCS-4LE', 'UTF-8', pack('V', $i));
}
// UTF-8 string to code point
function uniord($s) {
    return unpack('V', iconv('UTF-8', 'UCS-4LE', $s))[1];
}
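The same idea should also work with mbstring instead of iconv. pack('V', ...) writes the code point as an unsigned 32-bit little-endian integer, which is exactly one UTF-32LE character. This sketch assumes mbstring is installed. The names unichr_mb() and uniord_mb() are hypothetical, and like uniord(), uniord_mb() only looks at the first character.

```php
<?php
// Hypothetical variant of unichr(): code point -> UTF-8 via UTF-32LE.
function unichr_mb(int $i): string
{
    // pack('V', ...) = unsigned 32-bit integer, little-endian byte order
    return mb_convert_encoding(pack('V', $i), 'UTF-8', 'UTF-32LE');
}

// Hypothetical variant of uniord(): UTF-8 -> code point of the first character.
function uniord_mb(string $s): int
{
    // unpack('V', ...) reads only the first 4 bytes, i.e. the first UTF-32 character
    return unpack('V', mb_convert_encoding($s, 'UTF-32LE', 'UTF-8'))[1];
}

echo bin2hex(pack('V', 12103)), "\n"; // 472f0000
echo uniord_mb("⽇"), "\n";             // 12103
```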
Upvotes: 26
Reputation: 847
This also works (for someone who understands bit shifting, this might be more readable than Mark Baker's answer):
function ordinal($str) {
    // Take the first character, which may be several bytes long
    $charString = mb_substr($str, 0, 1, 'utf-8');
    $size = strlen($charString);
    // Keep only the payload bits of the lead byte
    $ordinal = ord($charString[0]) & (0xFF >> $size);
    // Append the low 6 bits of each continuation byte (10xxxxxx)
    for ($i = 1; $i < $size; $i++) {
        $ordinal = $ordinal << 6 | (ord($charString[$i]) & 63);
    }
    return $ordinal;
}
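The same bit-shifting approach can walk a whole UTF-8 string byte by byte. The high bits of each lead byte determine how many continuation bytes follow. The sketch below assumes well-formed UTF-8 input. The name utf8_code_points() is hypothetical, and the function does not validate continuation bytes or reject overlong forms.

```php
<?php
// Hypothetical helper: UTF-8 string -> list of code points, using only ord() and bit operations.
function utf8_code_points(string $s): array
{
    $result = [];
    $len = strlen($s);
    for ($i = 0; $i < $len; ) {
        $b = ord($s[$i]);
        if ($b < 0x80) {                  // 0xxxxxxx: ASCII
            $size = 1;
        } elseif (($b & 0xE0) === 0xC0) { // 110xxxxx
            $size = 2;
        } elseif (($b & 0xF0) === 0xE0) { // 1110xxxx
            $size = 3;
        } elseif (($b & 0xF8) === 0xF0) { // 11110xxx
            $size = 4;
        } else {
            throw new UnexpectedValueException("Invalid UTF-8 lead byte at offset $i");
        }
        if ($i + $size > $len) {
            throw new UnexpectedValueException("Truncated UTF-8 sequence at offset $i");
        }
        // Same lead-byte mask as ordinal(): 0xFF >> $size keeps only the payload bits
        $cp = $b & (0xFF >> $size);
        for ($j = 1; $j < $size; $j++) {
            $cp = ($cp << 6) | (ord($s[$i + $j]) & 0x3F);
        }
        $result[] = $cp;
        $i += $size;
    }
    return $result;
}

echo implode(', ', utf8_code_points("a⽇")), "\n"; // 97, 12103
```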
Upvotes: 10