Reputation: 1634
I would like to convert a Unicode code point to a character. Here is what I have tried:
$point = dechex(127468); // 1f1ec
echo "\u{1f1ec}"; // this works
echo "\u{$point}"; // this outputs '\u1f1ec'
echo "\u{{$point}}"; // Parse error: Invalid UTF-8 codepoint escape sequence
echo "\u\{{$point}\}"; // outputs \u\{1f1ec\}
echo "\u{". $point ."}"; // Parse error; same as above
Upvotes: 6
Views: 2503
Reputation: 32272
For PHP >= 7.2:
$point = 127468;
var_dump( mb_chr($point) );
Output:
string(4) "🇬"
Ref: https://www.php.net/manual/en/function.mb-chr.php
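As a minimal sketch of how this could be wrapped for reuse: mb_chr() returns false for a value that is not a valid code point, so a small helper can turn that into an exception. The name codePointToUtf8 is hypothetical and not part of mbstring; the explicit 'UTF-8' argument is there so the result does not depend on the configured internal encoding.

```php
<?php
// Hypothetical helper (name is an assumption): wraps mb_chr() with an explicit
// encoding and rejects values that mb_chr() reports as invalid.
function codePointToUtf8(int $codePoint): string
{
    $char = mb_chr($codePoint, 'UTF-8');
    if ($char === false) {
        throw new InvalidArgumentException("Invalid code point: $codePoint");
    }
    return $char;
}

// REGIONAL INDICATOR SYMBOL LETTER G
echo codePointToUtf8(127468), "\n";

// Two regional indicators (G + B) placed side by side; fonts that support it
// render the pair as a flag.
echo codePointToUtf8(0x1F1EC) . codePointToUtf8(0x1F1E7), "\n";
```

Passing the decimal value directly means no dechex() step is needed.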
Upvotes: 1
Reputation: 6174
For those who do not have IntlChar or Intl available, one can avoid the evil eval() function by going the SGML route, where entity parsing has been available since PHP 4. For this, one has to know the notation for numeric entities in HTML or XML and then use either the earlier available function mb_decode_numericentity() or the better-known function html_entity_decode(). Both support either entity notation, decimal or hexadecimal:
<?php
$iPoint= 127468; // Decimal
$sPoint= dechex( $iPoint ); // Hexadecimal = '1f1ec'
// Available since PHP 4.0.6
$aMap= array( 0, 0x10ffff, 0, 0x1fffff ); // Conversion map needs 4 values: start, end, offset, mask
$sOut1= mb_decode_numericentity( '&#x'. $sPoint. ';', $aMap, 'UTF-8' ); // 🇬
$sOut2= mb_decode_numericentity( '&#'. $iPoint. ';', $aMap, 'UTF-8' ); // 🇬
// Available since PHP 4.3.0
$sOut3= html_entity_decode ( '&#x'. $sPoint. ';', 0, 'UTF-8' ); // 🇬
$sOut4= html_entity_decode ( '&#'. $iPoint. ';', 0, 'UTF-8' ); // 🇬
echo "$sOut1 $sOut2 $sOut3 $sOut4";
This is a bit of overkill, though. String concatenation won't work because the "\u{...}" escape is resolved when the string literal is parsed: by the time the pieces are concatenated, each one is already a finished literal, and literals cannot be changed.
A different approach uses the fact that UTF-32 always consists of 4 bytes per character: take the hexadecimal code point, turn its textual representation into its binary form (using pack()), and then convert that into UTF-8 (using mb_convert_encoding()):
<?php
$iPoint= 127468; // Decimal
$sPoint= dechex( $iPoint ); // Hexadecimal = '1f1ec'
// UTF-32 always uses 4 bytes per character, so the hex value ('1f1ec') must be padded with leading zeroes to 8 digits ('0001f1ec').
$sPoint= str_pad( $sPoint, 8, '0', STR_PAD_LEFT );
// Convert the hex string of 8 digits into its binary counterpart of 4 bytes.
$sBinary= pack( 'H*', $sPoint );
// Just convert 1 character of 4 bytes (UTF-32) into UTF-8 (which are 4 different bytes for this character).
echo mb_convert_encoding( $sBinary, 'UTF-8', 'UTF-32' ); // From: 0x00 01 f1 ec To: 0xf0 9f 87 ac
Unicode code point | Character | UTF-8 | Name
U+1F1EC | 🇬 | f0 9f 87 ac | REGIONAL INDICATOR SYMBOL LETTER G
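The same UTF-32 idea can be sketched without the hexadecimal detour, assuming the code point is available as an integer: pack() with the 'N' format writes a 32-bit big-endian value, which is exactly one UTF-32BE code unit. The function name utf32ToUtf8 is hypothetical, and 'UTF-32BE' is spelled out here to make the byte order explicit.

```php
<?php
// Hypothetical helper (name is an assumption): encodes the code point as one
// UTF-32BE code unit and lets mbstring convert it to UTF-8.
function utf32ToUtf8(int $codePoint): string
{
    // 'N' = unsigned 32-bit integer, big-endian: 127468 becomes 00 01 f1 ec.
    $binary = pack('N', $codePoint);

    return mb_convert_encoding($binary, 'UTF-8', 'UTF-32BE');
}

echo utf32ToUtf8(127468), "\n";          // the character itself
echo bin2hex(utf32ToUtf8(127468)), "\n"; // f09f87ac
```

This skips the zero-padding step entirely, since pack() always emits 4 bytes for 'N'.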
Upvotes: 0
Reputation: 768
PHP 7+ solution snippet:
function charFromCodePoint($codepoint) {
eval('$ch = "\u{'.dechex($codepoint).'}";');
return $ch;
}
Note that PHP 5 doesn't support the "\u{}" syntax.
Upvotes: -2
Reputation: 1
I actually found the solution after several hours:
$unicode = '1F605'; //😅
$uni = '{' . $unicode; // First bracket needs to be separated, otherwise you get '\u1F605'
$str = "\u$uni}";
eval("\$str = \"$str\";"); // Evaluates the "\u{...}" escape and stores the resulting character in $str
echo $str;
Thanks @Rick James for the idea with the eval() function
Upvotes: -2
Reputation: 12937
You don't need to convert the integer to a hexadecimal string; instead, use IntlChar::chr:
echo IntlChar::chr(127468);
Directly from the docs of IntlChar::chr:
Return Unicode character by code point value
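Since the intl extension is not installed everywhere, one possible sketch is to prefer IntlChar::chr() and fall back to the other approaches mentioned in this thread. The function name unicodeChr and the fallback order are assumptions made for illustration, not something the documentation prescribes.

```php
<?php
// Hypothetical helper (name and fallback order are assumptions): picks
// whichever conversion is available in the running PHP build.
function unicodeChr(int $codePoint): ?string
{
    if (class_exists('IntlChar')) {
        // IntlChar::chr() returns null for an invalid code point.
        return IntlChar::chr($codePoint);
    }

    if (function_exists('mb_chr')) {
        // mb_chr() returns false for an invalid code point; normalise to null.
        $char = mb_chr($codePoint, 'UTF-8');
        return $char === false ? null : $char;
    }

    // Last resort: decode a decimal numeric entity. Unlike the branches above,
    // an invalid value is returned unchanged as the entity text.
    return html_entity_decode('&#' . $codePoint . ';', ENT_QUOTES, 'UTF-8');
}

echo unicodeChr(127468), "\n";
```

All three branches take the decimal code point as-is, so no dechex() call is needed in any of them.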
Upvotes: 7