jakub_jo

Reputation: 1634

PHP Unicode codepoint to character

I would like to convert a Unicode code point to a character. Here is what I have tried:

$point = dechex(127468);  // 1f1ec

echo "\u{1f1ec}";         // this works
echo "\u{$point}";        // this outputs '\u1f1ec'
echo "\u{{$point}}";      // Parse error: Invalid UTF-8 codepoint escape sequence
echo "\u\{{$point}\}";    // outputs \u\{1f1ec\}
echo "\u{". $point ."}";  // Parse error; same as above

Upvotes: 6

Views: 2503

Answers (5)

Sammitch

Reputation: 32272

For PHP >= 7.2:

$point = 127468;

var_dump( mb_chr($point) );

Output:

string(4) "🇬"

Ref: https://www.php.net/manual/en/function.mb-chr.php
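
Since the question starts from the hex form ('1f1ec'), here is a minimal sketch of how it could be fed to mb_chr(). The helper name charFromHex() is hypothetical and not part of the answer; it assumes the mbstring extension is loaded.

```php
<?php

// Hypothetical helper (not from the answer): takes the hex form of a code point.
function charFromHex(string $hex): string
{
    // hexdec() turns '1f1ec' back into the integer 127468.
    $char = mb_chr((int) hexdec($hex), 'UTF-8');

    // mb_chr() returns false when the value is not a valid code point.
    if ($char === false) {
        throw new InvalidArgumentException("Invalid code point: $hex");
    }

    return $char;
}

echo charFromHex('1f1ec'), "\n";
```

Passing the encoding explicitly keeps the result independent of the configured internal encoding.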

Upvotes: 1

AmigoJack

Reputation: 6174

If the Intl extension (and with it IntlChar) is not available, you can avoid the evil eval() function by going the SGML route: entity parsing has been available since PHP 4. You need to know the notation for numeric entities in HTML or XML and then use either the earlier available function mb_decode_numericentity() or the better-known function html_entity_decode(). Both accept either notation, decimal or hexadecimal:

<?php

    $iPoint= 127468;  // Decimal
    $sPoint= dechex( $iPoint );  // Hexadecimal = '1f1ec'

    // Available since PHP 4.0.6
    $aMap= array( 0, 0x10ffff );
    $sOut1= mb_decode_numericentity( '&#x'. $sPoint. ';', $aMap, 'UTF-8' );  // &#x1f1ec;
    $sOut2= mb_decode_numericentity( '&#'.  $iPoint. ';', $aMap, 'UTF-8' );  // &#127468;

    // Available since PHP 4.3.0
    $sOut3= html_entity_decode     ( '&#x'. $sPoint. ';', 0,     'UTF-8' );  // &#x1f1ec;
    $sOut4= html_entity_decode     ( '&#'.  $iPoint. ';', 0,     'UTF-8' );  // &#127468;

    echo "$sOut1 $sOut2 $sOut3 $sOut4";

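This is a bit of overkill, though. String concatenation won't work because "\u{...}" is an escape sequence that PHP resolves while it parses the string literal. By the time concatenation runs, each piece is already a finished literal and is not parsed again.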
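
To illustrate the point about literals, the following sketch (PHP 7+, variable names chosen for this example only) compares a concatenated string with an escape written out in the source:

```php
<?php

$sPoint = '1f1ec';

// Single-quoted pieces get no escape processing, so this is plain text.
$sConcat = '\u{' . $sPoint . '}';

// Written out in the source, the escape is resolved when the file is compiled.
$sLiteral = "\u{1f1ec}";

var_dump($sConcat === '\u{1f1ec}');  // bool(true): still nine ASCII characters
var_dump(bin2hex($sLiteral));        // string(8) "f09f87ac"
```

The concatenated value only looks like an escape sequence; nothing parses it again at runtime.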


A different approach is to use the fact that UTF-32 always consists of 4 bytes for each character, so one can use the hexadecimal code point and turn its textual representation into its binary form (using pack()) to then convert from that into UTF-8 (using mb_convert_encoding()):

<?php

    $iPoint= 127468;  // Decimal
    $sPoint= dechex( $iPoint );  // Hexadecimal = '1f1ec'

    // UTF-32 always uses 4 bytes per character, so the hex value ('1f1ec') must be left-padded with zeroes ('0001f1ec').
    $sPoint= str_pad( $sPoint, 8, '0', STR_PAD_LEFT );

    // Convert hex string of 8 bytes into its binary counterpart of 4 bytes.
    $sBinary= pack( 'H*', $sPoint );

    // Just convert 1 character of 4 bytes (UTF-32) into UTF-8 (which are 4 different bytes for this character).
    echo mb_convert_encoding( $sBinary, 'UTF-8', 'UTF-32' );  // From: 0x00 01 f1 ec   To: 0xf0 9f 87 ac

Unicode code point   Character   UTF-8         Name
U+1F1EC              🇬           f0 9f 87 ac   REGIONAL INDICATOR SYMBOL LETTER G
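
As a variation on the UTF-32 approach, the hex string and its zero padding can be skipped by packing the integer directly. This is only a sketch, assuming mbstring is loaded; 'UTF-32BE' is used here to state the byte order explicitly.

```php
<?php

$iPoint = 127468;  // Decimal

// 'N' packs an unsigned 32-bit integer in big-endian byte order: 00 01 f1 ec.
$sBinary = pack('N', $iPoint);

// Convert that single UTF-32 character into UTF-8.
echo mb_convert_encoding($sBinary, 'UTF-8', 'UTF-32BE'), "\n";
```

Because pack('N') always emits exactly four bytes, no manual padding is needed.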

Upvotes: 0

Alexander Korostin

Reputation: 768

PHP 7+ solution snippet:

function charFromCodePoint($codepoint) {
    eval('$ch = "\u{'.dechex($codepoint).'}";');
    return $ch;
}

Note that PHP 5 doesn't support the "\u{}" syntax.
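
Where neither the "\u{}" syntax nor any extension is available, the UTF-8 bytes can also be assembled by hand, without eval(). The function below is a hypothetical sketch, not part of this answer; it follows the standard UTF-8 bit layout for 1 to 4 bytes.

```php
<?php

// Hypothetical helper: encodes one code point as UTF-8 using only chr() and bit operations.
function utf8FromCodePoint($cp)
{
    // Reject values outside Unicode and the surrogate range.
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        return false;
    }
    if ($cp < 0x80) {       // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {      // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo utf8FromCodePoint(127468), "\n";
```

The code avoids type declarations and newer syntax so that it should also run on PHP 5.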

Upvotes: -2

inguin

Reputation: 1

I actually found a solution after several hours:

$unicode = '1F605'; //😅
$uni = '{' . $unicode; // First bracket needs to be separated, otherwise you get '\u1F605'

$str = "\u$uni}";

eval("\$str = \"$str\";"); // Evaluates the string as PHP code so the \u{...} escape is parsed, then stores the result in $str
echo $str;

Thanks @Rick James for the idea with the eval() function

Upvotes: -2

Aniket Sahrawat

Reputation: 12937

You don't need to convert the integer to a hexadecimal string. Instead, use IntlChar::chr:

echo IntlChar::chr(127468);

Directly from the docs for IntlChar::chr:

Return Unicode character by code point value

Upvotes: 7
