c00000fd

How to detect if a string contains any supplementary characters in PHP?

From what I understand so far, supplementary characters (or "surrogate pairs") are defined by a range from 0xd800 to 0xdbff for the first char, and from 0xdc00 to 0xdfff for the second char.
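
To make those ranges concrete, here is a minimal sketch of how a code point above U+FFFF maps onto them when text is encoded as UTF-16. The helper name `toSurrogatePair` is hypothetical (it is not part of my code below), and the sketch assumes its argument is already within U+10000..U+10FFFF.

```php
<?php
// Hypothetical helper for illustration only: split a supplementary code point
// (assumed to be in U+10000..U+10FFFF) into its UTF-16 surrogates.
function toSurrogatePair(int $codePoint): array
{
    $v    = $codePoint - 0x10000;   // leaves a 20-bit value
    $high = 0xD800 + ($v >> 10);    // top 10 bits    -> 0xD800..0xDBFF
    $low  = 0xDC00 + ($v & 0x3FF);  // bottom 10 bits -> 0xDC00..0xDFFF
    return [$high, $low];
}

[$high, $low] = toSurrogatePair(0x1F60D); // U+1F60D is the 😍 emoji
printf("%04x %04x\n", $high, $low);       // prints "d83d de0d"
```

Note that these are 16-bit code units, so the ranges only apply to text that is actually stored as UTF-16.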

So I'm trying to detect if an arbitrary string contains any such characters:

function isSupplementaryCharacter($c1, $c2)
{
    return $c1 >= 0xd800 && $c1 <= 0xdbff && $c2 >= 0xdc00 && $c2 <= 0xdfff;
}

function isStringWithSupplementaryCharacters($str)
{
    $ln = strlen($str);

    for($i = 0; $i < $ln - 1; $i++)
    {
        if(isSupplementaryCharacter(ord($str[$i]), ord($str[$i + 1])))
            return true;
    }

    return false;
}

But that doesn't seem to detect them. For instance:

isStringWithSupplementaryCharacters("=😍!");

returns false.

So to test it, I wrote a small web page to see what codes those symbols become:

$txt = isset($_REQUEST['txt']) ? $_REQUEST['txt'] : '';
$htmTxt = htmlentities($txt);

$hex = '';
$ln = strlen($txt);
for($i = 0; $i < $ln; $i++)
{
    $hex .= dechex(ord($txt[$i])).", ";
}

$htmHex = htmlentities($hex);

echo <<<UUU01
<!DOCTYPE html>
<html>
<head>
<title></title>
</head>
<body>

<form method="get">
<input type="text" name="txt"/>
<input type="submit" value="Go"/>
</form>

<p>$htmTxt</p>
<p>$htmHex</p>

</body>
</html>
UUU01;

But the encoding I'm getting for 😍 is not what I expected:

Why is it giving me f0, 9f, 98, 8d for it? Those don't fall within the definition above. So what am I doing wrong here?
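
For reference, a possible explanation, sketched under the assumption that the string arrives as UTF-8 (which is what browsers normally submit): PHP strings are byte sequences, so `ord()` only ever returns 0 to 255 and can never reach 0xd800. In UTF-8, U+1F60D is encoded as the four bytes f0 9f 98 8d, with no surrogates involved. The function names below are hypothetical, and the byte-level variant is only reliable for valid UTF-8 input.

```php
<?php
// Sketch, assuming $str is UTF-8. Hypothetical name, not from the code above.
function containsSupplementaryCharacters(string $str): bool
{
    // With the /u modifier, PCRE matches code points rather than bytes.
    // preg_match() returns false for invalid UTF-8, which maps to false here.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $str) === 1;
}

// Byte-level alternative: in valid UTF-8, only 4-byte sequences begin with a
// lead byte in 0xF0..0xF4, and they encode exactly U+10000..U+10FFFF.
function containsFourByteSequence(string $str): bool
{
    return preg_match('/[\xF0-\xF4]/', $str) === 1;
}

echo bin2hex("\u{1F60D}"), "\n";                           // prints "f09f988d"
var_dump(containsSupplementaryCharacters("=\u{1F60D}!"));  // bool(true)
var_dump(containsSupplementaryCharacters("=\u{E9}!"));     // bool(false)
```

The `"\u{...}"` escapes require PHP 7.0 or later; they are used here only so the example does not depend on the encoding of the source file.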
