From what I understand so far, supplementary characters (or "surrogate pairs") are defined in a range from 0xd800 to 0xdbff for the first char, and from 0xdc00 to 0xdfff for the second char.
So I'm trying to detect if an arbitrary string contains any such characters:
function isSupplementaryCharacter($c1, $c2)
{
    return $c1 >= 0xd800 && $c1 <= 0xdbff && $c2 >= 0xdc00 && $c2 <= 0xdfff;
}
function isStringWithSupplementaryCharacters($str)
{
    $ln = strlen($str);
    for($i = 0; $i < $ln - 1; $i++)
    {
        if(isSupplementaryCharacter(ord($str[$i]), ord($str[$i + 1])))
            return true;
    }
    return false;
}
But that doesn't seem to detect them. For instance:
isStringWithSupplementaryCharacters("=😍!");
returns false.
So to test it, I wrote a small web page to see what codes those symbols become:
$txt = isset($_REQUEST['txt']) ? $_REQUEST['txt'] : '';
$htmTxt = htmlentities($txt);
$hex = '';
$ln = strlen($txt);
for($i = 0; $i < $ln; $i++)
{
$hex .= dechex(ord($txt[$i])).", ";
}
$htmHex = htmlentities($hex);
echo <<<UUU01
<!DOCTYPE html>
<html>
<head>
<title></title>
</head>
<body>
<form method="get">
<input type="text" name="txt"/>
<input type="submit" value="Go"/>
</form>
<p>$htmTxt</p>
<p>$htmHex</p>
</body>
</html>
UUU01;
But the encoding I'm getting for 😍 is not what I expected.
Why is it giving me f0, 9f, 98, 8d for it? Those don't fall within the ranges above. So what am I doing wrong here?
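For what it's worth, one likely explanation: the surrogate ranges above describe UTF-16 code units, while PHP strings are plain byte sequences and the form data here appears to arrive as UTF-8 (f0 9f 98 8d is the UTF-8 encoding of U+1F60D). Since ord() returns a single byte (0 to 255), it can never land in 0xd800..0xdfff. Below is a minimal sketch of a check that works on UTF-8 input, assuming the string is valid UTF-8 and PCRE has Unicode support. The names containsSupplementaryCodePoint and toSurrogatePair are made up for illustration and are not part of the code above.

```php
<?php
// Hypothetical helper: true if a UTF-8 string contains any code point above U+FFFF.
function containsSupplementaryCodePoint(string $str): bool
{
    // The /u modifier makes PCRE treat both pattern and subject as UTF-8.
    // preg_match() returns false (not 0) on invalid UTF-8, so compare with 1.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $str) === 1;
}

// Hypothetical helper: the UTF-16 surrogate pair a supplementary code point
// would be split into. Only meaningful for U+10000..U+10FFFF.
function toSurrogatePair(int $codePoint): array
{
    $offset = $codePoint - 0x10000;
    return [0xD800 + ($offset >> 10), 0xDC00 + ($offset & 0x3FF)];
}

// "\u{1F60D}" is the emoji from the question, written as an escape (PHP 7+).
var_dump(containsSupplementaryCodePoint("=\u{1F60D}!")); // bool(true)
var_dump(containsSupplementaryCodePoint("=!"));          // bool(false)
echo bin2hex("\u{1F60D}"), "\n";                         // f09f988d
vprintf("%x, %x\n", toSurrogatePair(0x1F60D));           // d83d, de0d
```

The last line shows where the 0xd800..0xdfff values would appear: only after converting the text to UTF-16, not in the raw UTF-8 bytes that strlen() and ord() operate on.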