Reputation: 3280
I have a string, for example:
$a = 'abc🔹abc';
The 'small blue diamond' is: bin2hex('🔹') => f09f94b9
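To make the relationship between those bytes and the character's number explicit, here is a minimal sketch. It assumes the mbstring extension is available (for `mb_ord`, PHP 7.2+); the variable name is just for illustration.

```php
<?php
$char = '🔹';

// UTF-8 byte sequence of the character, as hex.
echo bin2hex($char), "\n";                 // f09f94b9

// Unicode code point in decimal, which is what a numeric HTML escape uses.
echo mb_ord($char, 'UTF-8'), "\n";         // 128313

// The same code point in hex, i.e. U+1F539.
echo dechex(mb_ord($char, 'UTF-8')), "\n"; // 1f539
```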
So, I would like to convert the $a string into a string which represents the small blue diamond with the HTML escape: &#128313;
Which function should I call to convert all Unicode characters into their HTML-escape representation?
In WordPress, when I want to insert the $a variable into a table, $wpdb runs its checks (see the WPDB source code).
When WordPress prepares the $data which should be inserted or updated, it runs the fields through the $wpdb->strip_invalid_text method and then checks whether anything invalid was found in the $data. The text in the $a variable is considered invalid by the following regexp:
$regex = '/
(
(?: [\x00-\x7F] # single-byte sequences 0xxxxxxx
| [\xC2-\xDF][\x80-\xBF] # double-byte sequences 110xxxxx 10xxxxxx
| \xE0[\xA0-\xBF][\x80-\xBF] # triple-byte sequences 1110xxxx 10xxxxxx * 2
| [\xE1-\xEC][\x80-\xBF]{2}
| \xED[\x80-\x9F][\x80-\xBF]
| [\xEE-\xEF][\x80-\xBF]{2}';
if ( 'utf8mb4' === $charset ) {
$regex .= '
| \xF0[\x90-\xBF][\x80-\xBF]{2} # four-byte sequences 11110xxx 10xxxxxx * 3
| [\xF1-\xF3][\x80-\xBF]{3}
| \xF4[\x80-\x8F][\x80-\xBF]{2}
';
}
$regex .= '){1,40} # ...one or more times
)
| . # anything else
/x';
$value['value'] = preg_replace( $regex, '$1', $value['value'] );
if ( false !== $length && mb_strlen( $value['value'], 'UTF-8' ) > $length ) {
$value['value'] = mb_substr( $value['value'], 0, $length, 'UTF-8' );
}
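To see the effect of that regexp in isolation, here is a minimal sketch. The function `strip_invalid_sketch` is a hypothetical stand-in that only reuses the pattern quoted above; it is not WordPress's actual API and leaves out the length handling.

```php
<?php
// Hypothetical helper: applies the quoted pattern for a given charset.
function strip_invalid_sketch(string $text, string $charset): string
{
    $regex = '/
        (
            (?: [\x00-\x7F]
            |   [\xC2-\xDF][\x80-\xBF]
            |   \xE0[\xA0-\xBF][\x80-\xBF]
            |   [\xE1-\xEC][\x80-\xBF]{2}
            |   \xED[\x80-\x9F][\x80-\xBF]
            |   [\xEE-\xEF][\x80-\xBF]{2}';
    if ('utf8mb4' === $charset) {
        // Four-byte sequences are only accepted for utf8mb4.
        $regex .= '
            |   \xF0[\x90-\xBF][\x80-\xBF]{2}
            |   [\xF1-\xF3][\x80-\xBF]{3}
            |   \xF4[\x80-\x8F][\x80-\xBF]{2}';
    }
    $regex .= '){1,40}
        )
        | .
        /x';

    // Bytes that only match the "anything else" branch are dropped.
    return preg_replace($regex, '$1', $text);
}

var_dump(strip_invalid_sketch('abc🔹abc', 'utf8') === 'abcabc');      // bool(true): the emoji is stripped
var_dump(strip_invalid_sketch('abc🔹abc', 'utf8mb4') === 'abc🔹abc'); // bool(true): unchanged
var_dump(strip_invalid_sketch('abc&#128313;abc', 'utf8'));            // string(15) "abc&#128313;abc"
```

With a non-utf8mb4 charset the stripped result differs from the input, which appears to be how the value ends up flagged as invalid.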
When the 'small blue diamond' is represented with the bytes f09f94b9, this regexp marks the data invalid (the four-byte branch is only added when the charset is utf8mb4). When it is represented with &#128313;, it passes the check. So what I need is to convert such Unicode characters into a representation that is accepted by WordPress.
Upvotes: 4
Views: 1011
Reputation: 8583
Here is what I came up with to convert all of the characters. You can modify it further to convert only the characters in the range you need.
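$s = 'abc🔹def';
// Split the string into individual UTF-8 characters (-1 means no limit; null is deprecated in PHP 8.1+).
$a = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
foreach ($a as $c) {
    // Convert each character to UCS-4 little-endian and read it as a 32-bit integer code point.
    echo '&#' . unpack('V', iconv('UTF-8', 'UCS-4LE', $c))[1] . ';';
}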
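If you only want to touch the characters that need four bytes in UTF-8 (code points above U+FFFF, which is what the regexp in the question rejects), something along these lines might work. This is a sketch, not tested against WordPress itself; `escape_astral_chars` is a made-up name, and it assumes the mbstring extension (`mb_ord`, PHP 7.2+).

```php
<?php
// Hypothetical helper: replaces only code points U+10000..U+10FFFF
// with decimal numeric HTML escapes and leaves everything else alone.
function escape_astral_chars(string $text): string
{
    return preg_replace_callback(
        '/[\x{10000}-\x{10FFFF}]/u',
        static function (array $m): string {
            // mb_ord returns the code point of the matched character.
            return '&#' . mb_ord($m[0], 'UTF-8') . ';';
        },
        $text
    );
}

echo escape_astral_chars('abc🔹def'); // abc&#128313;def
```

ASCII and other characters of up to three bytes, such as 'é', pass through unchanged, so the stored text stays readable apart from the escaped emoji.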
Upvotes: 3