Reputation: 159
I need to get only the first 30 characters of a paragraph submitted by a user. If the cut lands on an emoji, the output shows question marks. How can I avoid breaking emojis?
echo substr("Hello world Hello world Hell😄 ", 0, 30);
Output: Hello world Hello world Hell��
Also, when using json_encode to return the output, the output is blank.
$myvariable = array();
$myvariable['hello'] = substr("Hello world Hello world Hell😄 ", 0, 30);
echo json_encode($myvariable);
Upvotes: 3
Views: 1484
Reputation: 3780
I think the simplest solution is to use mb_substr. From the manual:
Performs a multi-byte safe substr() operation based on number of characters.
php > $myvariable = array();
php > $myvariable['hello'] = mb_substr("Hello world Hello world Hell😄 ", 0, 30);
php > var_dump($myvariable);
array(1) {
["hello"]=>
string(33) "Hello world Hello world Hell😄 "
}
php > echo json_encode($myvariable);
{"hello":"Hello world Hello world Hell\ud83d\ude04 "}
php >
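To make the difference concrete, here is a minimal sketch (not part of the original answer) comparing the two functions on the question's sample string. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are illustrative only.

```php
<?php
// Sample string from the question; the emoji is the 29th character.
$text = "Hello world Hello world Hell😄 ";

// substr() counts bytes: the emoji occupies bytes 29-32, so a cut at 30 splits it.
$byBytes = substr($text, 0, 30);

// mb_substr() counts characters; naming the encoding avoids relying on
// whatever mb_internal_encoding() happens to be set to.
$byChars = mb_substr($text, 0, 30, 'UTF-8');

var_dump(mb_check_encoding($byBytes, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($byChars, 'UTF-8')); // bool(true)

// json_encode() returns false for malformed UTF-8, so echo prints nothing.
$encoded = json_encode(['hello' => $byBytes]);
var_dump($encoded);               // bool(false)
echo json_last_error_msg(), "\n"; // explains why encoding failed
```

This is also why the json_encode output in the question was blank. Note that mb_substr works on code points, so an emoji built from several code points (skin tones, flags, ZWJ sequences) can still be split; grapheme_substr from the intl extension may be a better fit in that case.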
Upvotes: 8
Reputation: 579
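$string = 'Français';
// Replace every non-ASCII character with a hexadecimal HTML entity.
$first = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($m) {
    // UCS-4BE yields the code point as four big-endian bytes.
    $utf = iconv('UTF-8', 'UCS-4BE', $m[0]);
    return sprintf("&#x%s;", ltrim(strtoupper(bin2hex($utf)), "0"));
}, $string);
var_dump($first);
Output (raw; a browser renders it as Français)
string 'Fran&#xE7;ais' (length=13)
OR
// Decode a JSON-escaped surrogate pair back into the emoji (prints 😀).
echo json_decode('"\uD83D\uDE00"');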
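As an aside, the mbstring extension has a built-in function for the same kind of conversion. The sketch below is an alternative to the callback above, not the answerer's method; it produces decimal rather than hexadecimal entities, and the sample string and variable names are illustrative only.

```php
<?php
// Convert map: code points 0x80-0x10FFFF, offset 0, mask 0x1FFFFF.
$map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

// Encode every non-ASCII character as a decimal numeric entity.
$encoded = mb_encode_numericentity('Français 😄', $map, 'UTF-8');
echo $encoded, "\n"; // Fran&#231;ais &#128516;

// The same map converts the entities back to the original characters.
$decoded = mb_decode_numericentity($encoded, $map, 'UTF-8');
echo $decoded, "\n"; // Français 😄
```

The result is pure ASCII, so it can be truncated with substr or passed to json_encode without producing invalid UTF-8, although a byte cut can still land in the middle of an entity.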
Upvotes: 0
Reputation: 579
<meta charset="ISO-8859-1">
OR
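function entities( $string ) {
    $stringBuilder = "";
    $offset = 0;
    // Compare against "" rather than using empty(), which would also reject "0".
    if ( $string === "" ) {
        return "";
    }
    // ordutf8() advances $offset and sets it to -1 once the string is consumed.
    while ( $offset >= 0 ) {
        $decValue = ordutf8( $string, $offset );
        $char = unichr($decValue);
        $htmlEntited = htmlentities( $char );
        if( $char != $htmlEntited ){
            // The character has a named entity, e.g. &amp;
            $stringBuilder .= $htmlEntited;
        } elseif( $decValue >= 128 ){
            // Any other non-ASCII character becomes a decimal entity.
            $stringBuilder .= "&#" . $decValue . ";";
        } else {
            $stringBuilder .= $char;
        }
    }
    return $stringBuilder;
}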
// source - http://php.net/manual/en/function.ord.php#109812
function ordutf8($string, &$offset) {
    $code = ord(substr($string, $offset, 1));
    if ($code >= 128) { // otherwise 0xxxxxxx
        if ($code < 224) $bytesnumber = 2; // 110xxxxx
        else if ($code < 240) $bytesnumber = 3; // 1110xxxx
        else if ($code < 248) $bytesnumber = 4; // 11110xxx
        $codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
        for ($i = 2; $i <= $bytesnumber; $i++) {
            $offset++;
            $code2 = ord(substr($string, $offset, 1)) - 128; // 10xxxxxx
            $codetemp = $codetemp * 64 + $code2;
        }
        $code = $codetemp;
    }
    $offset += 1;
    if ($offset >= strlen($string)) $offset = -1;
    return $code;
}
// source - http://php.net/manual/en/function.chr.php#88611
function unichr($u) {
    // mb_chr() (PHP 7.2+) replaces the 'HTML-ENTITIES' conversion from the
    // linked note, which is deprecated as of PHP 8.2.
    return mb_chr(intval($u), 'UTF-8');
}
/* ---- */
// var_dump() prints its own newline and returns nothing, so no "\n" is appended.
var_dump( entities( "&" ) );
var_dump( entities( "<" ) );
var_dump( entities( "😎" ) );
var_dump( entities( "☚" ) );
var_dump( entities( "" ) );
var_dump( entities( "A" ) );
var_dump( entities( "Hello 😎 world" ) );
var_dump( entities( "this & that 😎" ) );
Upvotes: 0