Reputation: 265
When I use substr(), I get a strange character at the end:
$articleText = substr($articleText,0,500);
The output is 500 characters followed by a � <--
How can I fix this? Is it an encoding problem? My language is Greek.
Upvotes: 26
Views: 23817
Reputation: 13267
Use this function; it worked for me:
function substr_unicode($str, $s, $l = null) {
    // Split into UTF-8 characters, slice by character index, then rejoin.
    return join("", array_slice(
        preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l));
}
Credits: http://php.net/manual/en/function.mb-substr.php#107698
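A minimal sketch of how this function behaves, assuming the source file is saved as UTF-8 and the mbstring extension is available for the comparison. The sample string is made up for illustration. It also shows one caveat: with the `u` modifier, preg_split() returns false on invalid UTF-8, so the input should already be valid.

```php
<?php
function substr_unicode($str, $s, $l = null) {
    // Split into UTF-8 characters, slice by character index, then rejoin.
    return join("", array_slice(
        preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l));
}

$text = 'αβγδε'; // illustrative sample: 5 Greek letters, 10 bytes in UTF-8

// Characters 1..3, counted by character rather than by byte.
$slice = substr_unicode($text, 1, 3);
echo $slice, "\n";

// The result matches mb_substr() for the same arguments.
var_dump($slice === mb_substr($text, 1, 3, 'UTF-8'));

// Caveat: a lone lead byte is not valid UTF-8, so the split itself fails.
$split = preg_split("//u", "\xCE", -1, PREG_SPLIT_NO_EMPTY);
var_dump($split === false);
```

If mbstring is installed, mb_substr() is the simpler choice; a pure-PCRE version like this is mainly useful as a fallback.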
Upvotes: 1
Reputation: 1654
You are cutting a multi-byte Unicode character in the middle. Instead of substr(), try mb_substr() in PHP.
substr()
substr ( string $string , int $start [, int $length ] )
mb_substr()
mb_substr ( string $str , int $start [, int $length [, string $encoding ]] )
For more information, see the PHP manual pages for substr() and mb_substr().
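The optional fourth argument in the mb_substr() signature matters in practice. The sketch below, which assumes mbstring is loaded and the file is saved as UTF-8, compares an explicit encoding, the internal encoding set through mb_internal_encoding(), and a deliberately wrong single-byte encoding. The sample string is illustrative only.

```php
<?php
$text = 'αβγδε'; // illustrative sample: 5 Greek letters, 10 bytes in UTF-8

// With an explicit encoding the call does not depend on global settings.
$explicit = mb_substr($text, 0, 3, 'UTF-8');

// Without the fourth argument, mb_substr() uses the internal encoding.
mb_internal_encoding('UTF-8');
$implicit = mb_substr($text, 0, 3);

// Naming a single-byte encoding makes mb_substr() count bytes again.
$wrong = mb_substr($text, 0, 3, 'ISO-8859-1');

var_dump($explicit === $implicit);        // both count characters
var_dump($wrong === substr($text, 0, 3)); // both count bytes
```

Passing 'UTF-8' explicitly is the safer habit, since the internal encoding can differ between servers.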
Upvotes: 0
Reputation: 4094
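Alternative solution for UTF-8 encoded strings: convert the string from UTF-8 to the single-byte ISO-8859-1 encoding before cutting the substring. Note that utf8_decode() replaces any character ISO-8859-1 cannot represent, which includes Greek letters, with '?', so this only suits Latin-1 text; the function is also deprecated as of PHP 8.2.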
$articleText = substr(utf8_decode($articleText),0,500);
To get the articleText string back to UTF-8, an extra operation will be needed:
$articleText = utf8_encode( substr(utf8_decode($articleText),0,500) );
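To see what a round trip through ISO-8859-1 does to different text, here is a small sketch. It uses mb_convert_encoding() in place of utf8_decode()/utf8_encode(), which is a substitution on my part to avoid the PHP 8.2 deprecation notices, and it assumes mbstring is loaded and the file is saved as UTF-8. The sample strings are illustrative.

```php
<?php
$greek = 'αβγ';          // no ISO-8859-1 equivalents
$latin = "caf\u{E9}";    // "café", fully representable in ISO-8859-1

// UTF-8 -> ISO-8859-1 (the same target encoding utf8_decode() uses).
$greekLatin1 = mb_convert_encoding($greek, 'ISO-8859-1', 'UTF-8');
$latinLatin1 = mb_convert_encoding($latin, 'ISO-8859-1', 'UTF-8');

echo strlen($latinLatin1), "\n"; // one byte per character after conversion

// ISO-8859-1 -> UTF-8 (the direction utf8_encode() handles).
$greekBack = mb_convert_encoding($greekLatin1, 'UTF-8', 'ISO-8859-1');
$latinBack = mb_convert_encoding($latinLatin1, 'UTF-8', 'ISO-8859-1');

var_dump($latinBack === $latin); // Latin-1 text survives the round trip
var_dump($greekBack === $greek); // Greek text does not
```

For Greek content, slicing with mb_substr() on the original UTF-8 string avoids the lossy conversion entirely.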
Upvotes: 0
Reputation: 785
mb_substr() also works well for removing strange trailing line breaks, which I was having trouble with after parsing HTML. The problem was NOT handled by:
trim()
or:
var_dump(preg_match('/^\n|\n$/', $variable));
or:
str_replace (array('\r\n', '\n', '\r'), ' ', $text)
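None of these caught them. (Note that inside single quotes PHP does not expand \n or \r, so the str_replace() call above searches for literal backslash sequences rather than line breaks.)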
Upvotes: 0
Reputation: 1043
Use mb_substr instead; it can handle multi-byte encodings, whereas substr only works correctly on single-byte strings:
$articleText = mb_substr($articleText,0,500,'UTF-8');
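When truncating article text, it is often handy to wrap this call in a small helper. The sketch below is only one way to do it: `truncate_text` and its `$suffix` parameter are hypothetical names, not part of PHP, and the code assumes UTF-8 input with mbstring loaded.

```php
<?php
// Hypothetical helper (not a PHP built-in): keep at most $limit characters
// and append $suffix only when something was actually cut off.
function truncate_text(string $text, int $limit, string $suffix = '...'): string
{
    if (mb_strlen($text, 'UTF-8') <= $limit) {
        return $text; // short enough, leave untouched
    }
    return mb_substr($text, 0, $limit, 'UTF-8') . $suffix;
}

echo truncate_text('αβγδεζηθ', 5), "\n"; // long input gets cut and marked
echo truncate_text('αβγ', 5), "\n";      // short input is returned as-is
```

Usage for the question's case would be `$articleText = truncate_text($articleText, 500);`.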
Upvotes: 20
Reputation: 522042
Looks like you're slicing a Unicode character in half there. Use mb_substr instead for Unicode-safe string slicing.
Upvotes: 6
Reputation: 400972
substr counts bytes, not characters.
Greek probably means you are using a multi-byte encoding such as UTF-8, and counting by bytes does not work well for those.
Using mb_substr should help here: the mb_* functions were created specifically for multi-byte encodings.
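The byte-versus-character difference can be checked directly. This is a minimal sketch, assuming the file is saved as UTF-8 and mbstring is loaded; the sample string is illustrative.

```php
<?php
$text = 'αβγδε'; // illustrative sample: 5 Greek letters, 2 bytes each in UTF-8

echo strlen($text), "\n";             // byte count
echo mb_strlen($text, 'UTF-8'), "\n"; // character count

// Cutting at an odd byte offset splits the second letter in half.
$broken = substr($text, 0, 3);
var_dump(mb_check_encoding($broken, 'UTF-8')); // no longer valid UTF-8

// Cutting by characters keeps every letter whole.
$clean = mb_substr($text, 0, 3, 'UTF-8');
var_dump(mb_check_encoding($clean, 'UTF-8'));
echo $clean, "\n";
```

The invalid trailing byte left by substr() is what browsers render as �.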
Upvotes: 61