Reputation: 311
I'm taking foreign (Japanese) characters from a database and using substr() to limit the length of the string.
However, when I do this it cuts a character in half, which leaves behind one of those question marks in black diamonds as a replacement character (�).
Everything (documents, charset, table encoding) is set to UTF-8.
Here is an example of what happens:
$string = "日本最大級のポータルサイト。";
echo substr($string, 0,10);
Which outputs 日本最�
How do you recommend I find/replace this question mark icon?
Upvotes: 3
Views: 1558
Reputation: 37365
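You cannot use substr() when dealing with UTF-8 strings, because each non-ASCII character is represented by multiple bytes rather than a single byte, and substr() works with bytes. In your example each Japanese character takes 3 bytes, so 10 bytes is three whole characters plus one stray byte, which is displayed as �. Instead you should use mb_substr(), which counts characters and will correctly return the desired result.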
To work with multibyte strings in PHP there is the mbstring extension, and mb_substr() is part of it.
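A minimal sketch of the difference, using the string from the question. It assumes the mbstring extension is loaded and that the script itself is saved as UTF-8; the variable names $byBytes and $byChars are only for illustration.

```php
<?php
// Assumption: mbstring is enabled and this file is saved as UTF-8.
$string = "日本最大級のポータルサイト。";

// substr() counts bytes. Each of these characters is 3 bytes in UTF-8,
// so 10 bytes is three whole characters plus one dangling byte.
$byBytes = substr($string, 0, 10);

// mb_substr() counts characters. Passing the encoding explicitly avoids
// depending on the server's default internal encoding.
$byChars = mb_substr($string, 0, 10, 'UTF-8');

echo strlen($string), "\n";              // length in bytes
echo mb_strlen($string, 'UTF-8'), "\n";  // length in characters
echo $byChars, "\n";                     // first 10 characters, no broken byte

// The byte-based cut is no longer valid UTF-8, which is why the browser shows �.
var_dump(mb_check_encoding($byBytes, 'UTF-8'));
```

Rather than finding and replacing the � afterwards, this avoids producing the broken byte in the first place.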
Upvotes: 5
Reputation: 1061
You should use mb_substr(), as long as the mbstring extension is enabled on your server.
http://php.net/manual/en/book.mbstring.php
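If you are not sure whether mbstring is available, one possible approach is to check for the function at runtime and fall back to a UTF-8 aware regular expression. This is only a sketch: utf8_truncate() is a hypothetical helper name, not something PHP provides, and the fallback assumes PCRE was built with UTF-8 support.

```php
<?php
// Hypothetical helper (not part of PHP): keep the first $length characters
// of a UTF-8 string without splitting a multibyte character.
function utf8_truncate(string $text, int $length): string
{
    $length = max(0, $length);

    if (function_exists('mb_substr')) {
        // Preferred path: mbstring is enabled.
        return mb_substr($text, 0, $length, 'UTF-8');
    }

    // Fallback: with the /u modifier "." matches a whole UTF-8 character,
    // and /s lets it match newlines as well.
    if (preg_match('/^.{0,' . $length . '}/us', $text, $match) === 1) {
        return $match[0];
    }

    // preg_match() returns false when the input is not valid UTF-8.
    return '';
}

echo utf8_truncate("日本最大級のポータルサイト。", 10), "\n";
```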
Upvotes: 0