d0n

Reputation: 65

mb_substr adds three dots (...) at the end when truncating UTF-8 text, but not when the text is Latin

I have code like this:

if (strlen($text) >= 15)
    $text = mb_substr($text, 0, 15, 'UTF-8');

It works as it should, but when the text is in Latin script (e.g. English), the truncated result does not show three dots at the end. When the text is in another language that needs multi-byte UTF-8 encoding, three dots are added at the end.

Example:

What are cells made of

gets replaced with

What are cells

On the other hand:

で作られた細胞は何ですか

gets replaced with

で作られた細 ...

What am I missing?

Upvotes: 4

Views: 847

Answers (1)

Ja͢ck

Reputation: 173662

This happens because strlen() returns the length of a string in bytes (octets), not in characters.

Because UTF-8 encodes ASCII characters the same way as, e.g., ISO-8859-1, the number of characters equals the number of octets for ASCII-only text. Characters outside ASCII, however, occupy two to four octets each in UTF-8; most Asian (CJK) characters take three.

So, to determine the number of characters correctly you need to use mb_strlen().
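
As a rough illustration of the difference, here is a minimal sketch using the two example strings from the question. The helper name truncateChars() is made up for this example and is not part of PHP; the sketch assumes the mbstring extension is loaded and that the input is valid UTF-8.

```php
<?php
// Hypothetical helper (not from the original post): truncate by characters, not bytes.
function truncateChars(string $text, int $limit = 15): string
{
    // mb_strlen() counts characters; strlen() would count bytes.
    if (mb_strlen($text, 'UTF-8') > $limit) {
        return mb_substr($text, 0, $limit, 'UTF-8');
    }
    return $text;
}

$latin    = 'What are cells made of';
$japanese = 'で作られた細胞は何ですか';

// Byte count vs. character count for each string.
echo strlen($latin), ' ', mb_strlen($latin, 'UTF-8'), "\n";       // 22 22
echo strlen($japanese), ' ', mb_strlen($japanese, 'UTF-8'), "\n"; // 36 12

echo truncateChars($latin), "\n";    // cut to the first 15 characters
echo truncateChars($japanese), "\n"; // only 12 characters, so left unchanged
```

With the strlen() check from the question, the Japanese string (36 bytes) passes the >= 15 test even though it is only 12 characters long; checking with mb_strlen() means the truncation branch runs only when the character count really exceeds the limit.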

Upvotes: 2
