Reputation: 1011
How can I get the length of a string that also contains character references? I want to count only the characters that will be displayed in the browser. For example:
$raw = "Stack&#00f9" = Length = 6
$raw = "Stack12345" = Length = 10
$raw = "Stack&#00f9&#00f9" = Length = 7
Thanks in advance
Upvotes: 0
Views: 1335
Reputation: 300825
As your strings contain literal encodings of Unicode characters (rather than being, say, UTF-8 encoded), you could obtain the length by simply replacing each one with a dummy character:
$length = strlen(preg_replace('/&#[0-9a-f]{4}/i', '_', $raw));
If they were encoded with something PHP understands, like UTF-8, you could use mb_strlen() instead.
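A minimal sketch of the replacement approach, run against the three strings from the question. The function name displayed_length_ascii is made up for illustration, and the sketch assumes every reference is written as &# followed by exactly four hex digits and that the rest of the string is plain single-byte text:

```php
<?php
// Hypothetical helper (not a PHP built-in): counts each "&#" + 4 hex digits
// sequence as one character by swapping it for a one-byte placeholder.
// Assumes everything else in the string is single-byte (ASCII) text.
function displayed_length_ascii(string $raw): int
{
    return strlen(preg_replace('/&#[0-9a-f]{4}/i', '_', $raw));
}

echo displayed_length_ascii('Stack&#00f9'), "\n";       // 6
echo displayed_length_ascii('Stack12345'), "\n";        // 10
echo displayed_length_ascii('Stack&#00f9&#00f9'), "\n"; // 7
```

If the surrounding text can contain multi-byte characters, strlen() would over-count them, so this only fits input like the samples above.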
Upvotes: 2
Reputation: 655169
strlen is a single-byte string function that fails on multi-byte strings, as it returns the number of bytes rather than the number of characters (in single-byte strings, every byte represents one character, so the two counts happen to match).
For multi-byte strings use strlen’s multi-byte counterpart mb_strlen instead and don’t forget to specify the proper character encoding.
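To illustrate the difference between the two functions, here is a small sketch. It assumes PHP 7 or later (for the "\u{...}" escape) and that the mbstring extension is enabled:

```php
<?php
// "Stackù" written with a Unicode escape so the example does not depend
// on the encoding of the source file; PHP emits the character as UTF-8.
$str = "Stack\u{00F9}";

var_dump(strlen($str));             // int(7): ù occupies two bytes in UTF-8
var_dump(mb_strlen($str, 'UTF-8')); // int(6): six characters
```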
And to have HTML character references interpreted as single characters, use html_entity_decode to replace them with the characters they represent:
$str = html_entity_decode('Stack&#x00f9;', ENT_QUOTES, 'UTF-8');
var_dump(mb_strlen($str, 'UTF-8')); // int(6)
Note that &#00f9 is not a valid character reference, as it’s missing an x or X after &# for the hexadecimal notation and a ; after the hexadecimal value.
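If the input cannot be fixed at the source, one option is to rewrite the malformed references into valid ones before decoding. This is only a sketch: displayed_length is a hypothetical helper, and it assumes that every reference in the input uses the malformed form of &# plus exactly four hex digits, with no valid decimal or hexadecimal references mixed in:

```php
<?php
// Hypothetical helper, not part of PHP.
// Assumption: all references look like "&#00f9" (no "x", no ";").
function displayed_length(string $raw): int
{
    // Turn "&#00f9" into the valid hexadecimal reference "&#x00f9;".
    $fixed = preg_replace('/&#([0-9a-f]{4})/i', '&#x$1;', $raw);

    // Decode the now-valid references into real UTF-8 characters.
    $decoded = html_entity_decode($fixed, ENT_QUOTES, 'UTF-8');

    // Count characters, not bytes.
    return mb_strlen($decoded, 'UTF-8');
}

var_dump(displayed_length('Stack&#00f9'));       // int(6)
var_dump(displayed_length('Stack12345'));        // int(10)
var_dump(displayed_length('Stack&#00f9&#00f9')); // int(7)
```

Unlike a placeholder replacement combined with strlen, this also counts correctly when the rest of the string already contains UTF-8 characters.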
Upvotes: 1
Reputation: 24577
I would go with:
$len = mb_strlen(html_entity_decode($myString, ENT_QUOTES, 'UTF-8'), 'UTF-8');
Although I would first question why you have HTML entities inside your strings, as opposed to manipulating actual UTF-8 encoded strings.
Also, be careful in that your HTML entities are not correctly written (they need to end with a semicolon). If you do not add the semicolon, any entity-related functions will fail, and many browsers will fail to render your entities correctly.
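A quick way to see the effect of the missing semicolon, at least as far as PHP's html_entity_decode is concerned. The variable names are only for illustration, and the mbstring extension is assumed to be available:

```php
<?php
// Same reference, once terminated with ";" and once without.
$withSemicolon    = html_entity_decode('Stack&#xf9;', ENT_QUOTES, 'UTF-8');
$withoutSemicolon = html_entity_decode('Stack&#xf9', ENT_QUOTES, 'UTF-8');

var_dump(mb_strlen($withSemicolon, 'UTF-8'));    // int(6): reference decoded to ù
var_dump(mb_strlen($withoutSemicolon, 'UTF-8')); // int(10): reference left as literal text
```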
Upvotes: 3