Reputation: 397
I have the byte position of a character in a UTF-8 string (I got it via preg_match with PREG_OFFSET_CAPTURE), but I need the character position. How can I get it?
I have something like this:
$x = 'öüä nice world';
preg_match('/nice/u', $x, $m, PREG_OFFSET_CAPTURE);
var_dump($m);
which results in:
array(1) {
  [0]=>
  array(2) {
    [0]=>
    string(4) "nice"
    [1]=>
    int(7)
  }
}
So I have the byte position which is 7.
But I need the character position which is 4. Is there a way to convert the byte position to the character position?
This example is highly simplified. It's not an option for me to just use mb_strpos or similar functions to find the position of the word "nice". I need the regular expression, and I actually need preg_match_all instead of preg_match. So I think converting the position would be the best way for me.
Upvotes: 1
Views: 317
Reputation: 47219
As mentioned you could build upon one of the examples from a similar question:
$x = 'öüä nice öüä nice öüä nice öüä nice öüä nice';
$r = preg_match_all('/nice/u', $x, $m, PREG_OFFSET_CAPTURE);
for ($i = 0; $i < $r; $i++) {
    // Cut the string at the byte offset, then count the characters in that prefix.
    var_dump(mb_strlen(substr($x, 0, $m[0][$i][1]), 'UTF-8'));
}
Result:
int(4)
int(13)
int(22)
int(31)
int(40)
Each value is the zero-based character position at which a match of "nice" starts, i.e. the number of characters that come before it.
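If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: byteOffsetToCharOffset is a made-up name, not a PHP built-in, and it assumes the mbstring extension is loaded and the subject string is valid UTF-8.

```php
<?php
// Hypothetical helper (not part of PHP): converts a byte offset in a UTF-8
// string to a character offset by counting the characters in the prefix
// that ends at that byte offset.
function byteOffsetToCharOffset(string $subject, int $byteOffset): int
{
    return mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8');
}

$x = 'öüä nice öüä nice';
preg_match_all('/nice/u', $x, $m, PREG_OFFSET_CAPTURE);

// Each entry of $m[0] is a pair: [matched text, byte offset].
$charOffsets = [];
foreach ($m[0] as [$text, $byteOffset]) {
    $charOffsets[] = byteOffsetToCharOffset($x, $byteOffset);
}

print_r($charOffsets); // character offsets 4 and 13
```

Note that every call re-counts the prefix from the start of the string, so this is fine for a handful of matches but may get slow on very long strings with many matches.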
Upvotes: 1