Reputation: 125
$sRangeRegex = '/^(.{0,30})?$/';
$value='12345678901234567890123456789ä';
if (!preg_match($sRangeRegex, $value)) {
alert('not match');
}
When I run this code, it shows the 'not match' alert message, but it shouldn't. The value contains 30 characters, yet it is counted as 31. The umlaut characters are causing the problem when matching. I want a solution to this problem that uses regex only. Thanks.
Upvotes: 1
Views: 99
Reputation: 627607
It is common knowledge here on SO that the PHP regex engine needs a pattern with the /u flag in order to work with Unicode strings. It is less well known that to match a Unicode grapheme you need the \X shorthand class (PCRE-compliant).
So, to apply a length restriction to a Unicode string, use \X instead of . (the dot):
$pattern = '/^\X{0,30}$/u';
Note that this regex will match strings that contain 0 to 30 Unicode graphemes. You do not need any (...)? optional patterns, since the 0 in the limiting quantifier already does this job.
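As a minimal sketch of the length restriction described above, the pattern can be wrapped in a small helper. The function name fits_grapheme_limit and the test strings are assumptions made for illustration, not part of the original answer, and the "\u{...}" escapes assume PHP 7.0 or later.

```php
<?php
// Hypothetical helper (not from the answer): true when $text contains
// at most $max grapheme clusters, using \X together with the /u flag.
function fits_grapheme_limit(string $text, int $max): bool
{
    // $max is an int, so it can be embedded in the pattern directly.
    $pattern = '/^\X{0,' . $max . '}$/u';
    // preg_match() returns 1 on a match, 0 on no match and false on error
    // (for example invalid UTF-8), so compare against 1 explicitly.
    return preg_match($pattern, $text) === 1;
}

$digits = '12345678901234567890123456789'; // 29 ASCII characters

var_dump(fits_grapheme_limit($digits . "\u{00E4}", 30));  // precomposed ä
var_dump(fits_grapheme_limit($digits . "a\u{0308}", 30)); // a + combining diaeresis
var_dump(fits_grapheme_limit($digits . "\u{00E4}x", 30)); // 31 graphemes
```

The first two calls should report true, because both spellings of ä count as a single grapheme, and the third false.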
However, to check the actual length of a Unicode string in characters (code points) rather than bytes, you need to use mb_strlen. See this post of mine for an example.
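To make the different notions of "length" concrete, here is a rough sketch that reports bytes, code points and graphemes side by side. The helper name length_report is hypothetical, and the code assumes the mbstring extension is enabled and PHP 7.0 or later for the "\u{...}" escapes.

```php
<?php
// Hypothetical helper: reports three different "lengths" of a UTF-8 string.
function length_report(string $text): array
{
    return [
        'bytes'       => strlen($text),                  // raw byte count
        'code_points' => mb_strlen($text, 'UTF-8'),      // needs mbstring
        'graphemes'   => preg_match_all('/\X/u', $text), // clusters matched by \X
    ];
}

// "a" followed by U+0308 (combining diaeresis) renders as a single ä.
print_r(length_report("a\u{0308}"));
// The precomposed ä (U+00E4) is a single code point.
print_r(length_report("\u{00E4}"));
```

For the decomposed form this should give 3 bytes, 2 code points and 1 grapheme; for the precomposed form, 2 bytes, 1 code point and 1 grapheme.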
See this demo:
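// The last character is "A" followed by U+030A (combining ring above):
// two code points, one grapheme. The "\u{...}" escape needs PHP 7.0+.
$value = "12345678901234567890123456789A\u{030A}";

// The dot counts code points: 31 here, so the limit of 30 is exceeded.
$pattern = '/^.{0,30}$/u';
if (!preg_match($pattern, $value)) {
    echo "not match\n";
} else {
    echo "match!\n";
}

// \X counts graphemes: 30 here, so the string fits.
$pattern = '/^\X{0,30}$/u';
if (!preg_match($pattern, $value)) {
    echo 'not match';
} else {
    echo "match!";
}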
Results:
not match (this is the regex with a dot)
match! (the regex based on \X)
Upvotes: 3
Reputation: 42984
You need to tell the regex engine to work in UTF-8 mode by adding the u flag as a modifier:
<?php
$pattern = '/^(.{0,30})?$/u';
$subject = '12345678901234567890123456789ä';
if (!preg_match($pattern, $subject, $tokens)) {
    // alert() is not a PHP function; echo prints the message instead.
    echo 'not match';
}
var_dump($tokens);
Note the trailing u inside the pattern definition.
The output is (var_dump reports the length in bytes, which is 31 here even though the string has 30 characters):
array(2) {
  [0] =>
  string(31) "12345678901234567890123456789ä"
  [1] =>
  string(31) "12345678901234567890123456789ä"
}
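The following sketch contrasts the dot with and without the u flag on the same subject, and adds one caveat: with /u, a subject that is not valid UTF-8 makes preg_match() return false instead of 0. The variable names are chosen for illustration only, and the "\u{...}" escape assumes PHP 7.0 or later.

```php
<?php
// 29 ASCII digits followed by a precomposed ä (U+00E4, two bytes in UTF-8).
$subject = '12345678901234567890123456789' . "\u{00E4}";

// Without /u the dot consumes one byte at a time: 31 bytes exceed the limit.
$byteMode = preg_match('/^.{0,30}$/', $subject);

// With /u the dot consumes one code point at a time: 30 fit the limit.
$utf8Mode = preg_match('/^.{0,30}$/u', $subject);

// "\xE4" is ä in ISO-8859-1 but is not valid UTF-8, so with /u
// preg_match() reports an error by returning false rather than 0.
$invalid = preg_match('/^.{0,30}$/u', "\xE4");
$badUtf8 = preg_last_error() === PREG_BAD_UTF8_ERROR;

var_dump($byteMode, $utf8Mode, $invalid, $badUtf8);
```

Because !preg_match(...) treats both 0 and false as "not match", it may be worth checking preg_last_error() when the input encoding is not guaranteed to be UTF-8.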
Upvotes: 0