Reputation: 23
I'm trying to write a regular expression that matches strings which may contain Chinese characters. Examples:
hahdj5454_fd.fgg"
example.com/list.php?keyword=关键字
example.com/list.php?keyword=php
I am using this expression:
$matchStr = '/^[a-z 0-9~%.:_\-\/[^x7f-xff]+$/i';
$str = "http://example.com/list.php?keyword=关键字";
if (!preg_match($matchStr, $str)) {
    exit('WRONG');
} else {
    echo "RIGHT";
}
It matches plain English strings like dasdsdsfds or http://example.com/list.php, but it doesn't match strings containing Chinese characters. How can I resolve this?
Upvotes: 2
Views: 7227
Reputation: 98921
This checks whether the string contains at least one Chinese (Han) character:
$str = "http://mysite/list.php?keyword=关键字";
// The u modifier makes PCRE treat the pattern and subject as UTF-8.
if (preg_match('/\p{Han}/u', $str)) {
    echo "Contains Chinese characters";
} else {
    exit('WRONG'); // Doesn't contain Chinese characters
}
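Note that the check above only detects a Han character somewhere in the string; it does not validate the rest. A minimal sketch of the difference, assuming you want to keep the question's whitelist and add Han characters to it. The function names are hypothetical, and the `?` and `=` in the class are my addition so that query strings can pass:

```php
<?php
// Hypothetical helper: true if at least one Han character is present.
function containsHan(string $s): bool
{
    return preg_match('/\p{Han}/u', $s) === 1;
}

// Hypothetical helper: true only if EVERY character is in the whitelist.
// The class is the question's ASCII set plus '?', '=' (assumed additions)
// and \p{Han}; the u modifier is required for \p{...} to see UTF-8.
function isWhitelisted(string $s): bool
{
    return preg_match('/^[a-z 0-9~%.:_\/?=\-\p{Han}]+$/iu', $s) === 1;
}

$str = 'http://example.com/list.php?keyword=关键字';
echo containsHan($str) ? "has Han\n" : "no Han\n";
echo isWhitelisted($str) ? "RIGHT\n" : "WRONG\n";
```

The unanchored pattern answers "is there a Chinese character?", while the anchored character class answers "is the whole string made of allowed characters?", which is closer to what the question asks.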
Upvotes: 0
Reputation: 336208
Assuming you want to extend the set of letters that this regex matches from ASCII to all Unicode letters, then you can use
$matchStr = '#^[\pL 0-9~%.:_/-]+$#u';
I've removed the [^x7f-xff part, which didn't make any sense: inside your character class it would have matched a literal opening bracket, a caret, and some ASCII characters that were already covered by the a-z and 0-9 parts of that class.
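Here is a minimal sketch of how this pattern could be tried against the strings from the question. One caveat: neither the original character class nor the one above contains `?` or `=`, so the URLs with a query string would still be rejected. The variant below adds `?`, `=` and `&`, which is an assumption on my part; the function name `isAllowed` is hypothetical too.

```php
<?php
// Hypothetical helper wrapping the Unicode-aware pattern.
// '?', '=' and '&' are assumed additions so query strings can match;
// '-' stays last in the class so it is taken literally.
function isAllowed(string $s): bool
{
    // preg_match() returns 1 on a match, 0 on no match and false on error
    // (for example, a subject that is not valid UTF-8 under the u modifier).
    return preg_match('#^[\pL 0-9~%.:_/?=&-]+$#u', $s) === 1;
}

$samples = [
    'http://example.com/list.php?keyword=关键字',
    'http://example.com/list.php?keyword=php',
    'hahdj5454_fd.fgg',
    'bad<script>',
];

foreach ($samples as $s) {
    echo $s, ' => ', isAllowed($s) ? 'RIGHT' : 'WRONG', PHP_EOL;
}
```

Because `\pL` covers every Unicode letter, this accepts more than Chinese (Cyrillic, Greek, kana and so on); swap in `\p{Han}` if only Han characters should be allowed.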
Upvotes: 2