mike

Reputation: 23

Regex to match a string that may contain Chinese characters

I'm trying to write a regular expression that matches strings which may contain Chinese characters. Examples:

hahdj5454_fd.fgg"
example.com/list.php?keyword=关键字
example.com/list.php?keyword=php

I am using this expression:

$matchStr =  '/^[a-z 0-9~%.:_\-\/[^x7f-xff]+$/i';
$str      =  "http://example.com/list.php?keyword=关键字";

if ( ! preg_match($matchStr, $str)){
    exit('WRONG');
}else{
    echo "RIGHT"; 
}

It matches plain English strings such as dasdsdsfds or http://example.com/list.php, but it doesn't match strings containing Chinese characters. How can I resolve this?

Upvotes: 2

Views: 7227

Answers (2)

Pedro Lobito

Reputation: 98921

This checks whether the string contains at least one Chinese (Han) character:

$str = "http://mysite/list.php?keyword=关键字";

if (preg_match('/[\p{Han}]/simu', $str)) {
    echo "Contains Chinese Characters"; 
}else{
    exit('WRONG'); // Doesn't contain Chinese characters
}
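
Because the pattern above is unanchored, it only reports whether a Han character occurs somewhere in the string; it doesn't validate the other characters. The s, i and m modifiers have no effect on it, since it contains no dot, no anchors and no cased letters. A minimal sketch of the difference between "contains" and "consists only of", assuming the source file is saved as UTF-8 (containsHan and isAllHan are hypothetical helper names, not part of the answer):

```php
<?php
// Hypothetical helper: true when $s contains at least one Han character.
function containsHan(string $s): bool
{
    // Unanchored search; the u modifier makes PCRE read the subject as UTF-8.
    return preg_match('/\p{Han}/u', $s) === 1;
}

// Hypothetical helper: true when $s consists only of Han characters.
function isAllHan(string $s): bool
{
    // Anchored with ^ and $, so every character must be in the Han script.
    return preg_match('/^\p{Han}+$/u', $s) === 1;
}

var_dump(containsHan('keyword=关键字')); // bool(true)
var_dump(containsHan('keyword=php'));   // bool(false)
var_dump(isAllHan('关键字'));            // bool(true)
var_dump(isAllHan('keyword=关键字'));    // bool(false)
```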

Upvotes: 0

Tim Pietzcker

Reputation: 336208

Assuming you want to extend the set of letters that this regex matches from ASCII to all Unicode letters, then you can use

$matchStr =  '#^[\pL 0-9~%.:_/-]+$#u';

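I've removed the [^x7f-xff part, which didn't make any sense: in your regex it matched a literal opening bracket, a caret, and a few ASCII characters (x, 7, and f through x) that were already covered by the a-z and 0-9 parts of that character class. The u modifier makes PCRE treat both the pattern and the subject as UTF-8, and \pL matches any Unicode letter.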
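
As a rough check of the points above, here is a small sketch that runs the question's original pattern and the corrected one against a few strings. Note that neither character class contains ? or =, so the sample URLs with a query string are still rejected. The $withQuery pattern is a hypothetical variant, assuming you want ?, = and & to be allowed as well. The file is assumed to be saved as UTF-8.

```php
<?php
// Original pattern from the question: "[^x7f-xff" sits inside the class,
// so it just adds the literals [ and ^ plus x, 7 and the range f-x.
$original = '/^[a-z 0-9~%.:_\-\/[^x7f-xff]+$/i';

// Pattern from this answer: any Unicode letter plus the same punctuation.
$unicode = '#^[\pL 0-9~%.:_/-]+$#u';

// Hypothetical variant (an assumption, not from the answer): also allow ? = &.
$withQuery = '#^[\pL 0-9~%.:_/?=&-]+$#u';

$samples = [
    '[^',                                   // accepted by the original class
    'http://example.com/list.php',          // ASCII only, no query string
    '关键字',                                // Chinese characters only
    'example.com/list.php?keyword=关键字',   // contains ? and =
];

foreach ($samples as $s) {
    // preg_match() returns 1 on a match and 0 otherwise.
    printf(
        "%s => original:%d unicode:%d withQuery:%d\n",
        $s,
        preg_match($original, $s),
        preg_match($unicode, $s),
        preg_match($withQuery, $s)
    );
}
```

If only Chinese characters should be accepted beyond the ASCII set, \pL can be swapped for a-z\p{Han} together with the i modifier.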

Upvotes: 2
