magicyang

Reputation: 483

How can I match a string with only Chinese letters using a regex?

I want a regex that matches only strings consisting entirely of Chinese characters, with no English letters or any other characters. [\u4e00-\u9fa5] doesn't work at all, and [^\x00-\xff] also matches punctuation and characters from other languages.

boost::wregex reg(L"\\w*");
bool b = boost::regex_match(L"我a", reg);    // expected to be false
b = boost::regex_match(L"我,", reg);         // expected to be false
b = boost::regex_match(L"我", reg);          // expected to be true

Upvotes: 6

Views: 2089

Answers (2)

magicyang

Reputation: 483

The following regex works fine with boost::regex_match, which only succeeds when the whole string matches:

boost::wregex reg(L"^[\u4e00-\u9fa5]+");
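
For anyone who wants to try the same range without Boost, here is a minimal sketch that swaps in std::wregex from the standard library. It is not the code from the question: the helper name is_chinese_only is made up for illustration, and it assumes the compiler expands \u4e00 and \u9fa5 inside the wide string literal, so the regex engine receives the actual characters instead of a backslash escape.

```cpp
#include <cassert>  // only needed for the assert-style checks mentioned below
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): returns true only when
// `text` is non-empty and every character lies in U+4E00..U+9FA5.
bool is_chinese_only(const std::wstring& text) {
    // \u4e00 and \u9fa5 are expanded by the compiler, so the pattern the
    // regex engine sees is a plain character range.
    static const std::wregex han_range(L"[\u4e00-\u9fa5]+");
    // regex_match requires the whole input to match, so no ^ or $ is needed.
    return std::regex_match(text, han_range);
}
```

With this helper, assert(is_chinese_only(L"\u6211")) should pass for "我", while inputs such as L"\u6211a" or L"\u6211," should be rejected. Note that this range only covers part of the CJK Unified Ideographs block, so rarer characters outside it will not match.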

Upvotes: 1

MSalters

Reputation: 179991

Boost.Regex built with ICU support can use Unicode character classes. I think you're looking for the \p{Han} script. Alternatively, U+4E00..U+9FFF is the \p{InCJK_Unified_Ideographs} block.

Upvotes: 3
