Reputation: 483
I want a regex that matches only strings consisting entirely of Chinese characters, with no English letters or any other characters. [\u4e00-\u9fa5] doesn't work at all, and [^\x00-\xff] also matches punctuation and characters from other languages.
boost::wregex reg(L"\\w*");
bool b = boost::regex_match(L"我a", reg); // expected to be false
b = boost::regex_match(L"我,", reg); // expected to be false
b = boost::regex_match(L"我", reg); // expected to be true
Upvotes: 6
Views: 2089
Reputation: 483
The following regex works fine.
boost::wregex reg(L"^[\u4e00-\u9fa5]+");
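For anyone who wants to try the three cases from the question without a Boost build, here is a minimal sketch. It assumes std::wregex can stand in for boost::wregex for this simple character class; is_all_chinese and check_question_cases are hypothetical names, not part of either library. Because regex_match has to consume the whole string, the leading ^ is not needed. Note that U+4E00..U+9FA5 covers only the original CJK Unified Ideographs range, not the later extensions.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): returns true only if
// s is non-empty and every character lies in U+4E00..U+9FA5.
bool is_all_chinese(const std::wstring& s) {
    // regex_match must match the entire string, so no ^ or $ is required.
    static const std::wregex reg(L"[\u4e00-\u9fa5]+");
    return std::regex_match(s, reg);
}

// The three cases from the question; U+6211 is the character used there.
void check_question_cases() {
    assert(!is_all_chinese(L"\u6211a"));  // Chinese + Latin letter
    assert(!is_all_chinese(L"\u6211,"));  // Chinese + ASCII comma
    assert(is_all_chinese(L"\u6211"));    // Chinese only
}
```

The original L"\\w*" pattern accepts the first case because \w matches Latin letters as well; restricting the class to the explicit range is what rejects it.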
Upvotes: 1
Reputation: 179991
Boost.Regex built with ICU support can match on Unicode character properties. I think you're looking for the Han script, \p{Han}. Alternatively, the block U+4E00..U+9FFF is \p{InCJK_Unified_Ideographs}.
Upvotes: 3