Reputation: 2980
Given a UTF-8 string, how can I tell whether it contains characters that are not allowed?
The requirement is that the UTF-8 string may contain only English letters and Chinese characters. Any other characters, such as symbols, digits, whitespace, '\n', and so on, are disallowed.
Can std::regex do this job?
bool legal(const std::string& s) { // s is a UTF-8 string
//??
}
Upvotes: 0
Views: 1387
Reputation: 11934
You could convert the std::string to a vector of UTF-32 code points (as described here), then iterate over them and check each one against the allowed ranges. However, I cannot provide the UTF-32 value ranges for Chinese characters, and judging from the comments on your question, that could actually be an issue.
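Here is a minimal sketch of that approach. It hand-rolls the UTF-8 decoder because std::wstring_convert/std::codecvt_utf8 are deprecated since C++17. The helper names decode_utf8 and is_allowed are made up for this example. The ranges are an assumption, not something the question specifies: "English" is taken to mean ASCII A-Z/a-z, and "Chinese" is taken to mean the CJK Unified Ideographs block (U+4E00..U+9FFF) plus Extension A (U+3400..U+4DBF). Other CJK blocks exist, so adjust the ranges to your actual requirement.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Decode UTF-8 into UTF-32 code points. Returns std::nullopt on malformed input:
// invalid lead byte, truncated sequence, bad continuation byte, overlong form,
// UTF-16 surrogate value, or a value above U+10FFFF.
std::optional<std::u32string> decode_utf8(const std::string& s) {
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t len;
        if (c < 0x80)                { cp = c;        len = 1; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; len = 2; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; len = 3; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; len = 4; }
        else return std::nullopt;                      // stray continuation or invalid lead byte
        if (i + len > s.size()) return std::nullopt;   // sequence cut off at end of string
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return std::nullopt;  // expected 10xxxxxx
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Smallest code point that legitimately needs `len` bytes (index 0 unused).
        static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_for_len[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return std::nullopt;
        out.push_back(cp);
        i += len;
    }
    return out;
}

// ASSUMED ranges: ASCII letters plus CJK Unified Ideographs and Extension A.
bool is_allowed(char32_t cp) {
    return (cp >= U'A' && cp <= U'Z') || (cp >= U'a' && cp <= U'z') ||
           (cp >= 0x3400 && cp <= 0x4DBF) || (cp >= 0x4E00 && cp <= 0x9FFF);
}

bool legal(const std::string& s) {  // s is a UTF-8 string
    auto cps = decode_utf8(s);
    if (!cps) return false;          // malformed UTF-8 is rejected outright
    for (char32_t cp : *cps)
        if (!is_allowed(cp)) return false;
    return true;                     // note: the empty string counts as legal here
}
```

For example, legal("Hello") and legal("\xE4\xBD\xA0\xE5\xA5\xBD") (the UTF-8 bytes of 你好) pass, while "Hello world" (space), "abc123" (digits) and the full-width comma "，" are rejected.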
EDIT
As stated in the comment below, if you know that all the characters you need to validate fall within the 2-byte range (the Basic Multilingual Plane, U+0000..U+FFFF), you could stick with UTF-16.
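A rough sketch of the UTF-16 variant, assuming the text is already available as a std::u16string (for example from a platform API), so no decoding step is shown. The function name legal_utf16 is hypothetical, and the allowed ranges are the same assumption as before (ASCII letters, U+3400..U+4DBF, U+4E00..U+9FFF). Because every allowed character lies in the BMP, each UTF-16 code unit can be checked on its own.

```cpp
#include <string>

// ASSUMPTION: input is already UTF-16; allowed = ASCII letters plus
// CJK Unified Ideographs (U+4E00..U+9FFF) and Extension A (U+3400..U+4DBF).
bool legal_utf16(const std::u16string& s) {
    for (char16_t u : s) {
        bool ok = (u >= u'A' && u <= u'Z') || (u >= u'a' && u <= u'z') ||
                  (u >= 0x3400 && u <= 0x4DBF) || (u >= 0x4E00 && u <= 0x9FFF);
        // Surrogate units (0xD800..0xDFFF) are outside every allowed range,
        // so any character beyond the BMP is rejected automatically.
        if (!ok) return false;
    }
    return true;
}
```

The trade-off is that ideographs outside the BMP (e.g. CJK Extension B, starting at U+20000) are always rejected, which is exactly the "2-byte range" assumption.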
Upvotes: 1