jean

Reputation: 2980

C++: check whether a UTF-8 string contains only specified characters

Given a UTF-8 string, how can I tell whether it contains characters that are not allowed?

The requirement is that the UTF-8 string may contain only English letters and Chinese characters. Anything else, such as symbols, digits, whitespace, or '\n', is disallowed.

Can std::regex do this job?

bool legal(const std::string& s) { // s is utf8 string
   //??
}

Upvotes: 0

Views: 1387

Answers (1)

Rudolfs Bundulis

Reputation: 11934

You could convert the std::string to a vector of UTF-32 code points (as described here) and then iterate over them, checking each one against the allowed ranges. However, I cannot provide the UTF-32 value ranges for Chinese characters, and judging from the comments on your question, that could actually be an issue.
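A minimal sketch of the decode-and-check idea, not a definitive implementation. It decodes the UTF-8 bytes by hand instead of using a conversion library, and the helper names decode_utf8 and is_allowed are hypothetical. The "Chinese" range used here is an assumption: only the CJK Unified Ideographs block (U+4E00 to U+9FFF). Other blocks, such as Extension A (U+3400 to U+4DBF) or the extensions in the supplementary planes, would have to be added if they should count as legal.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode one code point starting at s[i] and advance i.
// Returns false on malformed input (bad lead byte, truncated sequence,
// bad continuation byte, overlong encoding, surrogate, or value > U+10FFFF).
bool decode_utf8(const std::string& s, std::size_t& i, char32_t& cp) {
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    char32_t min_cp = 0;  // smallest value a sequence of this length may encode
    if (b0 < 0x80)                { cp = b0;        len = 1; min_cp = 0; }
    else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; min_cp = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; min_cp = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; min_cp = 0x10000; }
    else return false;                       // stray continuation or invalid byte
    if (len > s.size() - i) return false;    // sequence runs past the end
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) return false;
        cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
    i += len;
    return true;
}

// Hypothetical predicate. ASSUMPTION: "English" means A-Z and a-z, and
// "Chinese" means the CJK Unified Ideographs block U+4E00..U+9FFF only.
bool is_allowed(char32_t cp) {
    return (cp >= U'A' && cp <= U'Z') || (cp >= U'a' && cp <= U'z') ||
           (cp >= 0x4E00 && cp <= 0x9FFF);
}

// Returns true if every code point in the UTF-8 string s is allowed.
// An empty string is treated as legal here; adjust if that is not wanted.
bool legal(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        char32_t cp;
        if (!decode_utf8(s, i, cp) || !is_allowed(cp)) return false;
    }
    return true;
}
```

With these assumptions, legal("abc") is true, while legal("abc1") and legal("a b") are false. Malformed UTF-8 is rejected as well, which is probably what you want for a validation function.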

EDIT

As stated in the comment below, if you know that the characters you need to validate all fit in a single 2-byte UTF-16 code unit, you could stick with UTF-16.
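A rough sketch of the UTF-16 variant, assuming the text has already been converted to a std::u16string (the conversion itself is not shown). The function name legal_utf16 is hypothetical, and the Chinese range is again an assumption limited to the CJK Unified Ideographs block (U+4E00 to U+9FFF). Because each code unit is checked on its own, anything encoded as a surrogate pair is rejected, including Chinese characters outside the 2-byte range.

```cpp
#include <string>

// Hypothetical helper. ASSUMPTION: s is valid UTF-16 and every legal
// character fits in one 16-bit code unit.
bool legal_utf16(const std::u16string& s) {
    for (char16_t u : s) {
        const bool english = (u >= u'A' && u <= u'Z') || (u >= u'a' && u <= u'z');
        // ASSUMPTION: only the CJK Unified Ideographs block counts as Chinese.
        const bool chinese = (u >= 0x4E00 && u <= 0x9FFF);
        if (!english && !chinese) return false;  // also rejects surrogate halves
    }
    return true;
}
```

For example, legal_utf16(u"abc\u4E2D\u6587") is true, while a string containing U+20000 (written as u"\U00020000") is rejected because it is stored as two surrogate code units.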

Upvotes: 1
