Reputation: 9527
I have a Unicode string in C++11, stored in a UTF-8 string class, and I can get the Unicode code point of each character. How can I check whether the string contains characters that are not ASCII-based, where an ASCII letter with a diacritic still counts as ASCII-based?
E.g. I want to detect Japanese, Arabic, Russian, etc., but special characters such as German ü, Czech č, French î, etc. should be treated as "ASCII-like".
(I don't want to use Boost.)
Upvotes: 1
Views: 1156
Reputation: 9527
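I have found a solution that uses Unicode string normalization from Unilib.
I iterate over the string character by character and normalize each character to NFD (canonical decomposition), which splits an accented letter into its base letter followed by combining marks. If the first code point of the result is below 128, the character has an ASCII base. My UTF-8 string is represented with the TinyUTF8 library (which is now maintained on GitHub).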
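utf8_string u8str = u8"\u4e0a\u6d77 Příliš žluťoučký kůň úpěl ďábelské ódy";
for (auto c : u8str) {
    // Normalize one code point at a time.
    std::u32string uu;
    uu.push_back(c);
    // After NFD, the base letter comes first, then any combining marks.
    ufal::unilib::uninorms::nfd(uu);
    if (uu[0] < 128) {
        // has ASCII base
    }
}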
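If pulling in a normalization library is not an option, a rougher alternative is to decode the UTF-8 bytes by hand and test each code point against the Latin Unicode blocks. This is a different technique from the NFD check above, not an equivalent: it also accepts Latin letters that do not decompose to an ASCII base (such as ł or ø) and Latin-1 symbols (such as ©). The function names `decode_utf8`, `is_latin_like` and `all_latin_like` are hypothetical, and the chosen block ranges are my assumption about what "ASCII-like" should cover. A minimal sketch:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Minimal UTF-8 decoder (hypothetical helper). Malformed input yields U+FFFD.
// It does not reject overlong encodings or surrogates.
std::vector<char32_t> decode_utf8(const std::string& s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        char32_t cp = 0;
        int extra = 0;  // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }
        ++i;
        bool ok = true;
        for (int k = 0; k < extra; ++k) {
            if (i >= s.size() ||
                (static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) {
                ok = false;
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : static_cast<char32_t>(0xFFFD));
    }
    return out;
}

// Assumed definition of "ASCII-like": code point lies in a Latin block
// or is a combining diacritical mark.
bool is_latin_like(char32_t cp) {
    return cp <= 0x024F                        // Basic Latin .. Latin Extended-B
        || (cp >= 0x0300 && cp <= 0x036F)      // Combining Diacritical Marks
        || (cp >= 0x1E00 && cp <= 0x1EFF);     // Latin Extended Additional
}

// True if every code point in the UTF-8 string is Latin-like.
bool all_latin_like(const std::string& utf8) {
    for (char32_t cp : decode_utf8(utf8)) {
        if (!is_latin_like(cp)) return false;
    }
    return true;
}
```

With these ranges, a Czech string such as "Příliš" passes, while any string containing Japanese, Arabic or Cyrillic characters fails, because those scripts live outside the listed blocks.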
Upvotes: 2