Martin Perry

Reputation: 9527

C++11 - Unicode string - detect whether it contains non-ASCII-based characters

I have a Unicode string in C++11, stored in a UTF-8 string class, and I can get the Unicode code point of each character. How can I check whether the string contains characters that are not ASCII-based (that is, not ASCII letters, with or without diacritics)?

E.g. I want to detect Japanese, Arabic, Russian, etc., but special characters such as German ü, Czech č, or French î should count as "ASCII-like".

(I don't want to use Boost.)

Upvotes: 1

Views: 1156

Answers (1)

Martin Perry

Reputation: 9527

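I found a solution that uses Unicode string normalization from the Unilib library.

I iterate over the string character by character and normalize each character to NFD (canonical decomposition, which splits an accented letter into its base letter followed by combining marks). If the first code point of the result is below 128, the character has an ASCII base. My UTF-8 string is represented with the TinyUTF8 library (which is now maintained on GitHub).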

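#include <string>
#include "tinyutf8.h"   // utf8_string (header path is an assumption; it depends on the TinyUTF8 version)
#include "uninorms.h"   // ufal::unilib::uninorms (header path is an assumption)

utf8_string u8str = u8"\u4e0a\u6d77 Příliš žluťoučký kůň úpěl ďábelské ódy";

for (auto c : u8str){
    // Normalize one code point at a time: NFD turns e.g. "č" into "c" + U+030C.
    std::u32string uu;
    uu.push_back(c);
    ufal::unilib::uninorms::nfd(uu);

    if (uu[0] < 128){
        // has ASCII base
    } else {
        // no ASCII base (e.g. the two CJK characters at the start)
    }
}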
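
For comparison, here is a rough sketch of a different technique that needs no external library: instead of NFD normalization, it decodes the UTF-8 bytes by hand and checks whether every code point falls in one of the Latin Unicode blocks. This is not what the code above does. The function names (decode_utf8, is_latin_like, contains_non_latin, self_check) are made up for this illustration, and the choice of blocks is an assumption about what "ASCII-like" should mean. One difference from the NFD check: letters such as ø and ł have no canonical decomposition, so NFD leaves them above 127, while a block test accepts them. On the other hand, the block test also accepts non-letter symbols in Latin-1 Supplement such as × and ©.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode UTF-8 bytes into code points. Minimal sketch: malformed or truncated
// sequences become U+FFFD, and overlong encodings are not rejected.
std::u32string decode_utf8(const std::string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        int extra = 0;          // number of continuation bytes expected
        char32_t cp = 0xFFFD;   // default for invalid lead bytes
        if (b < 0x80)                { cp = b; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        ++i;
        for (; extra > 0; --extra) {
            if (i >= s.size() ||
                (static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) {
                cp = 0xFFFD;    // truncated or broken sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
            ++i;
        }
        out.push_back(cp);
    }
    return out;
}

// Assumed definition of "ASCII-like": the code point lies in a Latin block
// or is a combining diacritical mark.
bool is_latin_like(char32_t cp) {
    return cp <= 0x007F                      // Basic Latin (ASCII)
        || (cp >= 0x00A0 && cp <= 0x024F)    // Latin-1 Supplement, Latin Extended-A/B
        || (cp >= 0x0300 && cp <= 0x036F)    // Combining Diacritical Marks
        || (cp >= 0x1E00 && cp <= 0x1EFF);   // Latin Extended Additional
}

// True if at least one code point is outside the Latin-like ranges.
bool contains_non_latin(const std::string& utf8) {
    for (char32_t cp : decode_utf8(utf8)) {
        if (!is_latin_like(cp)) return true;
    }
    return false;
}

// Illustrative checks; byte escapes avoid depending on the source encoding.
void self_check() {
    assert(!contains_non_latin("P\xC5\x99\xC3\xADli\xC5\xA1"));   // "Příliš"
    assert(contains_non_latin("\xE4\xB8\x8A\xE6\xB5\xB7"));       // U+4E0A U+6D77
}
```

If the NFD behaviour is what you need (for example, to recover the base letter itself), a normalization library such as Unilib is still required; the block test only answers the yes/no question.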

Upvotes: 2
