Check if UTF-8 string is valid in modern C++

Question

The C++11 standard library makes it easy to convert a string from UTF-8 to UTF-16. However, the following code converts invalid UTF-8 input without reporting any error (at least under MSVC 2010):

#include <codecvt>
#include <cstdio>
#include <locale>
#include <string>

int main() {
    // Three 3-byte sequences; the last one decodes to 0xDB8D
    std::string input = "\xEA\x8E\x97" "\xE0\xA8\x81" "\xED\xAE\x8D";
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> converter;
    try {
        std::u16string output = converter.from_bytes(input.data());
        std::printf("Converted successfully\n");
    }
    catch(std::exception &e) {
        std::printf("Error: %s\n", e.what());
    }
}

The string here contains 9 bytes, which decode to 3 code points. The last code point is 0xDB8D, which is invalid because it falls in the surrogate range (0xD800 to 0xDFFF).
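To make the arithmetic explicit, here is a minimal sketch of how each 3-byte sequence in the input maps to a code point. The helpers `decode3` and `is_surrogate` are hypothetical names introduced for illustration only. They perform no validation and are not part of the standard library.

```cpp
// Hypothetical helper: decodes one 3-byte UTF-8 sequence of the form
// 1110xxxx 10xxxxxx 10xxxxxx. It does NOT validate its input.
constexpr char32_t decode3(unsigned char b0, unsigned char b1, unsigned char b2) {
    return (static_cast<char32_t>(b0 & 0x0F) << 12)   // 4 payload bits of the lead byte
         | (static_cast<char32_t>(b1 & 0x3F) << 6)    // 6 payload bits
         |  static_cast<char32_t>(b2 & 0x3F);         // 6 payload bits
}

// Hypothetical helper: true for code points in the surrogate range 0xD800..0xDFFF.
constexpr bool is_surrogate(char32_t cp) {
    return cp >= 0xD800 && cp <= 0xDFFF;
}
```

With these, `decode3(0xEA, 0x8E, 0x97)` gives 0xA397, `decode3(0xE0, 0xA8, 0x81)` gives 0x0A01, and `decode3(0xED, 0xAE, 0x8D)` gives 0xDB8D, for which `is_surrogate` returns true.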

Is it possible to check a UTF-8 string for strict validity using only the standard library of modern C++? By strict I mean that none of the invalid cases described in the Wikipedia article on UTF-8 are accepted.
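For reference, the kind of strictness meant here can be sketched as a hand-written check. This is only an illustration under my own assumptions, not a standard-library facility: the function name `is_valid_utf8` is hypothetical, and the rejected cases (invalid lead bytes, stray or missing continuation bytes, overlong encodings, surrogates, values above 0x10FFFF) are my reading of the invalid cases in that article.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, hand-written for illustration.
// Returns true only if s is a well-formed UTF-8 byte sequence.
bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;   // total length of this sequence in bytes
        char32_t cp = 0;       // code point being decoded
        char32_t min_cp = 0;   // smallest code point allowed for this length
        if (lead < 0x80)                { len = 1; cp = lead;        min_cp = 0x0; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min_cp = 0x10000; }
        else return false;     // stray continuation byte or lead byte 0xF8..0xFF

        if (n - i < len) return false;                     // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;          // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp) return false;                     // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;    // surrogate
        if (cp > 0x10FFFF) return false;                   // beyond the Unicode range
        i += len;
    }
    return true;
}
```

Under these assumptions the 9-byte input from the example above is rejected because of its last sequence, while the first 6 bytes alone are accepted.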
