Alex Schneider

Reputation: 355

How to convert a string to valid UTF-8 in C++

I have a string output that is not necessarily valid UTF-8. I have to pass it to a method that only accepts valid UTF-8 strings.
Therefore I need to convert output to the closest valid UTF-8 string by removing invalid bytes or sequences. How can I do that in C++? I would prefer not to use a third-party library.

Upvotes: 4

Views: 4885

Answers (2)

Tino Didriksen

Reputation: 2255

If you are sure your string is valid UTF-8 apart from a few corrupt bytes, the UTF8-CPP library (http://utfcpp.sourceforge.net/) can fix that. From the project page:

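```cpp
#include <iterator>  // std::back_inserter
#include <string>
#include <utility>   // std::move
#include "utf8.h"    // UTF8-CPP (header-only)

// Rewrites str so that invalid UTF-8 sequences are replaced with U+FFFD,
// the library's default replacement character.
void fix_utf8_string(std::string& str) {
    std::string temp;
    utf8::replace_invalid(str.begin(), str.end(), std::back_inserter(temp));
    str = std::move(temp);
}
```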

Avoiding third-party libraries altogether is hard when you process Unicode data in general, but UTF8-CPP is header-only, which is about as lightweight as a dependency gets.
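That said, dropping ill-formed bytes, which is what the question asks for, needs only the standard library. Below is a minimal sketch, assuming "valid" means the well-formed byte ranges of RFC 3629 (no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF). The names utf8_sequence_length and sanitize_utf8 are hypothetical and not part of UTF8-CPP.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the length (1-4) of the well-formed UTF-8
// sequence starting at s[i], or 0 if s[i] does not start one. Uses the
// well-formed byte ranges of RFC 3629, so overlong forms, surrogates
// (U+D800..U+DFFF) and values above U+10FFFF are rejected.
std::size_t utf8_sequence_length(const std::string& s, std::size_t i) {
    const auto at = [&](std::size_t k) {
        return static_cast<unsigned char>(s[k]);
    };
    const unsigned char b0 = at(i);
    if (b0 <= 0x7F) return 1;  // ASCII

    // Expected length plus the allowed range for the second byte.
    std::size_t len = 0;
    unsigned char lo = 0x80, hi = 0xBF;
    if (b0 >= 0xC2 && b0 <= 0xDF)                           { len = 2; }
    else if (b0 == 0xE0)                                    { len = 3; lo = 0xA0; }
    else if ((b0 >= 0xE1 && b0 <= 0xEC) || b0 == 0xEE || b0 == 0xEF) { len = 3; }
    else if (b0 == 0xED)                                    { len = 3; hi = 0x9F; }
    else if (b0 == 0xF0)                                    { len = 4; lo = 0x90; }
    else if (b0 >= 0xF1 && b0 <= 0xF3)                      { len = 4; }
    else if (b0 == 0xF4)                                    { len = 4; hi = 0x8F; }
    else return 0;  // 0x80..0xC1 and 0xF5..0xFF never start a sequence

    if (i + len > s.size()) return 0;               // truncated at end of input
    if (at(i + 1) < lo || at(i + 1) > hi) return 0; // bad second byte
    for (std::size_t k = 2; k < len; ++k) {
        if (at(i + k) < 0x80 || at(i + k) > 0xBF) return 0;  // not a continuation byte
    }
    return len;
}

// Hypothetical helper: copies every well-formed sequence and drops the rest.
std::string sanitize_utf8(const std::string& input) {
    std::string out;
    out.reserve(input.size());
    std::size_t i = 0;
    while (i < input.size()) {
        const std::size_t len = utf8_sequence_length(input, i);
        if (len == 0) {
            ++i;  // drop one invalid byte and resynchronize on the next one
        } else {
            out.append(input, i, len);
            i += len;
        }
    }
    return out;
}
```

Invalid bytes are discarded one at a time, so a truncated multi-byte sequence followed by ASCII keeps the ASCII character instead of swallowing it.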

Upvotes: 0

dsgriffin

Reputation: 68616

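You can use ICU's icu::UnicodeString methods fromUTF8(const StringPiece &utf8) and toUTF8String(StringClass &result): fromUTF8 replaces ill-formed input with U+FFFD, and toUTF8String then writes the result back out as valid UTF-8. Note that ICU is itself a third-party library.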
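If ICU is not an option, a similar replace-with-U+FFFD result can be approximated with only the standard library. This is a sketch under my own assumptions: the name replace_invalid_utf8 is hypothetical, it decodes each sequence to a code point and rejects overlong forms, surrogates and values above U+10FFFF, and it emits one U+FFFD per rejected byte, which may not match how many replacement characters ICU produces for a given bad sequence.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not an ICU API): replaces every byte that is not part
// of a well-formed UTF-8 sequence with U+FFFD, one replacement per byte.
std::string replace_invalid_utf8(const std::string& in) {
    static const std::string kReplacement = "\xEF\xBF\xBD";  // U+FFFD encoded as UTF-8
    static const std::uint32_t kMin[] = {0, 0, 0x80, 0x800, 0x10000};  // smallest code point per length
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len;
        std::uint32_t cp;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else { out += kReplacement; ++i; continue; }  // stray continuation byte or 0xF8..0xFF

        // Accumulate the payload bits of the continuation bytes.
        bool ok = i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;  // not a continuation byte
            else cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong encodings, UTF-16 surrogates and code points above U+10FFFF.
        if (ok && (cp < kMin[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF))
            ok = false;

        if (ok) { out.append(in, i, len); i += len; }
        else    { out += kReplacement; ++i; }  // replace one byte, resynchronize on the next
    }
    return out;
}
```

Unlike the drop-only approach, this keeps a visible marker where the input was damaged, which helps when the cleaned string is later logged or displayed.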

Upvotes: 2
