Reputation: 355
I have a string output that is not necessarily valid UTF-8. I have to pass it to a method that only accepts valid UTF-8 strings.
Therefore I need to convert output to the closest valid UTF-8 string, removing invalid bytes or sequences. How can I do that in C++? I would prefer not to use a third-party library.
Upvotes: 4
Views: 4885
Reputation: 2255
If you're sure your string is mostly valid UTF-8 with only a few corrupt bytes, the UTF8-CPP library (http://utfcpp.sourceforge.net/) can fix that. From the library's page:
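#include <iterator>
#include <string>
#include "utf8.h"  // UTF8-CPP (header-only)

// Rewrites str so that every invalid UTF-8 sequence becomes U+FFFD.
void fix_utf8_string(std::string& str) {
    std::string temp;
    utf8::replace_invalid(str.begin(), str.end(), std::back_inserter(temp));
    str.swap(temp);  // avoid copying the result back into str
}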
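Avoiding a third-party library entirely is hard when dealing with Unicode data, but UTF8-CPP is header-only, which is about as lightweight as a dependency gets. Note that replace_invalid substitutes a replacement character (U+FFFD by default) for each invalid sequence rather than deleting it.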
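Stripping invalid bytes specifically, which is what the question asks for, can also be done in plain standard C++. The following is a minimal sketch, not part of UTF8-CPP: the function name sanitize_utf8 is hypothetical, and it assumes "valid" means well-formed per RFC 3629, so overlong encodings, UTF-16 surrogates (U+D800 to U+DFFF) and code points above U+10FFFF are rejected. When a lead byte does not start a complete, well-formed sequence, only that byte is dropped and scanning resumes at the next byte.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns a copy of `in` with every byte that does not
// begin a well-formed UTF-8 sequence (RFC 3629) removed.
std::string sanitize_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    const std::size_t n = in.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;                 // 0 means "not a valid lead byte"
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the 2nd byte
        if (c < 0x80) {
            len = 1;                         // ASCII
        } else if (c >= 0xC2 && c <= 0xDF) {
            len = 2;                         // 0xC0/0xC1 would be overlong
        } else if (c >= 0xE0 && c <= 0xEF) {
            len = 3;
            if (c == 0xE0) lo = 0xA0;        // reject overlong 3-byte forms
            else if (c == 0xED) hi = 0x9F;   // reject UTF-16 surrogates
        } else if (c >= 0xF0 && c <= 0xF4) {
            len = 4;
            if (c == 0xF0) lo = 0x90;        // reject overlong 4-byte forms
            else if (c == 0xF4) hi = 0x8F;   // reject code points > U+10FFFF
        }
        // The whole sequence must fit and every continuation byte must be in range.
        bool ok = len != 0 && i + len <= n;
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            const unsigned char kmin = (k == 1) ? lo : 0x80;
            const unsigned char kmax = (k == 1) ? hi : 0xBF;
            ok = b >= kmin && b <= kmax;
        }
        if (ok) {
            out.append(in, i, len);  // copy the well-formed sequence
            i += len;
        } else {
            ++i;  // drop only the offending byte and resynchronize
        }
    }
    return out;
}
```

For example, sanitize_utf8("x\xE2\x82" "A") yields "xA", because the truncated three-byte sequence is discarded. The literal is split so that A is not parsed as part of the \x82 hex escape. To substitute U+FFFD instead of deleting, append "\xEF\xBF\xBD" in the else branch.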
Upvotes: 0
Reputation: 68616
You should use the icu::UnicodeString methods fromUTF8(const StringPiece &utf8) or toUTF8String(StringClass &result).
Upvotes: 2