Vcvv

Reputation: 71

Converting from UTF-32 to UTF-8 with UTF8-CPP throws "utf8::invalid_code_point"

My program receives a Chinese input string in UTF-32 encoding:

./myprogram 我想玩 

I want to convert this to UTF-8. For this I am using the UTF8-CPP library: http://utfcpp.sourceforge.net

#include <cstdio>
#include <iterator>
#include <string>
#include <vector>
#include "source/utf8.h"
using namespace std;
int main(int argc, char** argv)
{
    printf("argv[1] = %s \n", argv[1]);
    string str = argv[1];
    printf("str = %s \n", str);

    vector<unsigned char> utf8result;
    utf8::utf32to8(str.begin(), str.end(), back_inserter(utf8result));
}

I get the following output in the terminal:

argv[1] = 系 
str =  D�k� 
terminate called after throwing an instance of 'utf8::invalid_code_point'
  what():  Invalid code point

How can I fix this code so that the utf32to8 conversion succeeds? What am I doing wrong? Please explain. Afterwards I want to write the resulting UTF-8 to a file.

Upvotes: 0

Views: 1980

Answers (2)

Galik

Reputation: 48635

On most Linux distributions the command line passes arguments in as UTF-8, so you need to convert the argument to UTF-32 when you receive it and then convert it back to UTF-8 when you print it out.

Or you could create a UTF-32 string in the program, e.g. std::u32string u32s = U"我想玩";

#include <iostream>
#include <iterator>
#include <string>
#include "source/utf8.h"

int main()
{
    std::u32string u32s = U"我想玩";

    std::string u8s;
    utf8::utf32to8(u32s.begin(), u32s.end(), std::back_inserter(u8s));

    std::cout << u8s << '\n';
}
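
For the first option above (decoding the UTF-8 argument into UTF-32), UTF8-CPP provides utf8::utf8to32, which takes the same kind of iterator arguments as utf8::utf32to8. The sketch below is a hand-written stand-in for that step so that it compiles without the library. The name decode_utf8 is hypothetical, and the validation is deliberately minimal: it does not reject overlong encodings or surrogates, which a real library should.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of UTF8-CPP: decode UTF-8 bytes into UTF-32.
// Only the shape of lead and continuation bytes is checked.
std::u32string decode_utf8(const std::string& in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // continuation bytes that must follow the lead byte
        if (lead < 0x80)              { cp = lead; }                      // 0xxxxxxx
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }    // 110xxxxx
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }    // 1110xxxx
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }    // 11110xxx
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)                                       // 10xxxxxx
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append the 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

In main this would be used as std::u32string u32s = decode_utf8(argv[1]); after which the utf32to8 call from the example above applies unchanged. If the only goal is to write UTF-8 to a file and the argument really is UTF-8 already, the bytes of argv[1] can probably be written out as they are, with no conversion at all.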

Note:

From C++11 onwards you don't need third-party UTF libraries: the Standard Library has its own conversion facilities, although they are not easy to use.

You can write nicer functions to wrap them, as in this answer:

Any good solutions for C++ string code point and code unit?
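
As a rough illustration of the Standard Library route mentioned in the note: the answer does not name the facilities, so this sketch assumes they are std::wstring_convert combined with std::codecvt_utf8 from <codecvt>. Both have been deprecated since C++17, so compilers may emit deprecation warnings. The wrapper names to_utf8 and to_utf32 are hypothetical.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical wrapper: UTF-32 to UTF-8 via the (deprecated) C++11 facilities.
// wstring_convert throws std::range_error if the conversion fails.
std::string to_utf8(const std::u32string& s)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(s);
}

// Hypothetical wrapper: UTF-8 to UTF-32, the reverse direction.
std::u32string to_utf32(const std::string& s)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(s);
}
```

With these, to_utf32(argv[1]) would give the UTF-32 form of the argument and to_utf8 would turn it back into bytes suitable for printing or writing to a file.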

Upvotes: 1

user7860670

Reputation: 37578

Most likely argv[1] is already UTF-8 encoded, because that is the default way to handle Unicode on Linux. Note that UTF-32 text cannot be properly represented by std::string or by a C-style null-terminated array of char, because every UTF-32 code unit occupies 4 bytes.
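
A small sketch of what this means for the question's code. It does not reproduce UTF8-CPP's implementation. It only mimics the likely effect of handing std::string iterators to a UTF-32 converter: each single byte is widened to a 32-bit value on its own. The function names are hypothetical, and the cast through signed char is an assumption that models a platform where plain char is signed (as is usual on x86 Linux). Under that assumption every byte of 0x80 or above becomes a value far beyond U+10FFFF, which would plausibly explain the invalid_code_point exception.

```cpp
#include <cstddef>
#include <string>

// True if v is a Unicode scalar value: at most U+10FFFF and not a surrogate.
bool is_scalar_value(char32_t v)
{
    return v <= 0x10FFFF && !(v >= 0xD800 && v <= 0xDFFF);
}

// Count the bytes that would be rejected if each byte were (wrongly) treated
// as one whole UTF-32 code unit. Assumption: plain char behaves as signed,
// so a byte such as 0xE6 sign-extends to 0xFFFFFFE6.
std::size_t count_rejected_bytes(const std::string& bytes)
{
    std::size_t rejected = 0;
    for (char c : bytes) {
        const char32_t widened =
            static_cast<char32_t>(static_cast<signed char>(c));
        if (!is_scalar_value(widened)) ++rejected;
    }
    return rejected;
}

// Storage taken by the same text in each representation.
std::size_t byte_size(const std::string& s)    { return s.size() * sizeof(char); }
std::size_t byte_size(const std::u32string& s) { return s.size() * sizeof(char32_t); }
```

For the UTF-8 bytes of 我想玩 (nine bytes, all 0x80 or above), every byte is rejected, while plain ASCII input passes. The same three characters held in a std::u32string take three elements of sizeof(char32_t) bytes each.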

Upvotes: 0
