Reputation: 71
My program receives a Chinese string as input, in UTF-32 encoding:
./myprogram 我想玩
I want to convert this to UTF-8. For this I am using the UTF8-CPP library: http://utfcpp.sourceforge.net
#include <cstdio>
#include <iterator>
#include <string>
#include <vector>
#include "source/utf8.h"
using namespace std;
int main(int argc, char** argv)
{
printf("argv[1] = %s \n", argv[1]);
string str = argv[1];
printf("str = %s \n", str);
vector<unsigned char> utf8result;
utf8::utf32to8(str.begin(), str.end(), back_inserter(utf8result));
}
I get the following output in the terminal:
argv[1] = 系
str = D�k�
terminate called after throwing an instance of 'utf8::invalid_code_point'
what(): Invalid code point
How can I fix this code so that the utf32to8 conversion succeeds? What am I doing wrong? Please explain. After that, I want to write the resulting UTF-8 to a file.
Upvotes: 0
Views: 1980
Reputation: 48635
The command line on most Linux distributions passes arguments in as UTF-8, so you need to convert the argument to UTF-32 when you receive it and then convert it back to UTF-8 when you print it out.
Or you could create a UTF-32 string in the program, e.g. std::u32string u32s = U"我想玩";
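#include <iostream>
#include <iterator>
#include <string>
#include "source/utf8.h"
int main()
{
// UTF-32 literal: one char32_t per code point
std::u32string u32s = U"我想玩";
std::string u8s;
// Encode the code points as UTF-8 bytes appended to u8s
utf8::utf32to8(u32s.begin(), u32s.end(), std::back_inserter(u8s));
std::cout << u8s << '\n';
}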
Note:
From C++11 onwards you don't need to use third-party UTF libraries; the Standard Library has its own, although they are not easy to use.
You can write nicer functions to wrap them, as in this answer:
Any good solutions for C++ string code point and code unit?
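As a rough sketch of what such wrappers might look like, here is one possibility using std::wstring_convert with std::codecvt_utf8<char32_t>. The function names utf8_to_utf32 and utf32_to_utf8 are made up for illustration and are not taken from the linked answer. Be aware that these Standard Library facilities were deprecated in C++17, so a compiler may warn about them.

```cpp
#include <codecvt>  // std::codecvt_utf8 (deprecated since C++17)
#include <locale>   // std::wstring_convert (deprecated since C++17)
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-32 code points.
// from_bytes throws std::range_error if the input is not valid UTF-8.
std::u32string utf8_to_utf32(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(utf8);
}

// Hypothetical helper: encode UTF-32 code points as UTF-8 bytes.
std::string utf32_to_utf8(const std::u32string& utf32)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(utf32);
}
```

Assuming the terminal passes UTF-8, calling utf8_to_utf32(argv[1]) should give one char32_t per character, and utf32_to_utf8 turns that back into bytes that can be printed or written to a file.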
Upvotes: 1
Reputation: 37578
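Most likely argv[1] is already stored in UTF-8 encoding, because this is the default way to handle Unicode on Linux. Note that UTF-32 characters cannot be properly represented by std::string or by a C-style null-terminated array of char, because every UTF-32 code unit occupies 4 bytes while a char holds only one. In the code from the question, utf32to8 therefore treats each individual byte of the UTF-8 input as if it were a whole code point.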
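To make the size difference concrete, here is a small sketch that counts the code points in a UTF-8 string. The helper name count_code_points is hypothetical, and it assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a UTF-8 string by skipping
// continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8 input.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t n = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80)  // lead byte or ASCII starts a code point
            ++n;
    }
    return n;
}
```

For the three characters 我想玩, the UTF-8 std::string holds 9 bytes (3 per character) while this function reports 3 code points. The same text as a std::u32string has 3 elements, each at least 32 bits wide.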
Upvotes: 0