daisy

Reputation: 23511

Rapidjson does not encode utf8 sequence at all

I'm trying to use RapidJSON to escape UTF-8 sequences into the \uXXXX format, but it's not working.

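#include <iostream>
#include "rapidjson/document.h"
#include "rapidjson/stringbuffer.h"
#include "rapidjson/writer.h"
using namespace rapidjson;
int main() {
    StringBuffer s;
    Writer<StringBuffer, Document::EncodingType, ASCII<> > writer(s);  // ASCII output: non-ASCII escaped as \uXXXX
    writer.StartObject();
    writer.String("chinese");
    writer.String("中文测试");
    writer.EndObject();
    std::cout << s.GetString() << std::endl;
}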

The documentation says the characters should be escaped, but they are actually erased entirely.

I tried to use the AutoUTF template, but there is no documentation for using it with a memory stream either.

Any ideas? I also tried jsoncpp, but that library doesn't support Unicode either.

Thanks @Milo Yip, I forgot to mention that I'm using Visual Studio 2010.

Upvotes: 1

Views: 2924

Answers (1)

Milo Yip

Reputation: 5072

I tried it on OS X and it works:

{"chinese":"\u4E2D\u6587\u6D4B\u8BD5"}

I think the problem is that the compiler you are using does not encode the string literal "中文测试" as UTF-8. Compilers on Linux/OS X treat source code as UTF-8, but on Windows Visual C++ does not by default; it encodes narrow string literals in the system's ANSI code page.
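To make the point concrete, here is a hand-written sketch of the transcoding an ASCII<> target encoding implies: decode the UTF-8 input and emit every non-ASCII code point as a \uXXXX escape. `escape_utf8_to_ascii` and `append_u_escape` are hypothetical helpers for illustration only, not RapidJSON's implementation, and the usual escaping of `"`, `\` and control characters is left out for brevity. The key assumption it shares with the writer is that the input is already UTF-8.

```cpp
#include <string>

// Hypothetical helper (NOT RapidJSON's code): appends a JSON "\uXXXX" escape.
static void append_u_escape(std::string& out, unsigned u) {
    static const char digits[] = "0123456789ABCDEF";
    out += "\\u";
    for (int shift = 12; shift >= 0; shift -= 4)
        out += digits[(u >> shift) & 0xF];
}

// Hypothetical helper (NOT RapidJSON's code): decodes UTF-8, copies ASCII
// bytes unchanged and writes every other code point as \uXXXX (a surrogate
// pair above U+FFFF). Quote/backslash/control escaping is omitted.
// Returns false if the input is not well-formed UTF-8.
bool escape_utf8_to_ascii(const std::string& in, std::string& out) {
    static const unsigned min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    out.clear();
    std::string::size_type i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        unsigned cp;
        std::string::size_type len;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else return false;                      // not a valid lead byte
        if (len > in.size() - i) return false;  // truncated sequence
        for (std::string::size_type k = 1; k < len; ++k) {
            unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cont & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogates and values past U+10FFFF.
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x10000) {
            append_u_escape(out, cp);
        } else {
            unsigned v = cp - 0x10000;  // split into a UTF-16 surrogate pair
            append_u_escape(out, 0xD800 + (v >> 10));
            append_u_escape(out, 0xDC00 + (v & 0x3FF));
        }
    }
    return true;
}
```

Called with the UTF-8 bytes "\xE4\xB8\xAD\xE6\x96\x87\xE6\xB5\x8B\xE8\xAF\x95" (spelled with \x escapes so the result does not depend on the source file's encoding), it produces \u4E2D\u6587\u6D4B\u8BD5, the same escapes as in the output above. Given a byte pair that is not valid UTF-8, such as "\xD6\xD0", it returns false instead.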

You can try the C++11 UTF-8 string literal u8"中文测试", or, for testing, read the strings from a UTF-8 encoded file.
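For the second option, a minimal sketch of loading test strings from a file saved as UTF-8. `read_utf8_file` is a hypothetical helper name. It copies the file's bytes unchanged, so the result is UTF-8 regardless of how the compiler encodes literals, and it drops a leading UTF-8 byte order mark, which some Windows editors add.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: reads a whole file as raw bytes (no newline or
// code-page conversion). Returns an empty string if the file cannot be read.
std::string read_utf8_file(const char* path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    std::string data((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    // Strip a UTF-8 byte order mark (EF BB BF) if present.
    if (data.size() >= 3 && data.compare(0, 3, "\xEF\xBB\xBF") == 0)
        data.erase(0, 3);
    return data;
}
```

The returned std::string can then be handed to the writer, e.g. `writer.String(text.c_str())`.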


Edit: the question was updated to mention Visual Studio 2010. In Visual Studio 2010 and later, there is an undocumented feature:

#pragma execution_character_set("utf-8")

With this pragma, narrow string literals are encoded as UTF-8. For C++11-compliant compilers, u8"xxx" literals should be used instead.
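Whichever fix is used (the pragma, a u8 literal, or reading from a file), one way to check that the writer really receives UTF-8 is to dump the bytes of the string. A minimal sketch with a hypothetical `hex_bytes` helper, written without C++11 features so it should also compile on Visual Studio 2010:

```cpp
#include <string>

// Hypothetical diagnostic helper: formats each byte of s as two upper-case
// hex digits separated by spaces, e.g. "E4 B8 AD" for UTF-8 U+4E2D.
std::string hex_bytes(const std::string& s) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (i != 0) out += ' ';
        out += digits[c >> 4];
        out += digits[c & 0x0F];
    }
    return out;
}
```

If the literal is UTF-8, printing `hex_bytes("中文测试")` shows `E4 B8 AD E6 96 87 E6 B5 8B E8 AF 95`, three bytes per character. Any other byte sequence means the literal was encoded in a different character set, so the writer is not receiving UTF-8.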

Anyway, the claim that "Rapidjson does not encode utf8 sequence at all" is incorrect.

Upvotes: 2
