Reputation: 135
I am trying to write a fully UTF-8 compliant application with CouchDB as the back end. I use C++ with the Casablanca REST SDK to send my requests to CouchDB 1.6.1. To test that the application can handle various Unicode characters, I have a test string in a JSON object that I want to PUT to CouchDB. The string is defined like this (C++):
const string_t InternationalText =
    L"Hello world!123#@%\n\r\v\t\f Å i åa ä e ö "
    L"\u00c5 \u00fc \u03bb \u0416 \u4e16\u754c\u548c\u5e73 \U00013080";
The last character in the string, \U00013080 (Eye of Horus), is giving me trouble. I get a 400 Bad Request from CouchDB, and the log shows the error "lexical error: invalid character inside string."
I sniffed the traffic with RawCap to capture the request-response cycle, and the important parts of my request are:
PUT *address*
Content-Type: application/json;charset=utf-8
Body: *Complex Json object containing the string as such*
{"description":"Hello world!123#@% Å i åa ä e ö Å ü λ Ж 世界和平 𓂀",...}
If I look at the hex of the request, the Eye of Horus character is encoded as F0 93 82 80, which according to https://codepoints.net/U+13080 is correct. Still, I get the invalid-character error. What am I missing? Does CouchDB have a problem dealing with characters from plane 1 and above in the Unicode standard?
Almost needless to say, everything works fine if I remove the hieroglyph.
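For anyone who wants to double-check the byte sequence quoted above without a lookup site, here is a minimal sketch of a UTF-8 encoder for a single code point. The helper name encode_utf8 is hypothetical (it is not part of Casablanca or CouchDB), and it does no validation beyond a range check.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Assumes the caller passes a valid scalar value (no surrogate check).
std::string encode_utf8(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);  // Unicode code space ends at U+10FFFF
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (planes 1 and above)
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling encode_utf8(0x13080) yields the four bytes F0 93 82 80, which matches the capture, so the hieroglyph itself is encoded correctly on the wire.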
Upvotes: 1
Views: 295
Reputation: 135
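I found the problem. It turns out that \v (vertical tab, U+000B) is not legal inside a JSON string: RFC 7159 (https://www.rfc-editor.org/rfc/rfc7159) defines no \v escape and requires the control characters U+0000 through U+001F to be escaped, so a raw vertical tab in the body is rejected. Removing it solves my issue. I was thrown off by some strange behavior in Visual Studio's unit test framework, which passed the test when I removed the last character of my test string even though the call still had errors.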
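If the control characters need to stay in the test string, one option is to escape them before the text goes into the JSON body. The following is only a sketch under stated assumptions: escape_json_string is a hypothetical helper (not a Casablanca API), it expects UTF-8 input in a std::string, and it passes bytes of 0x80 and above through untouched so multi-byte characters such as U+13080 are left as they are.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: escape a UTF-8 string for use inside a JSON string
// literal. RFC 7159 requires '"', '\\' and U+0000..U+001F to be escaped.
std::string escape_json_string(const std::string& utf8) {
    std::string out;
    for (unsigned char c : utf8) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\b': out += "\\b";  break;
            case '\f': out += "\\f";  break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // No short escape exists (e.g. \v), so use \u00XX.
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x",
                                  static_cast<unsigned>(c));
                    out += buf;
                } else {
                    // Printable ASCII and UTF-8 lead/continuation bytes.
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}
```

With this, a vertical tab becomes the six characters \u000b in the request body, while \n, \r, \t and \f get their short escapes.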
Upvotes: 2