Gokul

Reputation: 456

Why is a bad UTF-8 character error raised when creating a document in CouchDB?

Creating a document in CouchDB generates the following error:

12> ADoc.
[{<<"Adress">>,<<"Hjalmar Brantingsgatan 7 C">>},
 {<<"District">>,<<"Brämaregården">>},
 {<<"Rent">>,3964},
 {<<"Rooms">>,2},
 {<<"Area">>,0}]
13> IDoc.
[{<<"Adress">>,<<"Segeparksgatan 2A">>},
 {<<"District">>,<<"Kirseberg">>},
 {<<"Rent">>,9701},
 {<<"Rooms">>,3},
 {<<"Area">>,83}]
14> erlang_couchdb:create_document({"127.0.0.1", 5984}, "proto_v1", IDoc).
{json,{struct,[{<<"ok">>,true},
           {<<"id">>,<<"c6d96b5f923f50bfb9263638d4167b1e">>},
           {<<"rev">>,<<"1-0d17a3416d50129328f632fd5cfa1d90">>}]}}
15> erlang_couchdb:create_document({"127.0.0.1", 5984}, "proto_v1", ADoc).
** exception exit: {ucs,{bad_utf8_character_code}}
     in function  xmerl_ucs:from_utf8/1 (xmerl_ucs.erl, line 185)
     in call from mochijson2:json_encode_string/2 (/Users/admin/AlphaGroup/src/mochijson2.erl, line 200)
     in call from mochijson2:'-json_encode_proplist/2-fun-0-'/3 (/Users/admin/AlphaGroup/src/mochijson2.erl, line 181)
     in call from lists:foldl/3 (lists.erl, line 1197)
     in call from mochijson2:json_encode_proplist/2 (/Users/admin/AlphaGroup/src/mochijson2.erl, line 184)
     in call from erlang_couchdb:create_document/3 (/Users/admin/AlphaGroup/src/erlang_couchdb.erl, line 256)

Of the two documents above, one (IDoc) can be created in CouchDB without any problem.

Can anyone help me figure out what causes this?

Upvotes: 1

Views: 1086

Answers (2)

Chen Yu

Reputation: 4077

I think the problem is in <<"Brämaregården">>. The Unicode string needs to be converted to a UTF-8 binary first. There is an example in the following links.

unicode discussion. The core function is in the unicode module.
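
As a minimal sketch of that conversion: unicode:characters_to_binary/1 from the standard unicode module turns a list of code points into a UTF-8 binary. The variable names below are made up for illustration, and the code points are written numerically so the result does not depend on how the shell or source file encodes "ä" and "å".

```erlang
%% Code points for "Brämaregården" (228 = ä, 229 = å), written numerically
%% so the shell or source-file encoding does not matter.
DistrictChars = [$B, $r, 228, $m, $a, $r, $e, $g, 229, $r, $d, $e, $n].

%% Encode the code points as a UTF-8 binary.
DistrictBin = unicode:characters_to_binary(DistrictChars).

%% Build the document with the UTF-8 binary in place of the literal.
FixedDoc = [{<<"Adress">>, <<"Hjalmar Brantingsgatan 7 C">>},
            {<<"District">>, DistrictBin},
            {<<"Rent">>, 3964},
            {<<"Rooms">>, 2},
            {<<"Area">>, 0}].
```

FixedDoc can then be passed to erlang_couchdb:create_document/3 in place of ADoc. Whether that fully resolves the error depends on the rest of the asker's setup.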

Upvotes: 2

legoscia

Reputation: 41527

Entering non-ASCII characters in Erlang code is fiddly, not least because it works differently in the shell than in compiled Erlang code.

Try inputting the binary explicitly as UTF-8:

<<"Br", 16#c3, 16#a4, "mareg", 16#c3, 16#a5, "rden">>

That is, "ä" is represented by the bytes C3 A4 in UTF-8, and "å" by C3 A5. There are many ways to find those codes; a quick search turned up this table.
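
To check a binary before sending it, one option is to decode it with unicode:characters_to_list/2. The sketch below assumes the failing binary held "ä" and "å" as single Latin-1 bytes (16#e4 and 16#e5); that is a guess about what the shell produced, not something the error message confirms. The variable names are illustrative.

```erlang
%% The explicit UTF-8 binary: ä = C3 A4, å = C3 A5.
Utf8 = <<"Br", 16#c3, 16#a4, "mareg", 16#c3, 16#a5, "rden">>.

%% Decoding as UTF-8 succeeds; the third code point is 228 (ä).
[$B, $r, 228 | _] = unicode:characters_to_list(Utf8, utf8).

%% Assumed faulty input: the same text with one Latin-1 byte per character.
Latin1 = <<"Br", 16#e4, "mareg", 16#e5, "rden">>.

%% Decoding it as UTF-8 fails, since 16#e4 is not followed by continuation bytes.
{error, _, _} = unicode:characters_to_list(Latin1, utf8).

%% Re-encoding from Latin-1 to UTF-8 yields the valid binary again.
Utf8 = unicode:characters_to_binary(Latin1, latin1, utf8).
```

Each pattern match doubles as an assertion: if any step behaved differently, the shell would report a badmatch.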

Normally you'd get the input from somewhere outside your code, e.g. read from a file or typed into a web form, and then you wouldn't have this problem.

Upvotes: 0
