Reputation: 1468
Everything worked fine while I was using MongoDB 2.0.6.
I recently switched to MongoDB 2.4.8 with the Java Play framework, and I found that when I try to save Chinese text, MongoDB stores it as an unreadable string such as &#21457;&#29983;, and the web page shows the same string. Does anyone know why?
What should I do? How can I convert it back to readable Chinese?
Upvotes: 5
Views: 3085
Reputation: 9358
While I have no experience with the Play framework specifically, the general approach is to log or dump the string right before it is passed to your MongoDB driver. Then:
If the string is still plain UTF-8 text rather than entities (&#...;), check whether the MongoDB driver you use with 2.4 has gained some new option that converts UTF-8 text into entities.
If the string has already been converted to entities, you have at least ruled out the MongoDB driver and should track down the conversion within the Play framework instead.
As others have mentioned, MongoDB itself does not care whether your input contains entities, as long as it is UTF-8 encoded. It is more likely that the Play framework or the MongoDB driver is to blame.
PS: I assume "unreadable" means the characters were converted to entities (&#...;), not that they were encoded incorrectly.
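The check described above could look roughly like the following. This is only a minimal sketch: the class EntityCheck and its two methods are hypothetical names made up for illustration, not part of Play or the MongoDB driver. It reports whether a string already contains numeric entities and lists its code points, so the result can be logged right before the save call.

```java
import java.util.regex.Pattern;

// Hypothetical diagnostic helper; not part of Play or the MongoDB Java driver.
class EntityCheck {
    // Matches decimal (&#21457;) and hexadecimal (&#x53D1;) numeric character references.
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[0-9]+|[xX][0-9a-fA-F]+);");

    // True if the string contains at least one numeric character reference.
    static boolean containsNumericEntities(String s) {
        return NUMERIC_ENTITY.matcher(s).find();
    }

    // Lists the code points of the string as "U+XXXX" values separated by spaces.
    static String describeCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String value = "\u53D1\u751F"; // the two characters from the question
        System.out.println("entities present: " + containsNumericEntities(value));
        System.out.println("code points: " + describeCodePoints(value));
    }
}
```

If the log shows real CJK code points at this stage, the conversion happens later (driver side); if it already shows the code points for '&' and '#', it happened earlier.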
Upvotes: 1
Reputation: 3706
From what you have posted I suspect that this may be an artefact of the Play Framework, as both these characters can be stored directly in MongoDB.
> db.test1.insert({x:"𡑗 and 𩦃"})
> db.test1.find();
{ "_id" : ObjectId("52a12237e7c9d6190f6feb95"), "x" : "𡑗 and 𩦃" }
Assuming that the characters you posted as 发 and 生 above are really meant to be 𡑗 and 𩦃, I would suspect that the Play Framework is converting them into a representation of their extended Unicode values. In that case both characters would be from the "CJK Unified Ideographs Extension B" block.
You can view the whole set of characters here: http://codepoints.net/cjk_unified_ideographs_extension_b
This looks similar to an issue discussed in the play-framework Google group.
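To check which reading of the numbers applies, here is a minimal sketch (the class EntityReading and its method are hypothetical names for illustration) that decodes 21457 and 29983 both ways. Read as decimal, which is how a reference of the form &#N; is normally interpreted, they are U+53D1 and U+751F (发 and 生); read as hexadecimal they are U+21457 and U+29983 in Extension B.

```java
// Hypothetical helper for comparing the decimal and hexadecimal readings.
class EntityReading {
    // Builds a one-character string from a code point (may be a surrogate pair).
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int[] decimalReading = {21457, 29983};
        int[] hexReading = {0x21457, 0x29983};

        for (int cp : decimalReading) {
            System.out.println(String.format("decimal: U+%04X in %s",
                    cp, Character.UnicodeBlock.of(cp)));
        }
        for (int cp : hexReading) {
            System.out.println(String.format("hex:     U+%04X in %s",
                    cp, Character.UnicodeBlock.of(cp)));
        }
    }
}
```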
Upvotes: 3
Reputation: 6243
I just wrote a quick test and this works just fine.
package com.mongodb;

import com.mongodb.util.TestCase;
import org.junit.Assert;
import org.junit.Test;

// Runs inside the Java driver's own test suite; TestCase provides getDatabase().
public class EncodingTest extends TestCase {
    String chinese = "你好";

    @Test
    public void saveChinese() {
        DBCollection collection = getDatabase().getCollection("chinese");
        // Store the Chinese string as-is.
        collection.insert(new BasicDBObject().append("message", chinese));
        // Read it back and verify it is unchanged.
        DBObject object = collection.findOne();
        Assert.assertEquals(chinese, object.get("message"));
    }
}
That text saves and loads without error. It would help to see what code you're using to test.
Upvotes: 2
Reputation: 1339
I think your string is being converted to an unreadable string somewhere in between. I tested this in the mongo shell and it works fine for me.
$ mongo test
MongoDB shell version: 2.4.8
connecting to: test
> var doc = { "message" :"你好" }
> db.ChineseWord.save(doc)
> db.ChineseWord.find().pretty()
{ "_id" : ObjectId("529da2018170273efa43e181"), "message" : "你好" }
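If the conversion does turn out to happen before the data reaches MongoDB, and you need to recover strings that were already stored as entities, something like the following might help. It is a minimal sketch under that assumption: EntityDecoder is a hypothetical helper, not an API of Play or the MongoDB driver, and it only handles numeric references (&#N; and &#xN;), not named ones.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that turns numeric character references back into characters.
class EntityDecoder {
    // Group 1 captures a decimal value, group 2 a hexadecimal value.
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));");

    static String decode(String input) {
        Matcher m = NUMERIC_ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int cp;
            try {
                cp = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1))
                        : Integer.parseInt(m.group(2), 16);
            } catch (NumberFormatException e) {
                cp = -1; // value too large to be a code point
            }
            // Leave the reference untouched if it is not a valid code point.
            String replacement = Character.isValidCodePoint(cp)
                    ? new String(Character.toChars(cp))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("&#21457;&#29983;");
        System.out.println(decoded.equals("\u53D1\u751F")); // compares with the two characters from the question
    }
}
```

This only repairs the symptom. Finding where the entities are introduced is still the real fix.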
Upvotes: 6