frazman

Reputation: 33243

Encoding issue inserting into MongoDB with Python

I have a list of dictionaries, data_dump, whose entries look like this:

d = {"ids": s_id, "subject": subject}

I'm following the tutorial and trying to do a bulk insert:

from pymongo import Connection

connection = Connection(host, port)
db = connection['clusters']
posts = db.posts
posts.insert(data_dump)  # passing a list performs a bulk insert

This fails with the following error:

 File "/usr/local/lib/python2.7/dist-packages/pymongo/collection.py", line 312, in insert
continue_on_error, self.__uuid_subtype), safe)
bson.errors.InvalidStringData: strings in documents must be valid UTF-8
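
The error means at least one byte string in the documents is not valid UTF-8. A minimal sketch for locating the offending values before inserting is shown here; the helper name find_invalid_utf8 and the sample data are hypothetical, not part of PyMongo or the original post:

```python
def find_invalid_utf8(docs):
    """Return (index, key) pairs whose byte-string value is not valid UTF-8."""
    bad = []
    for index, doc in enumerate(docs):
        for key, value in doc.items():
            # Only byte strings can carry invalid UTF-8; text values are skipped.
            if isinstance(value, bytes):
                try:
                    value.decode('utf-8')
                except UnicodeDecodeError:
                    bad.append((index, key))
    return bad


# Hypothetical sample standing in for data_dump.
sample = [
    {"ids": 1, "subject": b"Math"},
    {"ids": 2, "subject": b"Caf\xe9"},  # Latin-1 byte, not valid UTF-8
]
print(find_invalid_utf8(sample))
```

Running this over data_dump should point at the documents that need cleaning or re-encoding before posts.insert is called.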

Please advise. Thanks

Upvotes: 3

Views: 8255

Answers (2)

Anuj Gupta

Reputation: 10526

I couldn't afford to lose the non-UTF-8 characters, so I chose to convert the string to Binary instead.

As per your example,

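>>> import bson
>>> subject
u'Math'
>>> # Store subject as Binary instead of a string. In Python 2, str() works here
>>> # only if subject is already a byte string or contains ASCII characters only.
>>> d = {"ids": s_id, "subject": bson.Binary(str(subject))}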

The trade-off is that you can't run full-text searches (the latest feature in Mongo) on a field stored as Binary, but it works well for everything else.

Upvotes: 0

frazman

Reputation: 33243

Solved. I forced the encoding by 1) stripping symbols etc. from the string, and then 2) converting ASCII to UTF-8 with raw.decode('ascii') followed by decoded_string.encode('utf8'). Thanks, guys. :)
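
A rough sketch of that approach follows. It is not the exact code used: the helper name clean_subject and the sample data are hypothetical, and the stripping step is done here with the 'ignore' error handler, which silently drops every non-ASCII byte during decoding. The result is a text (unicode) string, which the BSON encoder writes out as UTF-8, so the explicit encode step should not be needed:

```python
def clean_subject(raw):
    """Drop non-ASCII bytes from a byte string and return text."""
    if isinstance(raw, bytes):
        # 'ignore' discards any byte that is not valid ASCII.
        return raw.decode('ascii', 'ignore')
    return raw  # already text, leave untouched


# Hypothetical sample standing in for data_dump.
data_dump = [{"ids": 1, "subject": b"Math\xe9"}]

# Build cleaned copies rather than mutating the original documents.
cleaned = [dict(d, subject=clean_subject(d["subject"])) for d in data_dump]
print(cleaned[0]["subject"])
```

Note that this loses the stripped characters for good; if they matter, storing the value as Binary is the safer route.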

Upvotes: 3
