Reputation: 33243
I have a list of dictionaries, data_dump,
where each dictionary looks like:
d = {"ids": s_id, "subject": subject}
I'm following the tutorial and trying to do a bulk insert:
from pymongo import Connection

connection = Connection(host, port)
db = connection['clusters']
posts = db.posts
posts.insert(data_dump)
Which fails with the following error:
File "/usr/local/lib/python2.7/dist-packages/pymongo/collection.py", line 312, in insert
continue_on_error, self.__uuid_subtype), safe)
bson.errors.InvalidStringData: strings in documents must be valid UTF-8
Please advise. Thanks
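Before retrying the insert, it can help to pin down which fields actually hold invalid byte strings. A minimal diagnostic sketch (Python 3 syntax; the field names are just the ones from the question):

```python
def find_invalid_utf8(doc):
    """Return the keys of a document whose byte-string values fail UTF-8 decoding."""
    bad = []
    for key, value in doc.items():
        if isinstance(value, bytes):
            try:
                value.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(key)
    return bad

d = {"ids": 7, "subject": b"Math\xff"}  # \xff makes the value invalid UTF-8
print(find_invalid_utf8(d))  # → ['subject']
```

Running this over each document in data_dump before posts.insert() identifies the offending entries instead of failing the whole bulk insert.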
Upvotes: 3
Views: 8255
Reputation: 10526
I couldn't afford to lose the non-UTF-8 characters, so I chose to convert the string to Binary instead.
As per your example,
>>> print subject
u'Math'
>>> d = {"ids": s_id, "subject": bson.Binary(str(subject))} # convert subject from unicode to Binary
You can't run full-text searches on Binary fields (text search was a recent addition to MongoDB at the time), but it works well for everything else.
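In Python 3 terms, the same idea looks like the sketch below. bson.Binary is a thin subclass of bytes, so plain bytes stands in for it here and the snippet runs without pymongo installed; with pymongo, you would wrap the value in bson.Binary(raw) instead:

```python
# Stand-in for bson.Binary (a thin subclass of bytes), used so this sketch
# runs without pymongo; with pymongo installed, use bson.Binary(raw).
Binary = bytes

raw = b"Math\xff"  # not valid UTF-8, so a plain string insert would fail
d = {"ids": 42, "subject": Binary(raw)}  # the raw bytes are stored untouched
```

Storing the field as binary keeps every original byte intact, at the cost of string-level query features on that field.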
Upvotes: 0
Reputation: 33243
Solved: Well.. I forced the encoding by 1) stripping the string of symbols etc., and then 2) converting ASCII to UTF-8 with raw.decode('ascii') followed by decoded_string.encode('utf8'). Thanks guys.. :)
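In Python 3 terms, the cleaning step described above can be sketched like this; using errors='ignore' on the decode is a slightly blunter substitute for manually stripping symbols first:

```python
def clean_to_utf8(raw):
    # Drop any bytes that aren't plain ASCII; whatever survives is valid
    # UTF-8 by construction, so pymongo will accept the document.
    decoded = raw.decode("ascii", errors="ignore")
    return decoded.encode("utf8")

print(clean_to_utf8(b"Math\xc2\xa9"))  # → b'Math'
```

Note this is lossy: any non-ASCII bytes are silently dropped, which is exactly the trade-off the Binary answer above avoids.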
Upvotes: 3