Steohawk

Reputation: 95

How to get Python 3 to correctly handle unicode strings from MongoDB?

I'm using Windows 7 64-bit, Python 3, MongoDB, and PyMongo. I know that in Python 3, all strings are Unicode, and that MongoDB stores all strings as Unicode. So I don't understand why, when I pull a document from my database where the value of a particular field is "C:\Some Folder\E=mc².xyz", Python treats that string as "C:\Some Folder\E=mcÂ².xyz". It doesn't just print that way; os.path.exists() returns False. As if that weren't confusing enough, if I save the string to a text file and then open it with the encoding explicitly set to "utf-8", the string appears correctly and os.path.exists() returns True. What's going wrong, and how do I fix it?

Edit: Here's some code I just wrote to demonstrate my problem:

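from pymongo import MongoClient

db = MongoClient().test_db
orig_doc = {'string': 'E=mc²'}
# insert_one() replaces the deprecated Collection.insert() (removed in PyMongo 4)
_id = db.test_col.insert_one(orig_doc).inserted_id
new_doc = db.test_col.find_one(_id)  # look the document up again by its _id
print(new_doc['string'])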

>>> E=mc²

As you can see, it works exactly as it should! Thus I now realize that I must've messed up when I migrated from PostgreSQL. Now I just need to fix the strings. I know that it's possible, but there's got to be a better way than writing the strings to a text file and then reading them back. I could do that, just as I did in my previous testing, but it just doesn't seem like the right way.

Upvotes: 0

Views: 2713

Answers (1)

Mark Tolonen

Reputation: 177674

You can't store Unicode as such. It is a concept, and what gets stored is always some encoding of it. MongoDB has to be using an encoding of Unicode, and it looks like UTF-8 (which is what BSON strings use). Python 3 Unicode strings are stored internally as one of a number of encodings depending on the content of the string. What you have is a string whose bytes were decoded to Unicode with the wrong encoding:

>>> s = r'"C:\Some Folder\E=mcÂ².xyz"'  # The invalid decoding (raw string, so the backslashes stay literal).
>>> print(s)
"C:\Some Folder\E=mcÂ².xyz"
>>> print(s.encode('latin1').decode('utf8'))  # Undo the wrong decoding, and apply the right one.
"C:\Some Folder\E=mc².xyz"

There's not enough information to tell you how to read MongoDB correctly, but this should help you along.
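
If the strings already in the database need repairing in place, something along these lines might do it without going through a text file. Treat it as a rough sketch: fix_mojibake and repair_collection are names made up for this example, and it assumes every damaged value is UTF-8 that was decoded as Latin-1. Strings that don't fit that pattern are left alone, but a value that legitimately contains a sequence like "Â²" would be changed too, so try it on a copy of the data first.

```python
def fix_mojibake(text):
    """Undo a UTF-8-read-as-Latin-1 decoding (hypothetical helper).

    Returns the text unchanged if it does not fit that pattern.
    """
    try:
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1, or its bytes
        # are not valid UTF-8. In both cases it was probably fine already.
        return text


def repair_collection(collection, field):
    """Rewrite `field` in each document whose value changes after repair.

    `collection` is expected to be a PyMongo collection; only find() and
    update_one() are used. Returns the number of documents updated.
    """
    # Read everything first so the updates do not interfere with the cursor.
    docs = list(collection.find({field: {'$exists': True}}, {field: 1}))
    fixed = 0
    for doc in docs:
        value = doc[field]
        if not isinstance(value, str):
            continue  # skip non-string values
        repaired = fix_mojibake(value)
        if repaired != value:
            collection.update_one({'_id': doc['_id']},
                                  {'$set': {field: repaired}})
            fixed += 1
    return fixed
```

With the names from the question's demo, the call would be something like repair_collection(MongoClient().test_db.test_col, 'string').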

Upvotes: 1
