Reputation: 593
For some reason my Python code stopped working after I switched from Ubuntu 12 to Ubuntu 14: I can no longer unpickle my data. I stored the data in a CouchDB database after decoding the pickled bytes as latin1.
I'm using latin1 because I read some time ago (I no longer have the link) that it is the only encoding that lets me store cPickled binary data in a CouchDB database and retrieve it intact. It was meant to avoid encoding issues with JSON (couchdbkit uses JSON in the background).
Latin1 is supposed to map the 256 byte values to 256 characters one to one, so the conversion is exact, byte for byte. Now, after the system upgrade, Python complains as if only 128 values were valid and throws a UnicodeDecodeError (see below).
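The 256-to-256 mapping described above can be checked in isolation. The following is a minimal sketch, not part of the application code, and all variable names in it are illustrative:

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(bytearray(range(256)))

# Decoding as latin1 maps byte N to code point U+00NN, so nothing is lost.
as_text = all_bytes.decode('latin1')
code_points = [ord(ch) for ch in as_text]

# Encoding the text back as latin1 recovers the original bytes exactly.
round_tripped = as_text.encode('latin1')
round_trip_ok = (round_tripped == all_bytes)
```

If round_trip_ok is true, latin1 itself is lossless, which suggests the failure comes from an extra encoding step somewhere between saving and loading rather than from latin1.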
The old couchdbkit version was 0.5.7.
The new Python version is 2.7.6.
Not sure you need all those details, but here are some declarations I use:
# deals with all the errors when saving an item
# (self and key come from the enclosing scope)
def saveitem(item):
    item.set_db(self.db)
    item["_id"] = key
    error = True
    while error:
        try:
            item.save()
            error = False
        except ResourceConflict:
            try:
                item = DBEntry.get_or_create(key)
            except ResourceConflict:
                pass
        except (NoMoreData) as e:
            print "CouchDB.set.saveitem: NoMoreData error, retrying...", str(e)
        except (RequestError) as e:
            print "CouchDB.set.saveitem: RequestError error. retrying...", str(e)
    # the caller assigns the result (item = saveitem(item)), so return the saved item
    return item
# deals with most of what could go wrong when adding an attachment
def addattachment(item, content, name = "theattachment"):
    key = item["_id"]
    error = True
    while error:
        try:
            item.put_attachment(content = content, name = name) #, content_type = "application/octet-stream"
            error = False
        except ResourceConflict:
            try:
                item = DBEntry.get_or_create(key)
            except ResourceConflict:
                print "addattachment ResourceConflict, retrying..."
            except NoMoreData:
                print "addattachment NoMoreData, retrying..."
        except (NoMoreData) as e:
            print key, ": no more data exception, waiting 1 sec and retrying... -> ", str(e)
            time.sleep(1)
            item = DBEntry.get_or_create(key)
        except (IOError) as e:
            print "addattachment IOError:", str(e), "repeating..."
            item = DBEntry.get_or_create(key)
        except (KeyError) as e:
            print "addattachment error:", str(e), "repeating..."
            try:
                item = DBEntry.get_or_create(key)
            except ResourceConflict:
                pass
            except (NoMoreData) as e:
                pass
Then I save as follows:
pickled = cPickle.dumps(obj = value, protocol = 2)
pickled = pickled.decode('latin1')
item = DBEntry(content={"seeattachment": True, "ispickled" : True},
               creationtm=datetime.datetime.utcnow(), lastaccesstm=datetime.datetime.utcnow())
item = saveitem(item)
addattachment(item, pickled)
And here is how I unpack. The data was written under Ubuntu 12; unpacking it fails under Ubuntu 14:
def unpackValue(self, value, therawkey):
    if value is None: return None
    originalval = value
    value = value["content"]
    result = None
    if value.has_key("realcontent"):
        result = value["realcontent"]
    elif value.has_key("seeattachment"):
        if originalval.has_key("_attachments"):
            if originalval["_attachments"].has_key("theattachment"):
                if originalval["_attachments"]["theattachment"].has_key("data"):
                    result = originalval["_attachments"]["theattachment"]["data"]
                    result = base64.b64decode(result)
                else:
                    print "unpackvalue: no data in attachment. Here is how it looks like:"
                    print originalval["_attachments"]["theattachment"].iteritems()
        else:
            error = True
            while error:
                try:
                    result = self.db.fetch_attachment(therawkey, "theattachment")
                    error = False
                except ResourceConflict:
                    print "could not get attachment for", therawkey, "retrying..."
                    time.sleep(1)
                except ResourceNotFound:
                    self.delete(key = therawkey, rawkey = True)
                    return None
        if value["ispickled"]:
            result = cPickle.loads(result.encode('latin1'))
    else:
        result = value
    if isinstance(result, unicode): result = result.encode("utf8")
    return result
The line result = cPickle.loads(result.encode('latin1')) succeeds under Ubuntu 12 but fails under Ubuntu 14 with the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
I did NOT get that error under Ubuntu 12!
How can I read my data under Ubuntu 14 while keeping the newer couchdbkit and Python versions? Is this even a versioning problem? Why is the error happening?
Upvotes: 0
Views: 512
Reputation: 879691
It appears that some change, possibly in couchdbkit's API, makes result a UTF-8 encoded str, whereas before it was unicode. In Python 2, calling encode on a str first decodes it implicitly with the ascii codec, which is what raises the UnicodeDecodeError.
Since you want to encode the unicode in latin1, the work-around is to use
cPickle.loads(result.decode('utf8').encode('latin1'))
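To see why this works, the round trip can be simulated without a database. This is only a sketch under two stated assumptions: that the newer stack hands back the stored text as UTF-8 encoded bytes, and that a small sample value can stand in for the real object. It uses pickle rather than cPickle so that it runs on both Python 2 and Python 3; every name here is illustrative.

```python
import pickle

# Hypothetical sample value standing in for the stored object.
value = {"answer": 42, "name": u"caf\xe9"}

# What the save path does: pickle, then decode the bytes as latin1 text.
pickled = pickle.dumps(value, protocol=2)
stored_text = pickled.decode('latin1')

# Assumption: the newer stack returns that text as UTF-8 encoded bytes.
fetched = stored_text.encode('utf8')

# Protocol 2 pickles start with byte 0x80, which UTF-8 encodes as 0xc2 0x80.
first_bytes = bytearray(fetched[:2])

# The work-around: undo the UTF-8 layer, then restore the latin1 bytes.
recovered = fetched.decode('utf8').encode('latin1')
restored = pickle.loads(recovered)
```

Under these assumptions the first fetched byte is 0xc2, which matches the byte and position reported in the error message.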
Note that it would be better to find where result is getting UTF-8 encoded and either prevent that from happening (so you still have unicode, as you did under Ubuntu 12) or change the encoding to latin1, so that result is already in the form you want.
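Until the source of the UTF-8 encoding is found, one interim option is a small helper that accepts either form. This is a hedged sketch: to_pickle_bytes is a made-up name, not part of couchdbkit, and it assumes that any byte string it receives is UTF-8 encoded text.

```python
import pickle

def to_pickle_bytes(result):
    """Return the latin1 bytes of a pickle fetched as text or as UTF-8 bytes."""
    if isinstance(result, bytes):
        # Assumed to be UTF-8 encoded text, as on the newer setup.
        result = result.decode('utf8')
    # At this point result is text whose code points are the original bytes.
    return result.encode('latin1')
```

It would be used as pickle.loads(to_pickle_bytes(result)), or with cPickle in the original code. If the byte string is actually a raw protocol 2 pickle rather than UTF-8 text, the decode step raises UnicodeDecodeError, because the leading 0x80 byte is not a valid UTF-8 start byte, so the wrong assumption fails loudly instead of corrupting data.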
Upvotes: 1