frazman

Reputation: 33213

How to catch an exception and continue without stopping

I have the following code

import msgpack

def deserialize(data):
    return msgpack.loads(data)

# stream is an iterable of packed msgpack payloads
for data in stream:
    print(deserialize(data))

Now, the issue is that some of the data may be corrupted, and I am seeing this error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xcc in position 0: invalid continuation byte

  1. Why am I getting this error?
  2. I want to catch this error and irrespective of this error continue for rest of data.

So I tried:

def deserialize(data):
    try:
        return msgpack.loads(data)
    except UnicodeDecodeError as e:
        logger.error(" error")

But the code still halts with the same error. Why?

Upvotes: 2

Views: 9753

Answers (3)

abarnert

Reputation: 365617

Why am I getting this error?

There are a few possible reasons, but they all come down to the same thing: you have something which is not UTF-8, but you're trying to read it as if it were.

First, if you actually have raw binary data, not text, you can't decode it as text. (Before version 2.0 of the protocol, there was no such distinction, so if you're learning from a tutorial or sample code written for an older version, it will be misleading.) The packing side needs to explicitly pack it as binary data. If it's using the same msgpack library, it does so by using the use_bin_type=True argument. If it's using some other library, you'll have to read the docs for that other library.

Second, if you have text, but the packer is packing it as, say, Latin-1, you can't unpack it as UTF-8. For example, that byte 0xcc means Ì in Latin-1 and related character sets, which is a perfectly valid character, but in UTF-8 it is the lead byte of a two-byte sequence, so it's an error unless it is followed by a continuation byte (0x80 to 0xbf). (And really, you're lucky you got an error, rather than silently doing the wrong thing and spewing mojibake all over.) Again, if the packing side is using the same msgpack library, it's just a matter of passing encoding='utf8' on both sides. If it's using a different library, you'll have to look at the docs for that other library. (And if you can't change the other side, then you need to figure out what encoding it's using, which may be difficult, because it may depend on the library, or the platform, or even the user's specified locale or system default character set…)
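To make the encoding mismatch concrete, here is a minimal sketch using only built-in bytes decoding, with no msgpack involved. The sample bytes are made up for illustration: 0xcc followed by an ASCII letter, which is one way to get the message from the question's traceback.

```python
raw = b"\xccA"  # hypothetical sample: 0xcc followed by an ASCII letter

# Latin-1 maps every byte to a character, so this decode cannot fail.
as_latin1 = raw.decode("latin-1")

# UTF-8 treats 0xcc as the first byte of a two-byte sequence, so the next
# byte must be a continuation byte (0x80-0xbf). "A" (0x41) is not one.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

print(repr(as_latin1))
print(utf8_error)
```

The same bytes are valid text in one encoding and an error in the other, which is why both sides have to agree on the encoding.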


I want to catch this error and irrespective of this error continue for rest of data.

Well, that's probably not what you want to do… but if you do, your existing code should work fine. The UnicodeDecodeError is raised inside a try block that has a matching except UnicodeDecodeError as e: clause, so it should be caught. If that isn't working, most likely your actual code doesn't look like what you've shown us here.
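As a rough check that the catch-and-continue pattern itself works, here is a self-contained sketch. To keep it runnable without msgpack, decode_record is a hypothetical stand-in for msgpack.loads that just decodes bytes as UTF-8, and the stream contents are invented for the example.

```python
import logging

logging.basicConfig(level=logging.ERROR)
logger = logging.getLogger(__name__)


def decode_record(data):
    """Hypothetical stand-in for msgpack.loads: decode bytes as UTF-8."""
    return data.decode("utf-8")


def deserialize(data):
    """Return the decoded record, or None if it is not valid UTF-8."""
    try:
        return decode_record(data)
    except UnicodeDecodeError as e:
        logger.error("skipping corrupt record: %s", e)
        return None


# Hypothetical stream: the middle record is deliberately invalid UTF-8.
stream = [b"first", b"\xccA", b"third"]

results = [deserialize(data) for data in stream]
good = [r for r in results if r is not None]
print(good)
```

The loop gets past the bad record because the exception is handled inside deserialize. If the real code still halts, the exception is probably being raised somewhere outside the try block.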

It's worth noting that you seem to be mixing tabs and spaces in your source code, which is going to cause a lot of problems—things that look like they're at the same indentation level to you as a human may look like they're at different indentation levels to the Python compiler, and vice-versa. I can't quite see how that could cause the exact code you posted to fail (except by raising an IndentationError at compile time, which doesn't sound like what you're describing), but it can easily cause problems like this in even slightly more complex code.

Upvotes: 3

Fabricator

Reputation: 12772

You can try the unicode_errors='ignore' option:

msgpack.loads(data, unicode_errors='ignore')
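To see what such an error handler does to the text, here is a small sketch using plain bytes.decode. It assumes msgpack's unicode_errors behaves like the errors argument of the standard decoder; the sample bytes are hypothetical.

```python
raw = b"caf\xcc!"  # hypothetical sample containing one invalid byte

# "ignore" silently drops bytes that cannot be decoded.
ignored = raw.decode("utf-8", "ignore")

# "replace" substitutes U+FFFD (the replacement character) instead.
replaced = raw.decode("utf-8", "replace")

print(repr(ignored))
print(repr(replaced))
```

Note that 'ignore' loses data without leaving any trace, whereas 'replace' at least leaves a visible marker where the bad byte was.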

Upvotes: 1

Alexander McFarlane

Reputation: 11293

If you want to just log ' error':

def deserialize(data):
    try:
        return msgpack.loads(data)
    except Exception:  # catches any error; a bare except would also trap KeyboardInterrupt
        logger.error(" error")

If you want to record the nature of the error:

def deserialize(data):
    try:
        return msgpack.loads(data)
    except UnicodeDecodeError as e:
        # ... do something with e, for example log it
        logger.error("error: %s", e)

If you want to avoid the error, this should decode the data and replace any bytes that cannot be decoded (see the linked post for more elegant solutions and more information on this error):

data.decode('utf-8', 'replace').encode('utf-8')
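Here is a short sketch of what that round trip produces. scrub_utf8 is a hypothetical helper name and the input bytes are invented for the example.

```python
def scrub_utf8(data):
    """Hypothetical helper: re-encode bytes so the result is valid UTF-8."""
    return data.decode("utf-8", "replace").encode("utf-8")


raw = b"ok\xccA"  # hypothetical input with one invalid byte
clean = scrub_utf8(raw)

print(repr(clean))
print(len(raw), len(clean))
```

Each invalid byte becomes the three-byte UTF-8 encoding of U+FFFD, so the output can be longer than the input. Because msgpack strings carry a length prefix, scrubbing an entire packed payload this way may break its framing; it is probably safer to apply it to individual text fields.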

Upvotes: 1
