Eric McLachlan

Reputation: 3530

How to validate the decoding of a bytearray without raising an exception?

Is there a way to try to decode a bytearray without raising an error if the encoding fails?

EDIT: The solution needn't use bytearray.decode(...). Any library (preferably standard) that does the job would be great.

Note: I don't want to ignore errors (which I could do using bytearray.decode(errors='ignore')). I also don't want an exception to be raised. Ideally, the function would return None on failure, for example.

my_bytearray = bytearray('', encoding='utf-8')

# ...
# Read some stream of bytes into my_bytearray.
# ...

text = my_bytearray.decode()

If my_bytearray doesn't contain valid UTF-8 text, the last line will raise an error.

Question: Is there a way to perform the validation but without raising an error?

(I realize that raising an error is considered "pythonic". Let's assume this is undesirable for some or other good reason.)

I don't want to use a try/except block because this code gets called thousands of times and I don't want my IDE to stop every time this exception is raised (whereas I do want it to pause on other errors).

Upvotes: 1

Views: 732

Answers (2)

Eric McLachlan

Reputation: 3530

The chardet module can be used to detect the encoding of a bytearray before calling bytearray.decode(...).

The Code:

import chardet
identity = chardet.detect(my_bytearray)

The function chardet.detect(...) returns a dictionary with the following format:

{
  'confidence': 0.99,
  'encoding': 'ascii',
  'language': ''
}

One could check identity['encoding'] to confirm that my_bytearray is compatible with an expected set of text encodings before calling my_bytearray.decode().

One caveat with this approach is that the detected encoding may be only one of several equivalent encodings. In the example above, the result reports ASCII, although the same bytes are equally valid UTF-8.

(Credit to @simon, who pointed this out on Stack Overflow.)
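
As a rough sketch of that check (not a definitive implementation): the helper name decode_if_expected, its expected tuple, and the detect parameter are hypothetical and only for illustration. chardet is a third-party package, and its detection is heuristic, so in principle the final decode could still raise if the guess is wrong.

```python
def decode_if_expected(data, expected=("ascii", "utf-8"), detect=None):
    """Decode data only if the detected encoding is one we expect.

    Hypothetical helper. Returns None when the detected encoding is
    missing or not in `expected`.
    """
    if detect is None:
        # Assumption: the third-party chardet package is installed.
        import chardet
        detect = chardet.detect

    analysis = detect(data)
    encoding = analysis.get("encoding")

    # chardet reports None when it cannot make a guess.
    if encoding is None or encoding.lower() not in expected:
        return None

    # Detection is a heuristic, so this call is not guaranteed to succeed.
    return data.decode(encoding)
```

Listing both 'ascii' and 'utf-8' as expected covers the case where ASCII-only input is reported as ASCII rather than UTF-8.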

Upvotes: 1

snakecharmerb

Reputation: 55800

You could use the suppress context manager to suppress the exception and have slightly prettier code than with try/except/pass:

import contextlib
...
return_val = None
with contextlib.suppress(UnicodeDecodeError):
    return_val = my_bytearray.decode('utf-8')
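
To get the "return None on failure" behaviour asked for in the question, the snippet above could be wrapped in a small function. This is a minimal sketch; the name try_decode is made up for illustration.

```python
import contextlib


def try_decode(data, encoding="utf-8"):
    """Return the decoded text, or None if data is not valid in `encoding`."""
    with contextlib.suppress(UnicodeDecodeError):
        return data.decode(encoding)
    # Reached only when decode() raised and the error was suppressed.
    return None


text = try_decode(bytearray(b"caf\xc3\xa9"))   # valid UTF-8
bad = try_decode(bytearray(b"\xff\xfe"))       # 0xff is not a valid UTF-8 start byte
```

Note that the exception is still raised internally and then swallowed, so a debugger configured to break on every raised exception may still pause here.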

Upvotes: 6
