Reputation: 3530
Is there a way to try to decode a bytearray without raising an error if the encoding fails?
EDIT: The solution needn't use bytearray.decode(...). Any library (preferably standard) that does the job would be great.
Note: I don't want to ignore errors (which I could do using bytearray.decode(errors='ignore')). I also don't want an exception to be raised. Preferably, I would like the function to return None, for example.
my_bytearray = bytearray('', encoding='utf-8')
# ...
# Read some stream of bytes into my_bytearray.
# ...
text = my_bytearray.decode()
If my_bytearray doesn't contain valid UTF-8 text, the last line will raise an error.
Question: Is there a way to perform the validation but without raising an error?
(I realize that raising an error is considered "pythonic". Let's assume this is undesirable for some or other good reason.)
I don't want to use a try/except block because this code gets called thousands of times and I don't want my IDE to stop every time this exception is raised (whereas I do want it to pause on other errors).
Upvotes: 1
Views: 732
Reputation: 3530
The chardet module can be used to detect the encoding of a bytearray before calling bytearray.decode(...).
The Code:
import chardet
analysis = chardet.detect(my_bytearray)
The function chardet.detect(...) returns a dictionary with the following format:
{
'confidence': 0.99,
'encoding': 'ascii',
'language': ''
}
One could check analysis['encoding'] to confirm that my_bytearray is compatible with an expected set of text encodings before calling my_bytearray.decode().
One caveat with this approach is that the analysis may report any one of several equivalent encodings. In this case, for instance, it reports ASCII, although the same bytes are equally valid UTF-8.
(Credit to @simon who pointed this out on StackOverflow here.)
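A minimal sketch of the check described above, assuming `analysis` is a dictionary shaped like the one chardet.detect(...) returns. The helper name `decode_if_compatible` and the accepted set are illustrative, not part of chardet. Because ASCII is a subset of UTF-8, both are accepted and the bytes are decoded as UTF-8.

```python
# Hypothetical helper: `analysis` is assumed to be the dictionary returned
# by chardet.detect(my_bytearray), as shown in the answer above.
ACCEPTED_ENCODINGS = frozenset({"ascii", "utf-8"})  # illustrative choice


def decode_if_compatible(data, analysis, accepted=ACCEPTED_ENCODINGS):
    """Return decoded text when the detected encoding is accepted, else None."""
    detected = analysis.get("encoding")
    # The encoding entry may be None when nothing could be detected.
    if detected is None or detected.lower() not in accepted:
        return None
    # ASCII is a subset of UTF-8, so decoding as UTF-8 covers both cases.
    return data.decode("utf-8")
```

Since detection is a heuristic, the final decode call could in principle still fail on unusual input, so this reduces how often the exception occurs rather than guaranteeing it never does.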
Upvotes: 1
Reputation: 55800
You could use the suppress context manager to suppress the exception and have slightly prettier code than with try/except/pass:
import contextlib
...
return_val = None
with contextlib.suppress(UnicodeDecodeError):
    return_val = my_bytearray.decode('utf-8')
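Wrapped in a function, the same idea gives the "return None on failure" behaviour the question asks for. This is only a sketch, and the name `try_decode` is made up for illustration.

```python
import contextlib


def try_decode(data, encoding="utf-8"):
    """Return the decoded text, or None if `data` is not valid in `encoding`."""
    with contextlib.suppress(UnicodeDecodeError):
        return data.decode(encoding)
    # Reached only when the decode raised and the error was suppressed.
    return None
```

Note that the UnicodeDecodeError is still raised internally and then swallowed, so a debugger configured to break on every raised exception may still pause here.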
Upvotes: 6