Reputation: 5655
For my project, everything must be in unicode. Here is how I handle it. All strings are passed through this function:
def unicodify(string):
    if not isinstance(string, unicode):
        return string.decode('utf8', errors='ignore')
    return string
Is the method above good practice for production code? If not, why, and how would you suggest decoding to unicode? errors='ignore' does not actually help with ValueErrors such as 'invalid \x escape', but I'm not sure how to handle that properly.
Thanks
Upvotes: 1
Views: 240
Reputation: 31130
Before you can even attempt to convert a str to unicode, you need to know the encoding of the data in the str. UTF-8 is common, but it is not the only encoding out there.
Additionally, you could get str data that is not in any encoding (such as arbitrary binary data). In that case you cannot convert it to unicode. Or rather, you have two options: a) raise an exception, or b) convert as much as you can and ignore errors. Which one is right depends on the application.
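To make those two options concrete, here is a minimal sketch. The helper name to_unicode and its defaults are my own choices, not something from the question; the text_type alias is only there so the same snippet runs on Python 2 and Python 3. The caller states the encoding explicitly and picks the error policy instead of having 'ignore' hard-coded.

```python
import sys

# Python 2 calls the text type `unicode`; Python 3 calls it `str`.
# Only the selected branch is evaluated, so this is safe on both.
text_type = unicode if sys.version_info[0] == 2 else str


def to_unicode(value, encoding='utf-8', errors='strict'):
    """Decode a byte string to text; pass text through unchanged.

    Hypothetical helper: `encoding` and `errors` are forwarded to
    bytes.decode(). 'strict' raises UnicodeDecodeError on bad input,
    'ignore' drops undecodable bytes, 'replace' substitutes U+FFFD.
    """
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):  # `bytes` is `str` on Python 2
        return value.decode(encoding, errors)
    raise TypeError('expected bytes or text, got %r' % type(value))
```

With the default 'strict', bad input fails loudly at the boundary where the data enters your program, which is usually easier to debug than text that silently lost characters. Pass errors='ignore' or errors='replace' only where losing data is acceptable.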
Upvotes: 0
Reputation: 368904
You may have an invalid string literal.
\x must be followed by two hex digits (0-9, A-F, or a-f).
Valid examples:
>>> '\xA9'
'\xa9'
>>> '\x00'
'\x00'
>>> '\xfF'
'\xff'
Invalid examples:
>>> '\xOO'
ValueError: invalid \x escape
>>> '\xl3'
ValueError: invalid \x escape
>>> '\x5'
ValueError: invalid \x escape
See String literals.
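Note that this error is raised when Python parses the literal, before any decode() call runs, so errors='ignore' never gets a chance to apply. The sketch below illustrates that by compiling literal source text. The helper name literal_is_valid is made up for this example. On Python 2 the failure is a ValueError as shown above; on Python 3 it surfaces as a SyntaxError, so the helper catches both.

```python
def literal_is_valid(source):
    """Return True if `source` compiles as a Python expression.

    Hypothetical helper for illustration only.
    """
    try:
        compile(source, '<literal>', 'eval')
    except (ValueError, SyntaxError):
        return False
    return True


# The raw strings below hold the source text exactly as you would type it.
print(literal_is_valid(r"'\xA9'"))   # two hex digits after \x
print(literal_is_valid(r"'\x5'"))    # only one hex digit after \x
print(literal_is_valid(r"r'\x5'"))   # raw literal: backslash kept as-is
print(literal_is_valid(r"'\\x5'"))   # escaped backslash, then plain "x5"
```

So if you actually want a backslash followed by x in the data, use a raw string or double the backslash in the source. Otherwise fix the literal so that \x has two hex digits after it.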
Upvotes: 1