Reputation: 428
I'm doing this:
word.rstrip(s)
Where word and s are strings containing unicode characters.
I'm getting this:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)
There's a bug report where this error happens on some Windows Django systems. However, my situation seems unrelated to that case.
What could be the problem?
EDIT: The code is like this:
def Strip(word):
    for s in suffixes:
        return word.rstrip(s)
Upvotes: 3
Views: 2410
Reputation: 35089
The issue is that s is a bytestring, while word is a unicode string, so Python tries to turn s into a unicode string so that the rstrip makes sense. In doing so, it assumes s is encoded in ASCII, which it clearly isn't (since it contains a character outside the ASCII range).
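To make the implicit step visible, here is a minimal sketch that performs the same ASCII decode by hand. It assumes the suffix was stored as UTF-8 bytes; the names suffix_bytes and error_message are made up for this illustration and are not from the question.

```python
# UTF-8 bytes of u'\u09bf', the suffix character used in this answer
# (assumption: the source file was saved as UTF-8).
suffix_bytes = b'\xe0\xa6\xbf'

try:
    # Python 2 does the equivalent of this when a byte string is mixed
    # with a unicode string in a call such as word.rstrip(s).
    suffix_bytes.decode('ascii')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Decoding with the encoding the bytes were actually written in succeeds.
suffix = suffix_bytes.decode('utf-8')
```

The first byte of the UTF-8 sequence is 0xe0, which matches the byte and position reported in the question's error message.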
So, since you initialise it as a literal, it is very easy to turn it into a unicode string by putting a u in front of it:
suffixes = [u'ি']
That will work. As you add more suffixes, you'll need the u in front of each of them individually.
Upvotes: 4
Reputation: 17797
I guess this happens because of implicit conversion in Python 2. It's explained in this document, but I recommend reading the whole presentation about handling unicode in Python 2 and 3 (and why Python 3 is better ;-))
So, I think the solution to your problem would be to force the decoding of the strings as UTF-8 before stripping.
Something like:
def Strip(word):
    # Assumes word and every suffix are UTF-8 encoded byte strings.
    word = word.decode("utf8")
    for s in suffixes:
        return word.rstrip(s.decode("utf8"))
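Second try:
def Strip(word):
    # Decode only byte strings; calling .decode on a unicode object
    # makes Python 2 encode it to ASCII first, which fails again.
    if type(word) == str:
        word = word.decode("utf8")
    for s in suffixes:
        if type(s) == str:
            s = s.decode("utf8")
        return word.rstrip(s)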
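The same idea can be pulled into a small helper, so that everything is decoded once before any string method is called. This is only a sketch under a few assumptions: the input is UTF-8, the intent is to try every suffix rather than return after the first one, and the names to_text and strip_suffix_chars are hypothetical.

```python
def to_text(value, encoding='utf-8'):
    """Return value as a unicode string, decoding it if it is a byte string."""
    # bytes is an alias of str on Python 2, so this check covers both versions.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def strip_suffix_chars(word, suffixes):
    """Strip trailing characters listed in each suffix, working in unicode."""
    word = to_text(word)
    for s in suffixes:
        # Both operands are unicode here, so no implicit ASCII decode happens.
        word = word.rstrip(to_text(s))
    return word
```

Note that rstrip removes any trailing characters that appear in its argument, not the argument as a whole substring, which is why the helper is named after characters rather than suffixes.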
Upvotes: 3