Velvet Ghost

Reputation: 428

UnicodeDecodeError when using a Python string handling function

I'm doing this:

word.rstrip(s)

Where word and s are strings containing unicode characters.

I'm getting this:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)

There's a bug report where this error happens on some Windows Django systems. However, my situation seems unrelated to that case.

What could be the problem?


EDIT: The code is like this:

def Strip(word):
    for s in suffixes:
        return word.rstrip(s)

Upvotes: 3

Views: 2410

Answers (2)

lvc

Reputation: 35089

The problem is that s is a bytestring while word is a unicode string, so Python tries to convert s to a unicode string before rstrip can compare the two. For that implicit conversion it assumes s is encoded in ASCII, which it clearly isn't, since it contains a character outside the ASCII range.

Since you initialise it as a literal, it is very easy to turn it into a unicode string by putting a u in front of it:

suffixes = [u'ি']

This will work. As you add more suffixes, each one needs its own u prefix.
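The implicit conversion described in this answer can be reproduced by hand. The snippet below is only a sketch: it builds the UTF-8 bytes of the suffix character (assuming the source file was saved as UTF-8, which the 0xe0 in the error message suggests) and decodes them explicitly with the ascii codec, which is what Python 2 does behind the scenes when a bytestring meets a unicode string. The variable names are made up for illustration.

```python
# -*- coding: utf-8 -*-

# UTF-8 bytes of the Bengali vowel sign used as the suffix: E0 A6 BF.
# Assumption: the original source file was saved as UTF-8.
suffix_bytes = u"\u09bf".encode("utf-8")

# Decoding with the ascii codec fails on the very first byte (0xe0),
# which is the same failure the implicit conversion runs into.
try:
    suffix_bytes.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Decoding with the codec the bytes were actually written in succeeds.
suffix_text = suffix_bytes.decode("utf-8")

print(error_message)
```

Writing the literal with a u prefix skips this decoding step entirely, because the value is a unicode string from the start.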

Upvotes: 4

Scharron

Reputation: 17797

I guess this happens because of implicit conversion in Python 2. It's explained in this document, but I recommend reading the whole presentation about handling unicode in Python 2 and 3 (and why Python 3 is better ;-))

So, I think the solution to your problem would be to force the decoding of the strings as UTF-8 before stripping.

Something like:

def Strip(word):
    word = word.decode("utf8")
    for s in suffixes:
        return word.rstrip(s.decode("utf8"))

Second try:

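def Strip(word):
    # Decode only bytestrings; calling decode() on a unicode string
    # would trigger another implicit ASCII conversion in Python 2.
    if isinstance(word, str):
        word = word.decode("utf8")
    for s in suffixes:
        if isinstance(s, str):
            s = s.decode("utf8")
        return word.rstrip(s)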
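For reference, the same idea can be written so that it runs unchanged on Python 2 and Python 3. This is a sketch rather than a drop-in replacement: to_text and strip_suffix_chars are hypothetical names, the suffix list is passed in as an argument, and the loop applies every suffix instead of returning on the first one, which is an assumption about what the original code was meant to do. Note that rstrip removes any trailing characters that appear in its argument, not a literal suffix.

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding="utf-8"):
    """Return value as a unicode string, decoding bytestrings if needed.

    Hypothetical helper; the UTF-8 default is an assumption about the data.
    """
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value


def strip_suffix_chars(word, suffixes):
    """Strip trailing suffix characters after normalising everything to unicode."""
    word = to_text(word)
    for s in suffixes:
        # rstrip treats its argument as a set of characters to remove.
        word = word.rstrip(to_text(s))
    return word


# A bytestring word and a bytestring suffix, both UTF-8 encoded.
word_bytes = u"\u0995\u09ac\u09bf".encode("utf-8")
suffix_bytes = u"\u09bf".encode("utf-8")
result = strip_suffix_chars(word_bytes, [suffix_bytes])
```

Decoding once at the boundary and working with unicode strings everywhere else avoids the implicit ASCII conversion altogether.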

Upvotes: 3
