Dalbenn

Reputation: 127

Python format UnicodeDecodeError

Hello, I want to store a string in a variable like this:

  msg=_(u'Uživatel <a href="{0}">{1} {2}</a>').format(request.user.get_absolute_url, request.user.first_name, request.user.last_name)

But since the inserted variables contain accented characters such as š, I get a UnicodeDecodeError, even though I have set the encoding with # -*- coding: utf-8 -*-

It is weird (IMHO) that it worked when I built this string by concatenating the variables like this:

msg=u'Uživatel <a href="' + request.user.get_absolute_url + ...

I have no clue why it shouldn't work, since it's a running project and I have used such statements many times.

If you have any advice on how to solve this, I will be very grateful.

Upvotes: 0

Views: 1088

Answers (3)

Dalbenn

Reputation: 127

The solution is pretty simple: I used get_absolute_url instead of get_absolute_url(). Sorry to bother you.
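For anyone who hits the same thing, here is a minimal sketch of the difference. FakeUser is a hypothetical stand-in for request.user (it is not Django's User model), and the URL scheme is made up for illustration.

```python
class FakeUser(object):
    """Hypothetical stand-in for request.user; not Django's User model."""

    def __init__(self, first_name):
        self.first_name = first_name

    def __repr__(self):
        return '<FakeUser: %s>' % self.first_name

    def get_absolute_url(self):
        # Made-up URL scheme, for illustration only.
        return '/users/%s/' % self.first_name.lower()


user = FakeUser('Jan')

# Without parentheses the bound method object itself is formatted,
# so its repr (which embeds repr(user)) ends up in the string.
wrong = '<a href="{0}">'.format(user.get_absolute_url)

# With parentheses the method is called and its return value is formatted.
right = '<a href="{0}">'.format(user.get_absolute_url())

print(wrong)
print(right)
```

The sketch uses an ASCII name so it runs anywhere. My guess at why the real code raised UnicodeDecodeError rather than just producing an odd link: under Python 2 the repr of the bound method embeds the repr of the user, which is a bytestring, and for a name with accented characters that bytestring cannot be decoded as ascii when it is formatted into a Unicode string.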

Upvotes: 0

Peter DeGlopper

Reputation: 37319

One of your user lookups is returning an encoded bytestring rather than a Unicode object.

When Python 2.x is asked to concatenate Unicode and encoded bytestrings, it does so by decoding the bytestring into Unicode using the default encoding, which is ascii unless you go to some effort to change it. The # -*- coding: utf-8 -*- directive sets the encoding for your source code, but not the system default encoding.
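As a rough sketch of what that implicit step amounts to, the snippet below performs the ascii decode explicitly, so the failure can be reproduced on any Python version. The name Tomáš is just an example value, not something from your data.

```python
# -*- coding: utf-8 -*-

# UTF-8 bytes for an example name containing non-ascii characters.
encoded_name = u'Tomáš'.encode('utf-8')

# Python 2 performs this decode implicitly when unicode and str are mixed;
# it is spelled out here so the failure is visible explicitly.
try:
    encoded_name.decode('ascii')
except UnicodeDecodeError as exc:
    ascii_error = exc
else:
    ascii_error = None

# Decoding with the encoding the bytes were actually written in succeeds,
# and the result can then be combined safely with other Unicode strings.
decoded_name = encoded_name.decode('utf-8')
message = u'Uživatel ' + decoded_name

print(repr(ascii_error))
print(message)
```

Note that the coding line at the top only affects how the literals in the source file are read; it plays no part in the decode calls.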

From testing format, it looks like it tries to convert each argument to match the type of the format string it is called on.

Under 2.x, things will work fine as long as the bytestring you're using can be decoded using ascii:

>>> u'test\u270c {0}'.format('bar')
u'test\u270c bar'

Or, of course, if you're formatting in another Unicode object:

>>> u'test\u270c {0}'.format(u'bar\u270d')
u'test\u270c bar\u270d'

If you omit the u before your format string and pass in a Unicode object with non-ascii characters, you'll get a UnicodeEncodeError:

>>> 'foo {0}'.format(u'test\u270c')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u270c' in position 4: ordinal not in range(128)

Conversely, if you format an encoded string with non-ascii bytes into a Unicode object, you'll get a UnicodeDecodeError:

>>> u'foo {0}'.format(u'test\u270c'.encode('utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4: ordinal not in range(128)

I'd start by checking the get_absolute_url implementation. Valid URLs can never contain unescaped non-ascii characters, so they should always be decodable as ascii. If you're working with standard Django models, first_name and last_name should already be Unicode objects, so my first bet would be a buggy get_absolute_url implementation.
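To illustrate the point about URLs, here is a small sketch that percent-escapes a path with the standard library's quote function and checks that the result is pure ascii. The path /users/Tomáš/ is a made-up example, not taken from your project.

```python
# -*- coding: utf-8 -*-
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

# Made-up path containing non-ascii characters.
raw_path = u'/users/Tomáš/'

# Encode to UTF-8 bytes first, then percent-escape every non-ascii byte.
escaped = quote(raw_path.encode('utf-8'))

# A properly escaped path contains only ascii code points.
is_ascii = all(ord(ch) < 128 for ch in escaped)

print(escaped)
print(is_ascii)
```

If get_absolute_url returns something that fails a check like this, that is a hint it is building the URL from unescaped field values.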

Upvotes: 2

Chris

Reputation: 1162

Check the type of the arguments to format. I guess they are 'str', not 'unicode'. Before using them, decode them appropriately, e.g.:

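url = request.user.get_absolute_url()  # call the method; without () you get the method object
if isinstance(url, str):
    # In Python 2, str is an encoded bytestring; decode it to unicode
    print 'url was str'
    url = url.decode('utf-8')
msg = u'Uživatel <a href="{0}">...</a>'.format(url)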

(The if and print statements are just for demonstration purposes.) Handle the other values the same way.
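If several values need the same treatment, the check can be wrapped in a small helper. This is only a sketch: to_text is a hypothetical name, and the sample values stand in for the request.user attributes. It assumes the bytestrings are UTF-8 encoded.

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding='utf-8'):
    """Return value as a Unicode string, decoding bytestrings if necessary.

    Hypothetical helper; assumes bytestrings are encoded with `encoding`.
    """
    # In Python 2, bytes is an alias for str, so this catches bytestrings
    # on both major versions.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Sample values standing in for the request.user attributes.
url = to_text(b'/users/tom%C3%A1%C5%A1/')
first_name = to_text(u'Tomáš'.encode('utf-8'))  # bytestring input
last_name = to_text(u'Novák')                   # already Unicode

msg = u'Uživatel <a href="{0}">{1} {2}</a>'.format(url, first_name, last_name)
print(msg)
```

Values that are already Unicode pass through unchanged, so the helper is safe to apply to every argument.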

Upvotes: 0
