Reputation: 127
Hello, I want to save a string into a variable like this:
msg = _(u'Uživatel <a href="{0}">{1} {2}</a>').format(request.user.get_absolute_url, request.user.first_name, request.user.last_name)
But since the inserted variables contain accented characters such as š, I get a UnicodeDecodeError, even though I have set the encoding with # -*- coding: utf-8 -*- at the top of the file.
What is weird (IMHO) is that it worked when I built this string by concatenating the variables like this:
msg=u'Uživatel <a href="' + request.user.get_absolute_url + ...
I have no idea why it wouldn't work, since this is a running project and I have used such statements many times.
If you have any advice on how to solve this, I would be very grateful.
Upvotes: 0
Reputation: 127
The solution is pretty simple: I used get_absolute_url instead of get_absolute_url(). Sorry to bother you.
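For anyone landing here with the same error, here is a minimal sketch of why the missing parentheses matter. Without them, format receives the bound method object itself and inserts its string representation instead of the URL. Presumably, in the real project that representation contained the user's non-ascii name as bytes, which is what triggered the decode error under Python 2. The User class, names and URL below are hypothetical stand-ins for request.user; the code is written for Python 3 and should also run on 2.7.

```python
# Hypothetical stand-in for a Django user object.
class User(object):
    first_name = u'Jan'
    last_name = u'Nov\xe1k'

    def get_absolute_url(self):
        return u'/users/42/'


user = User()

# Without parentheses the bound method object itself is formatted,
# so its repr ends up in the string instead of the URL.
wrong = u'<a href="{0}">'.format(user.get_absolute_url)

# With parentheses the method is called and its return value is used.
right = u'<a href="{0}">{1} {2}</a>'.format(
    user.get_absolute_url(), user.first_name, user.last_name)

print(wrong)  # shows '<bound method ...' rather than a URL
```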
Upvotes: 0
Reputation: 37319
One of your user lookups is returning an encoded bytestring rather than a Unicode object.
When Python 2.x is asked to concatenate Unicode and encoded bytestrings, it does so by decoding the bytestring into Unicode using the default encoding, which is ascii, unless you go to some effort to change it. The # -*- coding: utf-8 -*- directive sets the encoding for your source code, but not the system default encoding.
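A small sketch of that implicit conversion and the usual fix, decoding explicitly with the codec the bytes were actually encoded in. It assumes the bytestring is UTF-8; the sample name is only an illustration. The code should run on both Python 3 and 2.7, though Python 3 never performs the implicit mixing in the first place.

```python
import sys

# The codec used for implicit bytes -> text conversion:
# 'ascii' on a stock Python 2, 'utf-8' on Python 3.
print(sys.getdefaultencoding())

# An encoded bytestring containing a non-ascii character (ž is two bytes).
name_bytes = u'U\u017eivatel'.encode('utf-8')

# Python 2's implicit conversion behaves like decoding with 'ascii',
# which fails on the non-ascii bytes.
implicit_error = None
try:
    name_bytes.decode('ascii')
except UnicodeDecodeError as exc:
    implicit_error = exc

# Decoding explicitly with the codec the bytes were encoded in works.
message = u'Hello, ' + name_bytes.decode('utf-8')
```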
From testing format, it looks like it tries to convert the argument to match the type of the left-hand side.
Under 2.x, things will work fine as long as the bytestring you're using can be decoded using ascii:
>>> u'test\u270c {0}'.format('bar')
u'test\u270c bar'
Or, of course, as long as you're formatting in another Unicode object:
>>> u'test\u270c {0}'.format(u'bar\u270d')
u'test\u270c bar\u270d'
If you omit the u before your format string, you'll get a UnicodeEncodeError:
>>> 'foo {0}'.format(u'test\u270c')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u270c' in position 4: ordinal not in range(128)
Conversely, if you format an encoded string with non-ascii bytes into a Unicode object, you'll get a UnicodeDecodeError:
>>> u'foo {0}'.format(u'test\u270c'.encode('utf-8'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4: ordinal not in range(128)
I'd start by checking the get_absolute_url implementation. Valid URLs can never contain unescaped non-ascii characters, so they should always be decodable by ascii. But if you're using things built from standard Django models, first_name and last_name should be Unicode objects, so I'd bet on a buggy implementation of get_absolute_url first.
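To illustrate the point about URLs: once a path is percent-encoded, every non-ascii byte becomes a %XX escape, so the result is plain ascii. This is a sketch using the standard library's quote function; the path itself is hypothetical, and the import fallback covers Python 2's older module layout.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

# A hypothetical path containing a non-ascii character, as UTF-8 bytes.
raw_path = u'/u\u017eivatel/'.encode('utf-8')

# Percent-encoding turns each non-ascii byte into %XX ('/' is kept by default),
# so the escaped URL can always be decoded as ascii.
escaped = quote(raw_path)
print(escaped)  # /u%C5%BEivatel/
```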
Upvotes: 2
Reputation: 1162
Check the type of the arguments to format; I guess they are 'str', not 'unicode'. Before using them, decode them appropriately, e.g.:
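# Call the method; without () you get the bound method object itself.
url = request.user.get_absolute_url()
if isinstance(url, str):  # an encoded bytestring on Python 2
    print 'url was str'
    url = url.decode('utf-8')  # decode it into a unicode object
msg = u'Uživatel <a href="{0}">...</a>'.format(url)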
(The if and print statements are just for demonstration purposes.)
Use the other values accordingly.
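A slightly more general version of the same idea, as a sketch: a small helper (to_text is a hypothetical name, not a Django or standard-library function) decodes bytestrings and passes everything else through, and it is applied to every argument before formatting. It assumes the bytestrings are UTF-8; the values below are hypothetical stand-ins for the request.user fields. It should run on Python 3 and 2.7.

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it first if it is a bytestring.

    On Python 2, bytes is an alias of str, so this isinstance check
    catches encoded bytestrings on both Python 2 and Python 3.
    Values of any other type are returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical stand-ins for the request.user values: a mix of
# encoded bytestrings and Unicode objects.
url = b'/users/42/'
first_name = u'Jan'.encode('utf-8')
last_name = u'Nov\xe1k'

# Decode every argument first, then format into a Unicode template.
args = [to_text(v) for v in (url, first_name, last_name)]
msg = u'U\u017eivatel <a href="{0}">{1} {2}</a>'.format(*args)
```

Decoding as soon as the bytes enter your code keeps everything after that point dealing only with Unicode, so format never has to guess.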
Upvotes: 0