I send Cyrillic letters from Postman to Django as a URL parameter, and in the variable search_text I get something like %D0%B7%D0%B2.
Actually, if I print search_text, I get something like текст printed.
I tried the following in the console and didn't get an error:
>>> a = "текст"
>>> a
'\xd1\x82\xd0\xb5\xd0\xba\xd1\x81\xd1\x82'
>>> print a
текст
>>> b = a.decode("utf-8")
>>> b
u'\u0442\u0435\u043a\u0441\u0442'
>>> print b
текст
>>>
but outside the console I do get an error:
"""WHERE title LIKE '%%{}%%' limit '{}';""".format(search_text, limit))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
How can I prevent it?
Upvotes: 5
Views: 5878
To decode a URL-encoded string (one with '%' signs), use urllib:
import urllib
byte_string=urllib.unquote('%D0%B7%D0%B2')
Then you need to decode byte_string from its original encoding, e.g.:
import urllib
import codecs
byte_string=urllib.unquote('%D0%B7%D0%B2')
unicode_string=codecs.decode(byte_string, 'utf-8')
Then print(unicode_string) will print зв.
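In Python 3, the same two steps live in urllib.parse: unquote_to_bytes() returns the raw bytes and .decode() turns them into text, while unquote() does both at once and assumes UTF-8 by default. A minimal sketch, assuming the parameter was percent-encoded from UTF-8 as in the question:

```python
from urllib.parse import unquote, unquote_to_bytes

encoded = "%D0%B7%D0%B2"  # the value seen in search_text

# Step 1: percent-decoding yields raw bytes (UTF-8 here, by assumption)
raw = unquote_to_bytes(encoded)
# Step 2: decode those bytes with the encoding they were written in
text = raw.decode("utf-8")
print(text)  # зв

# unquote() combines both steps; its encoding parameter defaults to "utf-8"
assert unquote(encoded) == text
```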
The problem is the unknown encoding: you have to know which encoding the data you receive uses. To declare the encoding of your .py source file itself, place the following line at the top:
# -*- coding: utf-8 -*-
Cyrillic text is most commonly encoded as 'cp866', 'cp1251', 'koi8_r' or 'utf-8', so try those when calling decode.
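When the source encoding is unknown, "try those" can be written as a loop. This is a hedged Python 3 sketch: decode_with_fallback is a hypothetical helper, and the candidate order is an assumption. UTF-8 goes first because it rejects most non-UTF-8 input, whereas koi8_r and cp866 accept any byte sequence (and cp1251 almost any), so a successful single-byte decode does not prove it was the right codec.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1251", "koi8_r", "cp866")):
    """Hypothetical helper: return (text, encoding) for the first codec
    in `encodings` that decodes `raw` without raising an error."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this codec, try the next one
    raise ValueError("none of the candidate encodings could decode the input")


print(decode_with_fallback("текст".encode("utf-8")))   # ('текст', 'utf-8')
print(decode_with_fallback("текст".encode("cp1251")))  # ('текст', 'cp1251')
```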
Python 2 string literals are byte strings, not Unicode, by default, so it's best to enable Unicode literals or switch to Python 3. To make string literals Unicode in a .py file, put the following line above all other imports:
from __future__ import unicode_literals
For example, in Python 2.7.9 the following works fine:
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
a="текст"
c="""WHERE title LIKE '%%{}%%' limit '{}';""".format(a, '10')
print(c)
Also see:
https://docs.python.org/2/library/codecs.html
https://docs.python.org/2/howto/unicode.html
Upvotes: 3
It depends on which encoding the Django program expects and what types search_text and limit are. Usually it's sufficient to do this:
"""WHERE title LIKE '%%{}%%' limit '{}';""".decode("utf-8").format(search_text.decode("utf-8"), limit)
EDIT: After reading your edits, it seems you are having trouble turning your URL-encoded text back into strings. Here's an example of how to do this:
import urlparse
print urlparse.urlunparse(urlparse.urlparse("ресторан"))
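Note that urlunparse(urlparse(...)) only splits a URL into components and joins them back; it does not undo percent-encoding. In Python 3, the round trip between text and its URL-encoded form goes through urllib.parse.quote() and unquote(), and parse_qs() decodes a whole query string. A sketch; the query string below, with its search_text and limit parameters, is an assumed example:

```python
from urllib.parse import parse_qs, quote, unquote

word = "ресторан"

# Percent-encode the UTF-8 bytes of the word, then decode them back
encoded = quote(word)
print(encoded)  # %D1%80%D0%B5%D1%81%D1%82%D0%BE%D1%80%D0%B0%D0%BD
assert unquote(encoded) == word

# parse_qs() percent-decodes every value of a query string (UTF-8 by default)
params = parse_qs("search_text=%D0%B7%D0%B2&limit=10")
print(params)  # {'search_text': ['зв'], 'limit': ['10']}
```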
Upvotes: 2
You can use '{}'.format(search_text.encode('utf-8')) to encode the string as UTF-8, but it will probably show your Cyrillic letters as escapes like \xd0.
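In Python 3 this caveat becomes concrete: .encode('utf-8') returns bytes, and formatting bytes into a str template embeds their repr, escapes and all. Decoding (or simply keeping the value as str) gives readable text. A small illustration, using "зв" as an example value:

```python
search_text = "зв"  # example value, already decoded text

encoded = search_text.encode("utf-8")
print("{}".format(encoded))                  # b'\xd0\xb7\xd0\xb2' (bytes repr, not letters)
print("{}".format(search_text))              # зв
print("{}".format(encoded.decode("utf-8")))  # зв
```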
And read The Absolute Minimum Every Software Developer Must Know About Unicode and Character Sets.
Upvotes: 1