Reputation: 1304
I'm trying to encode non-ASCII characters so I can put them inside an url and use them in urlopen
. The problem is that I want an encoding like JavaScript (that for example encodes ó
as %C3%B3
):
encodeURIComponent(ó)
'%C3%B3'
But urllib.quote
in python returns ó
as %F3
:
urllib.quote(ó)
'%F3'
I want to know how to achieve an encoding like javascript's encodeURIComponent
in Python, and also if I can encode non ISO 8859-1
characters like Chinese. Thanks!
Upvotes: 35
Views: 50521
Reputation: 1364
Note that encodeURIComponent() does not encode the chars A-Z a-z 0-9 - _ . ! ~ * ' ( )
. By default urllib.parse.quote()
does encode some of these chars, you need to pass the safe
chars list to get an equivalent encoder for Python.
In Python 3 the correct solution is
from urllib.parse import quote
quote("ó", safe="!~*'()")
Upvotes: 3
Reputation: 25181
in Python 3 the urllib.quote
has been renamed to urllib.parse.quote
.
Also in Python 3 all strings are unicode strings (the byte strings are called bytes
).
Example:
from urllib.parse import quote
print(quote('ó'))
# output: %C3%B3
Upvotes: 50
Reputation: 6699
You want to make sure you're using unicode.
Example:
import urllib
s = u"ó"
print urllib.quote(s.encode("utf-8"))
Outputs:
%C3%B3
Upvotes: 39