I have a URL in percent-encoded (quoted) form:
http%3A%2F%2Fimages.1datatech.cn%2F%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BA%94%E7%94%A8.png
I want to unquote it into a real URL, with two conditions: 1) the real URL must be directly accessible, e.g. with wget xxxxxx; 2) the real URL must contain only ASCII. I need to save it in a database field declared as a 2000-character ASCII column so that it is indexable. A longer string, or a 2000-character Unicode string, would not be indexable.
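The two storage conditions can be expressed as a quick check. This is a minimal sketch that assumes the column limit is exactly the 2000 ASCII characters described above. `is_indexable_url` is a hypothetical helper name, not part of any library.

```python
def is_indexable_url(url: str, max_len: int = 2000) -> bool:
    # Hypothetical helper: True only if the URL is pure ASCII and fits the column.
    # str.isascii() requires Python 3.7+.
    return url.isascii() and len(url) <= max_len
```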
So I ran b = urllib.parse.unquote(a) and got:
http://images.1datatech.cn/大数据应用.png
That violates my second condition, so I tried quoting it again with c = urllib.parse.quote(b) and got:
http%3A//images.1datatech.cn/%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BA%94%E7%94%A8.png
The question is: how can I get
http://images.1datatech.cn/%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BA%94%E7%94%A8.png
instead?
I don't want to use c.replace('%3A', ':'), because if I go down that route, other characters such as ?, & and = might need special treatment as well.
This seems like a simple task (turning a quoted URL into a real URL), but I am stuck. Please help.
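Here is a small illustration of the concern about ?, & and =. It uses a made-up URL with a query string, where example.com is only a placeholder. Decoding the whole string and then re-quoting it with one fixed safe set keeps ':' and '/' intact. However, it also escapes '?', '=' and '&', so the query part stops being a query.

```python
from urllib.parse import quote, unquote

# Placeholder URL with a query string, fully percent-encoded like the one above.
quoted = 'http%3A%2F%2Fexample.com%2Fsearch%3Fq%3D1%26page%3D2'
decoded = unquote(quoted)          # 'http://example.com/search?q=1&page=2'
naive = quote(decoded, safe=':/')  # one safe set applied to the whole URL
print(naive)  # http://example.com/search%3Fq%3D1%26page%3D2
```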
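Try using urllib.parse.quote with the safe parameter (Python 3). By default safe is '/', which is why ':' was turned into %3A; adding ':' keeps the scheme separator intact: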
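import urllib.parse
url = 'http%3A%2F%2Fimages.1datatech.cn%2F%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BA%94%E7%94%A8.png'
# Decode %3A, %2F, etc., then re-encode everything except ':' and '/'
url = urllib.parse.quote(urllib.parse.unquote(url, encoding='utf-8'), safe=':/')
print(url)  # http://images.1datatech.cn/%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BA%94%E7%94%A8.png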
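If the URLs may carry a query string, one alternative is to decode once, split the result with urllib.parse.urlsplit, and re-quote each component with only its own delimiters marked safe. This is a sketch, not the code from the answer above. `requote_url` is a hypothetical name, and the sketch rests on two assumptions. First, the host is already ASCII; a non-ASCII host would need IDNA encoding rather than percent-encoding. Second, no query value originally contained an encoded & or =. Decoding the whole string first makes such characters indistinguishable from real delimiters.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def requote_url(quoted: str) -> str:
    # Hypothetical helper. Decode the whole string once (%3A%2F%2F -> ://).
    decoded = unquote(quoted, encoding='utf-8')
    parts = urlsplit(decoded)
    # Re-encode each component, keeping only its own delimiters literal.
    # Assumes parts.netloc is already ASCII, so it is passed through unchanged.
    path = quote(parts.path, safe='/')
    query = quote(parts.query, safe='=&')
    fragment = quote(parts.fragment, safe='')
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


print(requote_url('http%3A%2F%2Fimages.1datatech.cn%2F%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BA%94%E7%94%A8.png'))
```

For the URL in the question this gives the same result as the one-liner. With a query string such as ?q=大&page=2, the '?', '=' and '&' stay literal and only the non-ASCII value is percent-encoded.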