Ben L

Reputation: 355

Python urllib URL quote/unquote issue

I have a URL in quoted (percent-encoded) form: http%3A%2F%2Fimages.1datatech.cn%2F%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BA%94%E7%94%A8.png

I want to unquote it into a real URL, subject to two conditions:

1. It must be directly accessible, e.g. with wget xxxxxx.
2. It must contain only ASCII characters. I need to store it in a database field defined as 2000 ASCII characters so that it stays indexable; a longer string, or a 2000-character Unicode string, would not be indexable.

So I ran b = urllib.parse.unquote(a) and got: http://images.1datatech.cn/大数据应用.png

That violates my second condition, so I tried quoting it again with c = urllib.parse.quote(b) and got: http%3A//images.1datatech.cn/%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BA%94%E7%94%A8.png

Question: how can I get http://images.1datatech.cn/%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BA%94%E7%94%A8.png ?

I don't want to use c.replace('%3A', ':'), because other characters such as ?, & and = might also need special treatment if I go down that route.

This seems like a simple task (converting a quoted URL into a real URL), but I am stuck. Please help.

Upvotes: 1

Views: 2030

Answers (1)

Ji Bin

Reputation: 531

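Try urllib.parse.quote with the safe parameter (Python 3). By default quote leaves only '/' unescaped, which is why ':' came back as %3A; passing safe=':/' keeps both characters intact: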

import urllib.parse

url = 'http%3A%2F%2Fimages.1datatech.cn%2F%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BA%94%E7%94%A8.png'
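# Decode the whole string first, then re-encode it while leaving ':' and '/' as-is.
url = urllib.parse.quote(urllib.parse.unquote(url, encoding='utf-8'), safe=':/')
print(url)  # http://images.1datatech.cn/%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BA%94%E7%94%A8.png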
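If the URLs may also carry a query string (the ?, & and = characters mentioned in the question), one possible approach is to split the decoded URL into components and re-quote each part with its own safe set. The following is only a sketch: to_ascii_url is a hypothetical helper name, it assumes the host name is already ASCII, and because the whole string is unquoted first, a literal & or = that was encoded inside a query value cannot be told apart from a real separator.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def to_ascii_url(quoted_url):
    """Hypothetical helper: decode a fully quoted URL, then re-quote it per component."""
    # Decode everything so the scheme, host, path and query become visible again.
    parts = urlsplit(unquote(quoted_url))
    # Re-encode each component, keeping only its own structural characters.
    path = quote(parts.path, safe="/")
    query = quote(parts.query, safe="=&")
    fragment = quote(parts.fragment, safe="")
    # Assumption: parts.netloc is plain ASCII (no internationalized domain name).
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


if __name__ == "__main__":
    a = "http%3A%2F%2Fimages.1datatech.cn%2F%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BA%94%E7%94%A8.png"
    print(to_ascii_url(a))
```

For the URL in the question this gives the same result as the one-liner above; the difference only shows up when a query string or fragment is present.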

Upvotes: 4
