TheRookierLearner

Reputation: 4163

URL UTF-8 Decoding Python

I have some data in URL-encoded format and I want to decode it using Python. I tried the (accepted) answer here, but I am still not getting the correct decoding. My code is as follows:

import urllib2

name = '%D0%BD%D0%BE%D1%82%D0%B8%D1%84%D0%B8%D0%BA%D0%B0%D1%82%D0%BE%D1%80-%D0%BE%D0%BB%D0%B8%D0%BC%D0%BF%D0%B8%D0%B9%D1%81%D0%BA%D0%B8%D1%85-%D0%B8'

print urllib2.unquote(urllib2.quote(name.encode("utf8"))).decode("utf8")

This should print нотификатор-олимпийских-и but it prints %D0%BD%D0%BE%D1%82%D0%B8%D1%84%D0%B8%D0%BA%D0%B0%D1%82%D0%BE%D1%80-%D0%BE%D0%BB%D0%B8%D0%BC%D0%BF%D0%B8%D0%B9%D1%81%D0%BA%D0%B8%D1%85-%D0%B8

so I tried unquoting it again

print urllib2.unquote(urllib2.unquote(urllib2.quote(name.encode("utf8"))).decode("utf8"))

but it gives me ноÑиÑикаÑоÑ-олимпийÑкиÑ-и

I am not sure why this happens. Can anyone please explain what I am doing wrong and how to correct my mistake?

Upvotes: 0

Views: 158

Answers (1)

Konrad Rudolph

Reputation: 545488

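Too many quote/unquote operations: the string you have is already UTF-8 data that has been URL-encoded, so why are you UTF-8 encoding and URL-encoding it again? Quoting it a second time escapes the % signs themselves, so the first unquote only gets you back to where you started. Unquote it once, then decode the resulting bytes as UTF-8: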

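import urllib

# name is the URL-encoded string from the question (Python 2)
unquoted = urllib.unquote(name)  # percent-escapes -> raw UTF-8 bytes
print unquoted.decode('utf-8')   # UTF-8 bytes -> unicode text
# нотификатор-олимпийских-и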
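
For anyone landing here on Python 3, a rough equivalent is sketched below. This assumes Python 3, where the function lives in urllib.parse and decodes as UTF-8 by default; the variable names decoded and round_trip are only illustrative. It also shows why the extra quote() call in the question appeared to do nothing useful.

```python
from urllib.parse import quote, unquote

name = ('%D0%BD%D0%BE%D1%82%D0%B8%D1%84%D0%B8%D0%BA%D0%B0%D1%82%D0%BE%D1%80'
        '-%D0%BE%D0%BB%D0%B8%D0%BC%D0%BF%D0%B8%D0%B9%D1%81%D0%BA%D0%B8%D1%85'
        '-%D0%B8')

# One unquote is enough: in Python 3 it decodes the escapes as UTF-8 by default.
decoded = unquote(name)
print(decoded)

# Quoting first escapes every '%' as '%25', so a single unquote
# merely undoes that and hands back the original escaped string.
round_trip = unquote(quote(name))
print(round_trip == name)
```

If the data might contain bytes that are not valid UTF-8, unquote() also accepts encoding and errors arguments to control how those are handled.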

Upvotes: 1
