Virgiliu

Reputation: 3128

Decoding Unicode from JavaScript in Python & Django

On a website, the word pluș is sent via POST to a Django view. It arrives as plu%25C8%2599. I took that string and tried to figure out how to turn %25C8%2599 back into ș.

I tried decoding the string like this:

from urllib import unquote_plus
s = "plu%25C8%2599"
print unquote_plus(unquote_plus(s).decode('utf-8'))

The result I get is pluÈ, which actually has a length of 5, not 4.

How can I get the original string pluș back after it has been encoded?

edit:

I managed to do it like this:

def js_unquote(quoted):
  quoted = quoted.encode('utf-8')
  quoted = unquote_plus(unquote_plus(quoted)).decode('utf-8')
  return quoted

It looks weird, but it works the way I need it to.

Upvotes: 1

Views: 1861

Answers (3)

Sencer H.

Reputation: 1271

unquote_plus(s).encode('your_lang_encoding')

I tried it like this. I sent a JSON POST request from an HTML form directly to a Django URI. The request included Unicode characters such as "şğüöçı+", and it worked. I used the iso_8859-9 encoding in the encode() call.

Upvotes: 0

mkelley33

Reputation: 5601

You can't unless you know what the encoding is. Unicode itself is not an encoding. You might try BeautifulSoup or UnicodeDammit, which might help you get the result you were hoping for.

http://www.crummy.com/software/BeautifulSoup/

I hope this helps!

Also take a look at:

http://www.joelonsoftware.com/articles/Unicode.html
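
To illustrate why the encoding matters, here is a minimal Python 3 sketch (the question itself uses Python 2). It assumes the bytes left over after URL-decoding are C8 99, as in the question; the variable names are made up for this example. The same bytes give different text depending on which encoding you pick:

```python
# Bytes that remain once "plu%C8%99" has been URL-decoded (taken from the question).
raw = b"plu\xc8\x99"

# Interpreted as UTF-8, the two bytes C8 99 form one character (U+0219).
as_utf8 = raw.decode("utf-8")

# Interpreted as Latin-1, every byte becomes its own character,
# so the string ends up one character longer.
as_latin1 = raw.decode("latin-1")

print(len(as_utf8), len(as_latin1))
```

The Latin-1 reading has length 5, which matches the symptom described in the question.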

Upvotes: 1

Ignacio Vazquez-Abrams

Reputation: 799420

URL-decode twice, then decode as UTF-8.
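
A minimal sketch of that in Python 3, assuming the value really was percent-encoded twice as UTF-8 (in Python 2, unquote_plus lives in urllib and returns a byte string that still needs .decode('utf-8')). The function name js_unquote is borrowed from the question's edit:

```python
from urllib.parse import unquote_plus


def js_unquote(quoted):
    # First pass undoes the outer layer: "%25" becomes "%",
    # so "plu%25C8%2599" turns into "plu%C8%99".
    once = unquote_plus(quoted)
    # Second pass decodes the remaining percent escapes and
    # interprets the resulting bytes as UTF-8.
    return unquote_plus(once, encoding="utf-8")


print(js_unquote("plu%25C8%2599"))
```

If the value was only encoded once, the second pass would leave it unchanged unless it contains a literal "%" or "+", so this is only safe when you know the input is double-encoded.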

Upvotes: 2
