Ian

Reputation: 25356

Unescape Python Strings From HTTP

I've got a string from an HTTP header, but it's been escaped. What function can I use to unescape it?

myemail%40gmail.com -> myemail@gmail.com

Would urllib.unquote() be the way to go?

Upvotes: 20

Views: 21205

Answers (4)

Matan Dobrushin

Reputation: 195

Small correction to the previous answers (tested with Python 3.11):

from urllib.parse import unquote
unquote('myemail%40gmail.com')
'myemail@gmail.com'

Upvotes: 0

In Python 3, these functions are urllib.parse.unquote and urllib.parse.unquote_plus.

The latter is used, for example, for query strings in HTTP URLs, where space characters are traditionally encoded as a plus character (+) and a literal + is percent-encoded as %2B.
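To make the difference concrete, here is a minimal sketch; the sample string is made up for illustration and is not from the question.

```python
from urllib.parse import unquote, unquote_plus

# Hypothetical sample value containing '+', '%2B' and '%20'.
encoded = "a+b%2Bc%20d"

# unquote only decodes %XX escapes; a '+' is left untouched.
plain = unquote(encoded)
print(plain)   # a+b+c d

# unquote_plus first turns '+' into a space, then decodes %XX escapes.
form = unquote_plus(encoded)
print(form)    # a b+c d
```

For a plain header value such as the one in the question, both functions give the same result, since it contains no plus sign.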

In addition to these there is unquote_to_bytes, which converts the given encoded string to bytes; this is useful when the encoding is not known or the encoded data is binary. However, there is no unquote_plus_to_bytes. If you need it, you can do:

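from urllib.parse import unquote_to_bytes

def unquote_plus_to_bytes(s):
    # Replace '+' with a space, using a literal that matches the input type.
    if isinstance(s, bytes):
        s = s.replace(b'+', b' ')
    else:
        s = s.replace('+', ' ')
    return unquote_to_bytes(s)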
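As a rough illustration of when the bytes variant helps, the sketch below decodes data that is not valid UTF-8. The sample inputs and the choice of latin-1 are assumptions made for the example only.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical input: percent-encoded bytes that are not valid UTF-8.
raw = unquote_to_bytes("%ff%fe+data")
print(raw)        # b'\xff\xfe+data'

# Same idea as the helper above: swap '+' for a space before decoding.
raw_plus = unquote_to_bytes("%ff%fe+data".replace("+", " "))
print(raw_plus)   # b'\xff\xfe data'

# If the encoding turns out to be known (latin-1 assumed here), decode explicitly.
text = unquote_to_bytes("caf%E9").decode("latin-1")
print(text)       # café
```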

More information on whether to use unquote or unquote_plus is available at "URL encoding the space character: + or %20".

Upvotes: 4

las3rjock

Reputation: 8724

Yes, it appears that urllib.unquote() accomplishes that task. (I tested it against your example on codepad.)

Upvotes: 2

Paolo Bergantino

Reputation: 488664

I am pretty sure that urllib's unquote is the common way of doing this.

>>> import urllib
>>> urllib.unquote("myemail%40gmail.com")
'myemail@gmail.com'

There's also unquote_plus:

Like unquote(), but also replaces plus signs by spaces, as required for unquoting HTML form values.

Upvotes: 38
