Reputation: 50990
I have an application whose URLs must be able to contain a plus sign, because they are built from the names of actual companies. I'm having trouble writing links into my HTML that CherryPy can correctly receive and process. I believe the problem is that, for the + sign only, both CherryPy and my code are attempting to decode the %2B in the incoming URL, so it is first converted (correctly) to + and then further converted (incorrectly) to a space.
For example, consider URLs of the form /:category/:company where the category is Food and Beverage and two possible company names are Eat / Drink / Be Merry and Jane+Janet.
I render these into my HTML with
    '/{}/{}'.format(
        urllib.quote_plus(self.category.encode('utf8')),
        urllib.quote_plus(self.company_name.encode('utf8'))
    )
Then, in CherryPy, I receive the category and company_name using routes like /:category/:company_name and perform the following processing on company_name:
    def Company(category, company_name):
        print company_name
        company_name = company_name.encode('utf-8')
        print company_name
        company_name = urllib.unquote_plus(company_name)
        print company_name
        company_name = company_name.decode('utf-8')
        print company_name
This works correctly for company names that contain no characters subject to URL encoding, and it works for company names containing most characters that require URL encoding (for instance, there is no problem with Eat / Drink / Be Merry). But if the original company name has a + sign in it, it does not work. It appears that CherryPy has already done part of the decoding for me (replacing %2B with +), so that when I apply my own decoding, the + is replaced with a space.
Here are the results of the four print statements for Eat / Drink / Be Merry:
Eat%20%2F%20Drink%20%2F%20Be%20Merry
Eat%20%2F%20Drink%20%2F%20Be%20Merry
Eat / Drink / Be Merry
Eat / Drink / Be Merry
and for Jane+Janet:
Jane+Janet
Jane+Janet
Jane Janet
Jane Janet
My application fails at this point because there is no "Jane Janet" entry in the database to update.
How can I avoid this double-decoding of the + sign?
Upvotes: 1
Views: 269
Reputation: 5741
Decoding the URL (percent-encoding) is an integral part of the HTTP server's job, so you shouldn't have to call urllib.unquote_plus yourself.
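To see why the extra call is harmful, here is a minimal sketch of the round trip. It is written with Python 3's urllib.parse (the question uses the Python 2 names urllib.quote_plus and urllib.unquote_plus), and it models the server's decode with a plain unquote, which is an assumption about what happens to a path segment, not CherryPy's actual code. The function name trace_decoding is made up for illustration.

```python
from urllib.parse import quote_plus, unquote, unquote_plus


def trace_decoding(name):
    """Return (encoded link segment, after server decode, after extra app decode)."""
    # What the template writes into the link, as in the question.
    encoded = quote_plus(name)
    # Assumed stand-in for the single percent-decode done by the server.
    server_decoded = unquote(encoded)
    # The extra decode in the handler: a literal '+' now becomes a space.
    app_decoded = unquote_plus(server_decoded)
    return encoded, server_decoded, app_decoded


print(trace_decoding("Jane+Janet"))
# ('Jane%2BJanet', 'Jane+Janet', 'Jane Janet')
```

The second element is already the correct company name; only the third, produced by the additional unquote_plus, has lost the plus sign.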
If you really want the raw URI, CherryPy puts a non-standard REQUEST_URI key in the WSGI environment, which you can read with cherrypy.request.wsgi_environ['REQUEST_URI'].
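If you do go the raw-URI route, something along these lines might work. This is only a sketch in Python 3: the helper name raw_path_segments is hypothetical, a plain dict stands in for cherrypy.request.wsgi_environ, and it assumes the links encode spaces as %20 rather than + (it uses unquote, which leaves a literal + alone).

```python
from urllib.parse import unquote, urlsplit


def raw_path_segments(environ):
    """Decode each path segment of REQUEST_URI exactly once."""
    raw_uri = environ["REQUEST_URI"]
    # Drop any query string, keeping the still-encoded path.
    raw_path = urlsplit(raw_uri).path
    # Split on '/' before decoding so an encoded %2F stays inside its segment.
    return [unquote(segment) for segment in raw_path.split("/") if segment]


# Hypothetical environ; inside a handler this would be
# cherrypy.request.wsgi_environ.
environ = {"REQUEST_URI": "/Food%20and%20Beverage/Jane%2BJanet?page=2"}
print(raw_path_segments(environ))
# ['Food and Beverage', 'Jane+Janet']
```

Splitting before decoding also keeps a name such as Eat / Drink / Be Merry in one piece, since its slashes arrive as %2F.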
But really, you should just use the params that CherryPy sends you directly. The encoding and decoding are part of how the data is transmitted over HTTP and shouldn't concern your application logic.
Upvotes: 1