Larry Lustig

Reputation: 50990

Trouble with URL containing "+" in CherryPy

I have an application in which URLs must contain the plus sign (because they include the names of actual companies). I'm having trouble writing links into my HTML that CherryPy can receive and process correctly. I believe the problem is that, for the + sign only, both CherryPy and my code decode the %2B in the incoming URL, so it is first converted (correctly) to + and then converted again (incorrectly) to a space.

For example, consider URLs of the form /:category/:company where the category is Food and Beverage and two possible company names are Eat / Drink / Be Merry and Jane+Janet.

I render these into my HTML with

'/{}/{}'.format(
    urllib.quote_plus(self.category.encode('utf8')),
    urllib.quote_plus(self.company_name.encode('utf8'))
)

Then, in CherryPy, I receive the category and company_name using routes like /:category/:company_name and perform the following processing on company_name:

def Company(category, company_name):
    print company_name
    company_name = company_name.encode('utf-8')
    print company_name
    company_name = urllib.unquote_plus(company_name)
    print company_name
    company_name = company_name.decode('utf-8')
    print company_name

This works correctly for company names without characters subject to URL encoding, and it works for company names with most URL-encoding-required characters (for instance, no problem with Eat / Drink / Be Merry). But, if my original company name had a + sign in it, it does not work. It appears that CherryPy has already done part of the decoding for me (replacing %2B with +) so that when I apply my own decoding, the + is replaced with a space.

Here are the results of the four print statements for Eat / Drink / Be Merry:

Eat%20%2F%20Drink%20%2F%20Be%20Merry
Eat%20%2F%20Drink%20%2F%20Be%20Merry
Eat / Drink / Be Merry
Eat / Drink / Be Merry

and for Jane+Janet:

Jane+Janet
Jane+Janet
Jane Janet
Jane Janet

My application fails at this point because there is no "Jane Janet" entry in the database to update.
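
The suspected double decoding can be reproduced outside CherryPy. This is a minimal sketch, assuming Python 3's urllib.parse (on Python 2, as used above, the same functions live directly in urllib) and assuming the server has already percent-decoded the path once before the handler runs. The variable names are illustrative only.

```python
from urllib.parse import quote_plus, unquote, unquote_plus

company_name = "Jane+Janet"

# What gets written into the HTML link: the literal + becomes %2B.
encoded = quote_plus(company_name)

# Assumption: the server percent-decodes the path once, restoring the +.
decoded_by_server = unquote(encoded)

# A second, plus-aware decode in application code treats that + as a space.
decoded_again = unquote_plus(decoded_by_server)

print(encoded)            # Jane%2BJanet
print(decoded_by_server)  # Jane+Janet
print(decoded_again)      # Jane Janet
```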

How can I avoid this double-decoding of the + sign?

Upvotes: 1

Views: 269

Answers (1)

cyraxjoe

Reputation: 5741

Decoding the URL (percent-encoding) is an integral part of the HTTP server; you shouldn't have to call urllib.unquote_plus yourself.

If you really want the raw URI, CherryPy provides a non-standard REQUEST_URI key in the WSGI environment, which you can read with cherrypy.request.wsgi_environ['REQUEST_URI'].
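
If the raw URI route is taken, the key point is to decode each segment exactly once, with the function that mirrors the encoder used for the links. The following is only a sketch under assumptions: company_from_request_uri is a hypothetical helper, it takes a plain dict standing in for cherrypy.request.wsgi_environ, the company name is assumed to be the last path segment, and the links are assumed to have been built with quote_plus. It uses Python 3's urllib.parse; on Python 2 the equivalents are urllib.unquote_plus and urlparse.urlsplit.

```python
from urllib.parse import unquote_plus, urlsplit


def company_from_request_uri(environ):
    """Hypothetical helper: return the last path segment, decoded once."""
    # Drop any query string, keeping the still-encoded path.
    raw_path = urlsplit(environ["REQUEST_URI"]).path
    # Split on the literal slash BEFORE decoding, so an encoded %2F
    # inside a name is not mistaken for a path separator.
    raw_segment = raw_path.rsplit("/", 1)[-1]
    # Single decode that mirrors quote_plus: %2B -> "+", "+" -> " ".
    return unquote_plus(raw_segment)


# Stand-in for cherrypy.request.wsgi_environ (assumed shape).
example_environ = {"REQUEST_URI": "/Food+and+Beverage/Jane%2BJanet"}
decoded_company = company_from_request_uri(example_environ)
print(decoded_company)  # Jane+Janet
```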

But really, you should just use the params that CherryPy sends you directly. The encoding and decoding are part of transmitting the data over HTTP and shouldn't concern your application logic.

Upvotes: 1
