waynedpj

Reputation: 313

allow non-query question marks "?" in URL/URI w/o encoding

setup

we have the following directory structure on our HTTP/web server:

/questions/who?/
/questions/what?/
/happy? part. 01/
/happy? yet?/
/happy? yet? again? really?!/

question

my question: is it possible to have the corresponding URIs/URLs with unescaped/unencoded question marks (?) resolve correctly? e.g. the URL http://test.org/happy? part. 01/ would resolve to /happy? part. 01/ on the server. because ? signifies the start of a query string, this has been a pickle of a problem for me.

background/research

as expected, by default Apache treats the first ? as the beginning of a query string. so out of the box a URL of http://test.org/happy? part. 01/ will be split into a URI path /happy and a query string part. 01/, resulting in a 404 since the path /happy does not exist.
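to illustrate the split described above, here is a minimal Python sketch using the standard library's urlsplit. it only mirrors the generic "first ? starts the query" rule; it is not a model of Apache itself, and the variable names are just for illustration.

```python
from urllib.parse import urlsplit

# The unencoded URL from the question. The first "?" is taken as the
# start of the query string, so the path stops right before it.
raw_url = "http://test.org/happy? part. 01/"
parts = urlsplit(raw_url)

print(repr(parts.path))   # '/happy'
print(repr(parts.query))  # ' part. 01/'
```

the directory /happy? part. 01/ is never looked up, because the path component that survives the split is just /happy.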

most of the other answers/tips i have found in my research mainly deal with rewriting the URL assuming that the ? indicates a query string, e.g.

however, in this case we can assume that our HTTP server will not be receiving URLs with query strings.

i realize that normally browsers/etc. will encode the URI before sending it to the server (e.g. http://test.org/happy? part. 01/ will be sent to the server as http://test.org/happy%3F%20part.%2001/, though which characters are encoded depends on the app and on which URI standard version it supports: RFC2396 or RFC3986). in this scenario, however, the server may receive unencoded URLs, and it will never receive URLs with real query strings.
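for reference, the encoded form mentioned above can be reproduced with Python's urllib.parse.quote. this is only a sketch of what a well-behaved client would send, assuming it percent-encodes ? and spaces but leaves / alone (quote's default); real clients differ in exactly which characters they encode.

```python
from urllib.parse import quote, unquote

# Directory name as it exists on the server.
server_path = "/happy? part. 01/"

# quote() keeps "/" by default and percent-encodes "?" and " ".
encoded = quote(server_path)
print(encoded)  # /happy%3F%20part.%2001/

# Decoding restores the original directory name.
print(unquote(encoded) == server_path)  # True
```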

my attempts

at first i thought a simple rule like this would suffice:

RewriteRule ([^\?]*?)\?([^\?]*?) $1\?$2 [NE,N]

here i am trying to repeatedly find all the ?s and simply reinsert them into the URL unescaped. unfortunately the regular expression (and many variations) is not matching the URLs that contain ?, instead only matching the encoded ? value %3F. and even when it matches, the second capture group $2 seems to always be empty. finally, the \? in the substitution string seems to be preventing anything after it from being written.
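one possible contributor to the empty $2: the pattern ends with a lazy group and has no trailing anchor, so that group is allowed to match nothing. the sketch below shows this with Python's re module, on the assumption that the regex engine used by mod_rewrite treats lazy quantifiers the same way in this case. it does not explain the missing match on the literal ?, which is a separate issue.

```python
import re

text = "happy? part. 01/"

# Same pattern as in the RewriteRule above: the trailing lazy group
# is satisfied by the empty string, so it captures nothing.
unanchored = re.search(r"([^\?]*?)\?([^\?]*?)", text)
print(repr(unanchored.group(1)))  # 'happy'
print(repr(unanchored.group(2)))  # ''

# With ^ and $ the lazy group is forced to extend to the end of the input.
anchored = re.search(r"^([^\?]*?)\?([^\?]*?)$", text)
print(repr(anchored.group(2)))  # ' part. 01/'
```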

the above linked solutions led me to the fact that to check for the ? i had to check the %{THE_REQUEST} variable, since Apache strips the query string before populating the other server variables and before matching RewriteRule patterns. to that end i tried variations of this:

RewriteCond %{THE_REQUEST} ^[A-Z]+\ \/([^\?]*?)\?([^\?]*?)\/?\ HTTP
RewriteRule ^(.*?)\?(.*?)$ $1\?$2 [NE,N]

while the regular expression of the RewriteCond is matching URIs with ?, the %2 in the RewriteRule causes an Internal Server Error, though without it i seem to have no way of accessing the part of the URL after the ?.

finally, i also tried various things with %{QUERY_STRING} and [QSA] but still no luck.

thanks for taking a look.

Upvotes: 4

Views: 2196

Answers (1)

Jon Lin

Reputation: 143946

How about simply:

RewriteCond %{ENV:REDIRECT_STATUS} !200
RewriteCond %{QUERY_STRING} !^$
RewriteRule ^(.*)$ /$1\%3F%{QUERY_STRING} [L,NE]

EDIT:

Try this:

RewriteCond %{QUERY_STRING} ^(.*)\?(.*)$
RewriteRule ^(.*)$ /$1?%1\%3F%2 [L]

RewriteCond %{QUERY_STRING} !^$
RewriteRule ^(.*)$ /$1\%3F%{QUERY_STRING}? [L,NE]
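The intent of these rules, as I read them, is to glue whatever Apache split off as the query string back onto the path, with every ? written as %3F. Here is a rough Python sketch of that transformation only. It is not a simulation of mod_rewrite, and the function name fold_query_into_path is made up for this example.

```python
def fold_query_into_path(path: str, query: str) -> str:
    """Rejoin a path and the text that was split off as its query string,
    writing every "?" (including the one at the split point) as %3F."""
    if not query:
        # Nothing was split off, so the path is left alone.
        return path
    return path + "%3F" + query.replace("?", "%3F")


# "/happy? yet? again? really?!/" arrives split at the first "?".
print(fold_query_into_path("/happy", " yet? again? really?!/"))
# /happy%3F yet%3F again%3F really%3F!/

print(fold_query_into_path("/questions/who", "/"))
# /questions/who%3F/
```

Spaces are left untouched here on purpose, since the rules above only deal with the question marks.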

Upvotes: 2
