Reputation: 313
we have the following directory structure on our HTTP/web server:
/questions/who?/
/questions/what?/
/happy? part. 01/
/happy? yet?/
/happy? yet? again? really?!/
my question: is it possible to have the corresponding URIs/URLs with unescaped/unencoded question marks (?) resolve correctly? e.g. the URL http://test.org/happy? part. 01/ will resolve to /happy? part. 01/ on the server. because ? signifies the start of a query string, this has been a pickle of a problem for me.
as expected, by default Apache treats the first ? as the beginning of a query string. so out of the box a URL of http://test.org/happy? part. 01/ will be split into the URI path /happy and the query string part. 01/, resulting in a 404 since the path /happy does not exist.
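To make the split concrete, here is a minimal sketch of how a request target is divided at the first ?. It is only an illustration of the general rule, not Apache's actual parser, and the function name split_request_target is made up for this example.

```python
def split_request_target(target):
    """Split a request target at the first '?', as a server conventionally does.

    Everything before the first '?' is treated as the path; everything after
    it (including any further '?' characters) is treated as the query string.
    """
    path, _, query = target.partition("?")
    return path, query


print(split_request_target("/happy? part. 01/"))
# → ('/happy', ' part. 01/')
print(split_request_target("/happy? yet? again? really?!/"))
# → ('/happy', ' yet? again? really?!/')
```

Note that only the first ? acts as a delimiter; any later ones end up inside the query string.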
most of the other answers/tips i have found in my research mainly deal with rewriting the URL assuming that the ? indicates a query string, e.g.
however, in this case we can assume that our HTTP server will not be receiving URLs with query strings.
i realize that normally browsers/etc. will encode the URI before sending it to the server (e.g. http://test.org/happy? part. 01/ will be sent to the server as http://test.org/happy%3F%20part.%2001/, though which characters are encoded depends on the app and on which URI standard it supports: RFC 2396 or RFC 3986). but for this scenario the server may be getting unencoded URLs, though never any URLs with query strings.
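For reference, the encoded form quoted above can be reproduced by percent-encoding the path explicitly. This sketch uses Python's standard urllib.parse.quote, which is just one encoder; a given browser or client may encode a different set of characters.

```python
from urllib.parse import quote, unquote

raw_path = "/happy? part. 01/"

# quote() leaves "/" unescaped by default (safe="/"), so the path keeps its
# slashes while "?" becomes %3F and each space becomes %20.
encoded_path = quote(raw_path)
print(encoded_path)
# → /happy%3F%20part.%2001/

# Decoding restores the original directory name.
print(unquote(encoded_path) == raw_path)
# → True
```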
at first i thought a simple rule like this would suffice:
RewriteRule ([^\?]*?)\?([^\?]*?) $1\?$2 [NE,N]
here i am trying to repeatedly find all the ?s and simply reinsert them into the URL unescaped. unfortunately the regular expression (and many variations) does not match the URLs that contain ?, instead only matching the encoded ? value %3F. and even when it matches, the second capture group $2 seems to always be empty. finally, the \? in the substitution string seems to prevent anything after it from being written.
the above linked solutions led me to the fact that to check for the ? i had to check the %{THE_REQUEST} variable, since Apache will strip the query string for other server variables/RewriteRules. to that end i tried variations of this:
RewriteCond %{THE_REQUEST} ^[A-Z]+\ \/([^\?]*?)\?([^\?]*?)\/?\ HTTP
RewriteRule ^(.*?)\?(.*?)$ $1\?$2 [NE,N]
while the regular expression of the RewriteCond is matching URIs with ?, the %2 in the RewriteRule causes an Internal Server Error, though without it i seem to have no way of accessing the part of the URL after the ?.
finally, i also tried various things with %{QUERY_STRING} and [QSA] but still no luck.
thanks for taking a look.
Upvotes: 4
Views: 2196
Reputation: 143946
How about simply:
RewriteCond %{ENV:REDIRECT_STATUS} !200
RewriteCond %{QUERY_STRING} !^$
RewriteRule ^(.*)$ /$1\%3F%{QUERY_STRING} [L,NE]
EDIT:
Try this:
RewriteCond %{QUERY_STRING} ^(.*)\?(.*)$
RewriteRule ^(.*)$ /$1?%1\%3F%2 [L]
RewriteCond %{QUERY_STRING} !^$
RewriteRule ^(.*)$ /$1\%3F%{QUERY_STRING}? [L,NE]
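As I read them, the rules above aim to fold the query string back into the path, with every ? written as %3F. The sketch below only models that intended string transformation in Python; it does not emulate mod_rewrite, it does not confirm that Apache then resolves the result to the directory, and the function name fold_query_into_path is hypothetical.

```python
def fold_query_into_path(path, query):
    """Rejoin a path and its (mis-parsed) query string into one path.

    The '?' that separated them, and any further '?' inside the query part,
    are written as %3F so they are no longer read as query delimiters.
    """
    if not query:
        # Nothing was split off, so the path is already complete.
        return path
    return path + "%3F" + query.replace("?", "%3F")


print(fold_query_into_path("/happy", " part. 01/"))
# → /happy%3F part. 01/
print(fold_query_into_path("/happy", " yet? again? really?!/"))
# → /happy%3F yet%3F again%3F really%3F!/
```

Requests without a ? pass through unchanged, which corresponds to the RewriteCond %{QUERY_STRING} !^$ guard.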
Upvotes: 2