Thaylin

Reputation: 35

Issue using mod_rewrite/RewriteMap and URLs with spaces in them

We are moving from a server product to a cloud product, and unfortunately the URLs are changing. To mitigate broken links to specific items in the new location, I have generated a mapping of all the URLs. The entries without spaces work, but the ones with spaces do not and I end up with a 404.

This is the code I have in my httpd.conf file

  RewriteCond ${redirects:%{HTTP_HOST}%{REQUEST_URI}} ^.+$
  RewriteMap redirects /confs/redirects.example.com/URLs
  RewriteRule .* https://${redirects:%{HTTP_HOST}%{REQUEST_URI}} [redirect=temporary,last,qsdiscard]

This is an example of the mapping

"redirects.example.com/confluence/display/ADS/New Tech Tip - Pidgin Settings" newsite.vendor.com/wiki/spaces/ADS/blog/2011/10/12/8487166/New+Tech+Tip+-+Pidgin+Settings

As you can see, I have tried enclosing the key in double quotes, as well as escaping the space with \ and converting the spaces to %20, as suggested in other posts; however, nothing seems to work.

I can see in the error log that the spaces are encoded as %20 when the 404 occurs. Any help would be greatly appreciated.

Upvotes: 1


Answers (1)

MrWhite

Reputation: 45829

The problem is that spaces are delimiters in txt map-type files (the key and the value are separated by whitespace), and I don't believe there is a way to escape literal spaces in the map file itself.

The literal spaces are encoded as %20 in the HTTP request (they must be, in order to form a valid HTTP request), but the REQUEST_URI server variable is %-decoded. So you are performing the lookup with a string that contains literal spaces, which is going to fail.
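To make the failure concrete, here is a small Python model of the two problems. It assumes a simplified reading of a txt map line (split on whitespace, first field is the key, second is the value, no special handling of quotes or backslashes); this is an illustration, not Apache's actual parser. `parse_txt_map_line` and `redirect_map` are hypothetical names.

```python
from urllib.parse import unquote

# Simplified model (assumption): a txt: map line is split on whitespace;
# the first field is the key and the second field is the value.
# Quotes and backslashes get no special treatment.
def parse_txt_map_line(line):
    fields = line.split()
    return fields[0], fields[1]

line = ('"redirects.example.com/confluence/display/ADS/New Tech Tip - Pidgin Settings" '
        'newsite.vendor.com/wiki/spaces/ADS/blog/2011/10/12/8487166/New+Tech+Tip+-+Pidgin+Settings')
key, value = parse_txt_map_line(line)
print(key)    # "redirects.example.com/confluence/display/ADS/New
print(value)  # Tech

# REQUEST_URI holds the %-decoded URL-path, so the lookup key has literal spaces.
request_uri = unquote("/confluence/display/ADS/New%20Tech%20Tip%20-%20Pidgin%20Settings")
lookup_key = "redirects.example.com" + request_uri
redirect_map = {key: value}

# No match: the lookup returns an empty string in Apache, so ^.+$ fails.
print(redirect_map.get(lookup_key))  # None
```

The quoted key is cut off at the first space, so it can never equal the decoded host + path that REQUEST_URI produces.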

A solution would be to store the %-encoded URL in the txt map file (i.e. spaces are encoded as %20) and use the %-encoded URL-path to perform the lookup. We can extract the %-encoded URL-path from the THE_REQUEST server variable.
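If the map file is generated from a list of old/new URL pairs, the keys can be %-encoded when the file is written. A minimal Python sketch, assuming the pairs are available as a list (`pairs` and `map_line` are hypothetical names) and that the new URLs contain no whitespace:

```python
from urllib.parse import quote

# Hypothetical helper: build one txt: map line whose key is the %-encoded
# host + URL-path. quote() with safe="/" leaves letters, digits, "_.-~" and "/"
# unchanged and encodes a space as %20.
# Assumption: new_url already contains no whitespace.
def map_line(old_url, new_url):
    return f"{quote(old_url, safe='/')} {new_url}"

pairs = [
    ("redirects.example.com/confluence/display/ADS/New Tech Tip - Pidgin Settings",
     "newsite.vendor.com/wiki/spaces/ADS/blog/2011/10/12/8487166/New+Tech+Tip+-+Pidgin+Settings"),
]

for old_url, new_url in pairs:
    print(map_line(old_url, new_url))
```

This only works if the encoded key matches what clients actually send in the request line. Browsers typically leave some characters, such as `(`, unencoded, whereas `quote()` encodes `(` as `%28`, so the `safe` argument may need adjusting for your data.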

So, the map file should look like this:

redirects.example.com/confluence/display/ADS/New%20Tech%20Tip%20-%20Pidgin%20Settings newsite.vendor.com/wiki/spaces/ADS/blog/2011/10/12/8487166/New+Tech+Tip+-+Pidgin+Settings

Regarding your current directives:

  RewriteCond ${redirects:%{HTTP_HOST}%{REQUEST_URI}} ^.+$
  RewriteMap redirects /confs/redirects.example.com/URLs
  RewriteRule .* https://${redirects:%{HTTP_HOST}%{REQUEST_URI}} [redirect=temporary,last,qsdiscard]

Note that the RewriteMap directive itself is not part of the RewriteCond / RewriteRule rule. You appear to have sandwiched it between the condition and the rule (it might "work", but it's not correct). Also, you are not declaring the map type.

Try the following instead:

RewriteMap redirects "txt:/confs/redirects.example.com/URLs"

RewriteCond %{THE_REQUEST} \s(/[^?\s]*)
RewriteCond ${redirects:%{HTTP_HOST}%1} (.+)
RewriteRule ^ https://%1 [R=302,L,QSD]

The first condition (RewriteCond directive) captures the %-encoded URL-path from the THE_REQUEST server variable, stopping before any query string. (THE_REQUEST contains the first line of the HTTP request, i.e. the request method, URL and protocol, for example: GET /url?query HTTP/1.1.)
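The capture done by the first condition can be checked outside Apache. This Python sketch uses the same regular expression; for a simple pattern like this, Python's `re` and Apache's PCRE should behave the same. `encoded_path` is a hypothetical name.

```python
import re

# Same pattern as the first RewriteCond: a whitespace character, then a
# URL-path starting with "/" that runs until a "?" or the next whitespace.
THE_REQUEST_PATTERN = re.compile(r"\s(/[^?\s]*)")

def encoded_path(the_request):
    match = THE_REQUEST_PATTERN.search(the_request)
    return match.group(1) if match else None

print(encoded_path("GET /confluence/display/ADS/New%20Tech%20Tip%20-%20Pidgin%20Settings HTTP/1.1"))
# /confluence/display/ADS/New%20Tech%20Tip%20-%20Pidgin%20Settings
print(encoded_path("GET /url?query HTTP/1.1"))
# /url
```

The path keeps its %20 sequences because THE_REQUEST is never %-decoded, which is exactly what makes it usable as a map key.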

The captured %-encoded URL-path is then used in the 2nd condition (as part of the call to the rewrite map) using the %1 backreference.

The result of the rewrite map lookup is also captured in the %1 backreference (overwriting the earlier backreference) in the 2nd condition. This is then used in the substitution string, so there is no need to call the rewrite map a second time.
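The chaining of the two conditions and the reuse of %1 can be modelled end to end in Python. This is a toy model of the rule set, not Apache itself: `REDIRECTS` stands in for the txt map (keyed by %-encoded host + URL-path), and `redirect_target` is a hypothetical name.

```python
import re

# Stand-in for the txt: map; keys are the %-encoded host + URL-path.
REDIRECTS = {
    "redirects.example.com/confluence/display/ADS/New%20Tech%20Tip%20-%20Pidgin%20Settings":
        "newsite.vendor.com/wiki/spaces/ADS/blog/2011/10/12/8487166/New+Tech+Tip+-+Pidgin+Settings",
}

def redirect_target(http_host, the_request, redirect_map=REDIRECTS):
    # 1st RewriteCond: %1 = %-encoded URL-path taken from THE_REQUEST.
    m = re.search(r"\s(/[^?\s]*)", the_request)
    if not m:
        return None
    backref = m.group(1)

    # 2nd RewriteCond: look up host + path. A missing key yields "",
    # which the (.+) pattern rejects, so the rule does not fire.
    result = redirect_map.get(http_host + backref, "")
    m = re.search(r"(.+)", result)
    if not m:
        return None
    backref = m.group(1)  # %1 now holds the map value, replacing the URL-path

    # RewriteRule substitution https://%1; with QSD the query string is dropped.
    return "https://" + backref

print(redirect_target(
    "redirects.example.com",
    "GET /confluence/display/ADS/New%20Tech%20Tip%20-%20Pidgin%20Settings?x=1 HTTP/1.1"))
# https://newsite.vendor.com/wiki/spaces/ADS/blog/2011/10/12/8487166/New+Tech+Tip+-+Pidgin+Settings
```

A request for a path that is not in the map returns `None`, mirroring the case where the conditions fail and no redirect is issued.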

I used the shorthand flags (R=302, L and QSD are equivalent to redirect=temporary, last and qsdiscard), but that's just personal preference. Obviously, this should ultimately be a 301 (permanent) redirect once you have confirmed it works as intended.

Upvotes: 1
