Reputation: 18372
I'm writing a web application that dynamically creates URLs based on some input, to be consumed by a client at another time. For the sake of discussion, these URLs can contain certain characters, such as a forward slash ('/'), which should not be interpreted as part of the actual URL path, but only as an argument. For example:
http://mycompany.com/PartOfUrl1/PartOfUrl2/ArgumentTo/Url/GoesHere
As you can see, ArgumentTo/Url/GoesHere does contain forward slashes, but these should be ignored or escaped.
This may be a bad example, but the question at hand is more general and applies to other special characters.
Given some of the answers, I realized that I failed to point out a few details that will hopefully help clarify.
I would like to keep this fairly language-agnostic, since it would be great if the client could just make a request. For example, if the client knew that it wanted to pass ArgumentTo/Url/GoesHere, it could encode that into a unique string that the server could then decode and use.
Can we assume that functions similar to HttpUtility.HtmlEncode/HtmlDecode in the .NET Framework are available on other systems/platforms? The URL does not have to be pretty by any means, so having real words in the path does not really matter.
It seems that base64 encoding/decoding is fairly readily available on any platform/language.
Upvotes: 8
Views: 3769
Reputation: 530
You could use Apache rewrites to rewrite http://mycompany.com/PartOfUrl1/PartOfUrl2 to http://mycompany.com/path/to/program.php and then pass in ArgumentTo/Url/GoesHere as a standard GET parameter. So what the server actually sends back is the response for http://mycompany.com/path/to/program.php?arg=ArgumentTo/Url/GoesHere
Rewriting is a good way to guard against technology changes (so switching from PHP to ASP, for example, won't change your URLs) and provide friendly URLs to your users at the same time.
Using your example URLs and building on what I said before, I'd say to use something like this in your httpd.conf or .htaccess:
RewriteEngine On
RewriteRule ^/?PartOfUrl1/PartOfUrl2/(.+)$ /path/to/program.php?arg=$1 [L]
(BTW, the RewriteRule needs to be on a single line with no line breaks. The pattern is matched against the URL path only, so the scheme and host name are left out of it.)
Changing the paths, the filenames, the name of the arg, etc. is fine; the critical parts here are the capturing group ((.+), which matches the rest of the path, slashes included) and the $1 that passes the captured text on.
Upvotes: 3
Reputation: 47007
You didn't say which language you're using, but PHP has the useful urlencode function, and C# has HttpUtility.UrlEncode and Server.UrlEncode, which should encode parts of your URL nicely.
In case you need another way, this page has a list of encoded values, e.g. / == %2f.
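For a language-neutral illustration of the same idea, here is a minimal Python sketch of percent-encoding the argument so that its slashes survive as data. The BASE prefix and the helper names build_url and extract_argument are assumptions made up for this example, not part of anyone's real API.

```python
from urllib.parse import quote, unquote

# Hypothetical fixed prefix, taken from the example URL in the question.
BASE = "http://mycompany.com/PartOfUrl1/PartOfUrl2/"


def build_url(argument: str) -> str:
    # safe="" makes quote() escape "/" too; by default "/" is left untouched.
    return BASE + quote(argument, safe="")


def extract_argument(url: str) -> str:
    # Everything after the fixed prefix is the encoded argument.
    return unquote(url[len(BASE):])


url = build_url("ArgumentTo/Url/GoesHere")
print(url)  # http://mycompany.com/PartOfUrl1/PartOfUrl2/ArgumentTo%2FUrl%2FGoesHere
print(extract_argument(url))  # ArgumentTo/Url/GoesHere
```

One caveat: some servers reject an encoded slash in the path unless configured otherwise (Apache's AllowEncodedSlashes directive is off by default), so this may need a server-side setting or a query parameter instead.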
From what you've updated, I'd say use Voyagerfan's idea of URL rewriting to make something like:
http://www.example.com/([A-Za-z0-9/]+) http://www.example.com/?page=$1
And then use the application's GET parser to filter it out.
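As a rough sketch of that last step, this is how the rewritten URL's GET parameter could be read back using Python's standard library. The function name page_argument is hypothetical, and the parameter name "page" is simply taken from the rewrite target above.

```python
from urllib.parse import urlsplit, parse_qs


def page_argument(url: str, name: str = "page") -> str:
    # Isolate the query string, then let parse_qs split and percent-decode it.
    query = urlsplit(url).query
    values = parse_qs(query).get(name, [""])
    return values[0]


rewritten = "http://www.example.com/?page=ArgumentTo/Url/GoesHere"
print(page_argument(rewritten))  # ArgumentTo/Url/GoesHere
```

Slashes are legal inside a query string, so the argument comes through intact whether or not the client percent-encoded them.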
Upvotes: 5
Reputation: 35871
Yes, Base64-encoding your argument will work for you; however, you'll need to make sure the entire URL stays under the size limit of your target browser (2083 characters for IE 4 through 7, according to this page).
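One detail worth knowing: the standard Base64 alphabet itself contains '/' and '+', so the URL-safe variant (which uses '_' and '-' instead) is the better fit here. The following Python sketch shows the round trip under that assumption; the helper names and the MAX_URL_LENGTH constant are made up for illustration, with the limit being the IE figure quoted above.

```python
import base64

# Assumed limit, using the IE figure mentioned in this answer.
MAX_URL_LENGTH = 2083


def encode_argument(argument: str) -> str:
    # URL-safe Base64 swaps "+" and "/" for "-" and "_".
    token = base64.urlsafe_b64encode(argument.encode("utf-8")).decode("ascii")
    # The "=" padding is optional to carry around; strip it for a cleaner URL.
    return token.rstrip("=")


def decode_argument(token: str) -> str:
    # Restore the padding that encode_argument() removed.
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding).decode("utf-8")


def fits_in_url(prefix: str, token: str) -> bool:
    # Check the whole URL, not just the token, against the length limit.
    return len(prefix) + len(token) <= MAX_URL_LENGTH


token = encode_argument("ArgumentTo/Url/GoesHere")
print(decode_argument(token))  # ArgumentTo/Url/GoesHere
print(fits_in_url("http://mycompany.com/PartOfUrl1/PartOfUrl2/", token))  # True
```

Base64 output is about a third longer than its input, which is why the length check matters more with this approach than with plain percent-encoding of a few characters.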
Upvotes: 1
Reputation: 2371
Use the HtmlEncode and HtmlDecode methods on the Server object. I believe that will escape most characters that should not appear as-is, and it takes care of other things such as spaces.
Here's the MSDN Article: http://msdn.microsoft.com/en-us/library/ms525347.aspx
Upvotes: 0
Reputation: 197
I believe what you're looking for, if you're using .NET, is the HttpUtility.UrlEncode() method, which has several overloads. Look here: http://msdn.microsoft.com/en-us/library/system.web.httputility.urlencode.aspx
Upvotes: 0