Reputation: 5545
The use case is saving the contents of http://example.com to a file on your computer that is named after the URL, with the unsafe characters (i.e. : and /) escaped.
The classic way is to use a regex to strip out every character that is not alphanumeric, a dash, or an underscore, but that makes it impossible to reverse the filename into a URL. Is there a way, possibly a combination of CGI.escape and another filter, to sanitize the filename for both Windows and *nix, even if the tradeoff is a much longer filename?
edit:
Example with CGI.escape
require 'cgi'
CGI.escape 'http://www.example.com/Hey/whatsup/1 2 3.html#hash'
#=> "http%3A%2F%2Fwww.example.com%2FHey%2Fwhatsup%2F1+2+3.html%23hash"
A couple of things: are % signs completely safe as filename characters? Unfortunately, CGI.escape doesn't convert spaces in a malformed URL to %20 on the first pass, so I suppose any translation method would require changing all spaces to + with a gsub and then applying CGI.escape.
Upvotes: 1
Views: 361
Reputation: 29536
Here is how I would do it (adjust the regular expression as needed):
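url = "http://stackoverflow.com/questions/18234870/how-to-reversibly-escape-a-url-in-ruby-so-that-it-can-be-saved-to-the-file-syste"
# Keep letters, digits and dashes; replace every other character
# (including "_" itself) with "_" followed by the hex of its bytes.
filename = url.each_char.map { |x|
  x.match?(/[a-zA-Z0-9-]/) ? x : "_#{x.unpack1('H*')}"
}.join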
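The answer shows only the encoding direction. As a rough sketch of how the round trip might look, the code below adds a decoder. The helper names `url_to_filename` and `filename_to_url` are hypothetical, and the encoder here differs slightly from the snippet above: it escapes byte by byte, so every escape is exactly `_` plus two hex digits and multi-byte characters stay unambiguous when decoding.

```ruby
# Hypothetical helpers, not part of the original answer.

# Escape every byte that is not a letter, digit or dash as "_" + two hex digits.
# "_" itself is escaped too, so a "_" in the result always starts an escape.
def url_to_filename(url)
  url.b.gsub(/[^a-zA-Z0-9-]/) { |c| format('_%02x', c.ord) }
     .force_encoding(Encoding::UTF_8)
end

# Reverse the mapping: turn each "_hh" back into the byte it stands for.
def filename_to_url(name)
  name.b.gsub(/_(\h\h)/) { Regexp.last_match(1).hex.chr }
      .force_encoding(Encoding::UTF_8)
end

original = 'http://www.example.com/Hey/whatsup/1 2 3.html#hash'
encoded  = url_to_filename(original)
puts encoded
puts filename_to_url(encoded) == original
```

The result contains only letters, digits, dashes and underscores, which are accepted in filenames on both Windows and *nix, although very long URLs can still exceed the filesystem's name-length limit.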
EDIT:
If the length of the resulting file name is a concern, I would store the files in sub-directories named after the URL path segments.
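A minimal sketch of the sub-directory idea, under several assumptions: the helper names `escape_segment` and `url_to_relative_path` are made up for illustration, the URL is well-formed enough for `URI.parse`, and the port, query string, fragment and trailing slashes are ignored, so this particular layout is not fully reversible on its own.

```ruby
require 'uri'

# Hypothetical helper: escape one segment so it contains only letters,
# digits, dashes and "_hh" byte escapes.
def escape_segment(segment)
  segment.b.gsub(/[^a-zA-Z0-9-]/) { |c| format('_%02x', c.ord) }
         .force_encoding(Encoding::UTF_8)
end

# Hypothetical helper: map a URL to a relative path with one directory level
# per URL path segment. Scheme and host together form the top-level directory.
def url_to_relative_path(url)
  uri = URI.parse(url)
  root = escape_segment("#{uri.scheme}://#{uri.host}")
  segments = uri.path.split('/').reject(&:empty?).map { |s| escape_segment(s) }
  File.join(root, *segments)
end

puts url_to_relative_path('http://example.com/questions/18234870/some-title')
```

Each path component stays short this way, but a URL that is both a page and a prefix of another URL (for example /a and /a/b) would need the same name as a file and as a directory, so a real layout would have to handle that case.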
Upvotes: 2
Reputation: 4796
One way is to "hash" the filename. For example, the URL for this question is: https://stackoverflow.com/questions/18234870/how-to-reversibly-escape-a-url-in-ruby-so-that-it-can-be-saved-to-the-file-syste. You could use digest/md5 from Ruby's standard library to hash the name. Simple and elegant.
require "digest/md5"
foldername = "https://stackoverflow.com/questions/18234870/how-to-reversibly-escape-a-url-in-ruby-so-that-it-can-be-saved-to-the-file-syste"
hashed_name = Digest::MD5.hexdigest(foldername) # => "5045cccd83a8d4d5c4fc01f7b4d8c502"
This works for the same reason MD5 is used to check the integrity/completeness of downloads: for all practical purposes, the MD5 digest of a given string is always the same hex string.
However, I wouldn't call this "reversible". You need a custom way to look up the URL for each hash that gets generated, maybe a .yml file with that data.
Update: As @the Tin Man suggests, a simple SQLite db would be much better than a .yml file when there are a large number of files that need storing.
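As a rough illustration of the lookup idea, here is a sketch that keeps a digest-to-URL map in a .yml file. The `UrlIndex` class and its method names are hypothetical, and it rewrites the whole file on every addition, which is only reasonable for a small number of entries (hence the SQLite suggestion above).

```ruby
require 'digest/md5'
require 'yaml'

# Hypothetical class: maps MD5 hex digests back to the URLs they came from.
class UrlIndex
  def initialize(index_path)
    @index_path = index_path
    # An empty or missing file yields an empty index.
    @index = File.exist?(index_path) ? (YAML.load_file(index_path) || {}) : {}
  end

  # Record the URL and return the digest to use as the filename.
  def add(url)
    digest = Digest::MD5.hexdigest(url)
    @index[digest] = url
    File.write(@index_path, @index.to_yaml)
    digest
  end

  # Look up the original URL for a digest (nil if unknown).
  def url_for(digest)
    @index[digest]
  end
end
```

With this in place, `UrlIndex.new("index.yml").add(url)` would return the name to save the file under, and `url_for` recovers the URL later, even from a new process that reloads the same index file.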
Upvotes: 3