Zando

Reputation: 5545

How to reversibly escape a URL in Ruby so that it can be saved to the file system

The example use case is saving the contents of http://example.com to a file on your computer that is named after the URL, with the unsafe characters (i.e. : and /) escaped.

The classic way is to use a regex to strip out every character that isn't alphanumeric, a dash, or an underscore, but that makes it impossible to turn the filename back into a URL. Is there a way, possibly a combination of CGI.escape and another filter, to sanitize the filename for both Windows and *nix, even if the tradeoff is a much longer filename?

edit:

Example with CGI.escape

 require 'cgi'

 CGI.escape 'http://www.example.com/Hey/whatsup/1 2 3.html#hash'
 #=> "http%3A%2F%2Fwww.example.com%2FHey%2Fwhatsup%2F1+2+3.html%23hash"

A couple of things: are % signs completely safe as filename characters? Also, CGI.escape doesn't convert the spaces in a malformed URL to %20 on the first pass (it emits + instead), so I suppose any translation method would require changing all spaces to + with a gsub and then applying CGI.escape.

Upvotes: 1

Views: 361

Answers (2)

akonsu

Reputation: 29536

Here is how I would do it (adjust the regular expression as needed):

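url = "http://stackoverflow.com/questions/18234870/how-to-reversibly-escape-a-url-in-ruby-so-that-it-can-be-saved-to-the-file-syste"
# Keep ASCII letters, digits and "-"; replace every other character with
# "_" followed by the hex of its bytes (":" becomes "_3a", "/" becomes "_2f").
filename = url.each_char.map {|x|
  x.match(/[a-zA-Z0-9-]/) ? x : "_#{x.unpack('H*')[0]}"
}.join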
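
To make the round trip concrete, here is a minimal sketch of a byte-wise variant of this mapping together with its inverse. The method names escape_url and unescape_filename are hypothetical, and escaping one byte at a time (always exactly two hex digits after each "_") is an assumption added here so that decoding stays unambiguous for non-ASCII input; it is not part of the snippet above.

```ruby
# Hypothetical helper names, for illustration only.
SAFE_CHAR = /\A[a-zA-Z0-9-]\z/

# Escape byte by byte: safe ASCII bytes pass through, every other byte
# becomes "_" plus exactly two lowercase hex digits.
def escape_url(url)
  url.each_byte.map { |b|
    b < 128 && b.chr.match?(SAFE_CHAR) ? b.chr : format("_%02x", b)
  }.join
end

# Reverse the mapping: "_xx" turns back into the byte 0xxx, any other
# character is kept as-is. Assumes the original URL was UTF-8.
def unescape_filename(name)
  bytes = name.scan(/_([0-9a-f]{2})|(.)/m).map { |hex, ch| hex ? hex.hex : ch.ord }
  bytes.pack("C*").force_encoding("UTF-8")
end

name = escape_url("http://example.com")  # => "http_3a_2f_2fexample_2ecom"
unescape_filename(name)                   # => "http://example.com"
```

Because a literal "_" is itself escaped (as "_5f"), every underscore in the filename marks the start of an escape sequence, which is what lets the decoder work without guessing.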

EDIT:

If the length of the resulting file name is a concern, then I would store the files in sub-directories with the same names as the URL path segments.
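
A rough sketch of that layout, assuming the host becomes the top-level directory and a bare host maps to a placeholder file called "index" (both choices, and the name url_to_relative_path, are assumptions made for illustration). It ignores the scheme, query string and fragment, so on its own it is not reversible, and each segment would still need escaping if it can contain unsafe characters or "..".

```ruby
require "uri"

# Hypothetical helper: turn a URL into a relative path whose directories
# mirror the URL's path segments.
def url_to_relative_path(url)
  uri = URI.parse(url)
  segments = uri.path.split("/").reject(&:empty?)
  segments = ["index"] if segments.empty?  # assumed placeholder for a bare host
  File.join(uri.host, *segments)
end

url_to_relative_path("http://stackoverflow.com/questions/18234870/how-to")
# => "stackoverflow.com/questions/18234870/how-to"
```

The parent directories would have to exist before writing the file, for example via FileUtils.mkdir_p(File.dirname(path)).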

Upvotes: 2

Kashyap

Reputation: 4796

One way is to "hash" the URL and use the result as the filename. For example, the URL for this question is https://stackoverflow.com/questions/18234870/how-to-reversibly-escape-a-url-in-ruby-so-that-it-can-be-saved-to-the-file-syste. You could use digest/md5 from Ruby's standard library to hash it. Simple and elegant.

require "digest/md5"

foldername = "https://stackoverflow.com/questions/18234870/how-to-reversibly-escape-a-url-in-ruby-so-that-it-can-be-saved-to-the-file-syste"
hashed_name = Digest::MD5.hexdigest(foldername) # => "5045cccd83a8d4d5c4fc01f7b4d8c502"

This scheme relies on the same property that makes MD5 useful for checking the completeness of downloads: the MD5 digest of a given string is always the same hex string.

However, I wouldn't call this "reversible". You need a custom way to look up the URL for each hash that gets generated, perhaps a .yml file holding that mapping.
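
One possible shape for such a lookup file, as a minimal sketch: the method names remember_url and url_for and the index file layout (a flat hash of digest to URL) are assumptions for illustration, not something prescribed above.

```ruby
require "digest/md5"
require "yaml"

# Hypothetical helper: record the URL under its MD5 hex digest in a YAML
# index file and return the digest, which serves as the filename.
def remember_url(url, index_path)
  index = (File.exist?(index_path) && YAML.load_file(index_path)) || {}
  key = Digest::MD5.hexdigest(url)
  index[key] = url
  File.write(index_path, index.to_yaml)
  key
end

# Hypothetical helper: look the original URL back up from the digest.
def url_for(hashed_name, index_path)
  YAML.load_file(index_path)[hashed_name]
end
```

Rewriting the whole file on every insert is fine for a handful of URLs but gets slow as the index grows.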


update: As @the Tin Man suggests, a simple SQLite db would be much better than a .yml file when there are a large number of files that need storing.

Upvotes: 3
