satoru

Reputation: 33235

Algorithm that can encode a string (with known maximum length) to a fixed-length string?

I'm downloading a lot of files whose URLs are listed in a text file.

When saving a file to disk, I use the MD5 checksum of its URL as the new filename. This is to avoid file name conflicts and invalid characters in the original file name.

But I also need a way to recover the original URL from a downloaded file's name. If I use MD5, I'll have to keep a very large mapping table.

Is there an algorithm I can use instead that allows me to decode the original URL directly from the file name?

Note that I also don't want the length of the file names to vary too much.

Upvotes: 1

Views: 79

Answers (2)

Sorin

Reputation: 11968

If you want a generic solution, look for short-string compression algorithms. There's a previously answered question about it: "An efficient compression algorithm for short text strings". There's no way to guarantee that you get equal-length strings, because some of them will compress better than others.
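To illustrate the compression idea, here is a minimal sketch using Python's standard `zlib` and URL-safe base64. It is not one of the algorithms from the linked question, just a stand-in, and the function names `compress_name` and `decompress_name` are made up for this example.

```python
import base64
import zlib


def compress_name(url: str) -> str:
    """Compress a URL and encode it with the URL-safe base64 alphabet."""
    packed = zlib.compress(url.encode("utf-8"), 9)
    # Drop the "=" padding; it can be restored from the length when decoding.
    return base64.urlsafe_b64encode(packed).decode("ascii").rstrip("=")


def decompress_name(name: str) -> str:
    """Reverse compress_name: restore padding, decode, decompress."""
    padded = name + "=" * (-len(name) % 4)
    return zlib.decompress(base64.urlsafe_b64decode(padded)).decode("utf-8")


for url in ("http://example.com/a", "http://example.com/" + "a" * 200):
    name = compress_name(url)
    # The two names come out with different lengths.
    print(len(url), len(name), decompress_name(name) == url)
```

The output length still depends on the input, as said above, and for very short URLs a general-purpose compressor like zlib will often produce something longer than the original because of its fixed overhead.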

Since you are dealing only with HTML, you can use the files themselves to store some data. For example, you can simply put the original URL in front of the leading HTML tag or after the closing HTML tag. Or add a special tag or attribute to the file to store this information. Then you can keep MD5 as the file name, and when you need the URL you open the file and look for it there. This should allow you to store the data without affecting any use of the file and without having to store a large mapping table.
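A rough sketch of the "after the closing HTML tag" variant, assuming the pages are saved as UTF-8 text. The comment format `<!-- source-url: ... -->` and the helper names `save_page` and `read_url` are my own invention for illustration, not any standard.

```python
import hashlib
from pathlib import Path
from typing import Optional

# Hypothetical marker format, chosen for this example only.
PREFIX = "<!-- source-url: "
SUFFIX = " -->"


def save_page(directory: Path, url: str, html: str) -> Path:
    """Save html under the MD5 of its URL, with the URL as a trailing comment."""
    path = directory / hashlib.md5(url.encode("utf-8")).hexdigest()
    path.write_text(f"{html}\n{PREFIX}{url}{SUFFIX}", encoding="utf-8")
    return path


def read_url(path: Path) -> Optional[str]:
    """Return the URL stored on the last line, or None if the marker is absent."""
    last_line = path.read_text(encoding="utf-8").rsplit("\n", 1)[-1]
    if last_line.startswith(PREFIX) and last_line.endswith(SUFFIX):
        return last_line[len(PREFIX):-len(SUFFIX)]
    return None
```

One caveat: a URL containing `--` would make the comment technically invalid HTML, so you might prefer to escape the URL first or store it in a `<meta>` tag instead.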

Upvotes: 0

xdevs23

Reputation: 4014

You can use base62, which is file-system friendly and can be encoded and decoded (it is an encoding, not encryption). But you can't avoid file name collisions. If you want to avoid them too, you could append an MD5 of the file to the encoded filename, and remove the MD5 when decoding.
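Python has no built-in base62, so here is a minimal hand-rolled sketch. The alphabet order and the names `base62_encode` and `base62_decode` are assumptions for this example; other base62 implementations may order the symbols differently.

```python
import string

# 62 symbols: 0-9, A-Z, a-z (the order is an arbitrary choice for this sketch).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_encode(url: str) -> str:
    """Encode a URL as a base62 string."""
    # Prefix a 0x01 byte so leading zero bytes survive the integer conversion.
    n = int.from_bytes(b"\x01" + url.encode("utf-8"), "big")
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def base62_decode(name: str) -> str:
    """Decode a string produced by base62_encode back to the URL."""
    n = 0
    for ch in name:
        n = n * 62 + ALPHABET.index(ch)
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:].decode("utf-8")  # strip the 0x01 guard byte


name = base62_encode("http://example.com/page?id=1")
print(name, base62_decode(name))
```

Keep in mind that the encoded name is longer than the URL and grows with it, and many file systems cap a file name at around 255 bytes. Base62 is also case-sensitive, so on a case-insensitive file system two names that differ only in letter case could end up clashing.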

Upvotes: 0
