Travis Griggs

Reputation: 22252

Reversible encoding of string to unix filename

Is there a way to turn arbitrary user input names into safe filenames with an encoding that is reversible?

I have some data files that belong to entities that users named. Of course, they can do silly things like put invalid filesystem characters in their names.

The two suggestions I see frequently for this are:

A) Base64 encode them

B) Strip illegal characters

Base64 is reversible, but for debugging and introspection it's really nice when the file names look as much like the original names as possible. Approach B isn't reversible, so the "actual" name has to be stored redundantly anyway, at which point I might as well just use a UUID or something.
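
To illustrate the readability concern with approach A, here is a minimal sketch. The helper names `b64_name` and `b64_name_decode` and the sample name are made up for illustration. It uses the URL-safe Base64 alphabet, since standard Base64 can emit `/`, which is not allowed in a filename.

```python
import base64

def b64_name(name: str) -> str:
    # Hypothetical helper: the urlsafe variant swaps '+' and '/' for '-' and '_',
    # so the result never contains a path separator.
    return base64.urlsafe_b64encode(name.encode("utf-8")).decode("ascii")

def b64_name_decode(encoded: str) -> str:
    # Hypothetical helper: the inverse of b64_name().
    return base64.urlsafe_b64decode(encoded.encode("ascii")).decode("utf-8")

original = "Q3 report: draft/final"
encoded = b64_name(original)
print(encoded)                   # reversible, but nothing like the original name
print(b64_name_decode(encoded))  # the original name again
```

The round trip works, but the encoded name tells you nothing at a glance when listing a directory.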

This is specifically for Linux. While this isn't Python-specific, that's what I'm implementing it in.

Upvotes: 2

Views: 1614

Answers (2)

kdt

Reputation: 148

You could URL-encode the string provided by the user.

According to the Wikipedia article on percent-encoding (which itself quotes RFC 3986), the only characters that never need escaping in a URL are A-Z, a-z, 0-9, hyphen, underscore, dot, and tilde (~). Tilde has a special meaning in the shell, but it's not illegal in Linux filenames.

It looks like URL-encoding is pretty easy in Python with urllib(2), but I'm not a Python programmer.

See: URL encoding/decoding with Python
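
For reference, a minimal Python 3 sketch of the idea, assuming `urllib.parse.quote` and `unquote` from the standard library. The sample name is made up for illustration. Passing `safe=""` matters because `/` is left unescaped by default.

```python
import string
from urllib.parse import quote, unquote

# Letters, digits, hyphen, underscore and dot pass through untouched.
always_plain = string.ascii_letters + string.digits + "-_."
assert quote(always_plain, safe="") == always_plain

# Everything else becomes %XX, including '/' because safe is empty.
name = "report: draft #2/final"
encoded = quote(name, safe="")
print(encoded)           # report%3A%20draft%20%232%2Ffinal
print(unquote(encoded))  # report: draft #2/final
```

Whether tilde is escaped depends on the Python version, so the sketch leaves it out of the check.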

Upvotes: 1

Martijn Pieters

Reputation: 1121594

You could use URL encoding:

from urllib.parse import quote

safefilename = quote(filename, safe='')

This is fully round-trippable, and keeps ASCII characters readable:

>>> from urllib.parse import quote, unquote
>>> quote('foo/../bar', safe='')
'foo%2F..%2Fbar'
>>> unquote(quote('foo/../bar', safe=''))
'foo/../bar'

Do set safe to the empty string; the default is '/' so slashes are not normally escaped.
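
One caveat: `quote()` never escapes dots, so a user-supplied name of `.` or `..` would come out unchanged and collide with the directory entries of the same name. Below is a minimal sketch of one way to handle that. The helper names `encode_filename` and `decode_filename` are hypothetical, and rejecting the empty name is an assumption about what the caller wants.

```python
from urllib.parse import quote, unquote

def encode_filename(name: str) -> str:
    # Hypothetical helper: percent-encode a user-supplied name for use as a filename.
    if name == "":
        # Assumption: an empty name cannot be a filename, so refuse it.
        raise ValueError("empty name cannot be used as a filename")
    encoded = quote(name, safe="")
    # '.' and '..' pass through quote() untouched; escape the dots explicitly.
    if encoded in (".", ".."):
        encoded = encoded.replace(".", "%2E")
    return encoded

def decode_filename(filename: str) -> str:
    # unquote() also decodes %2E, so no special case is needed here.
    return unquote(filename)

print(encode_filename(".."))          # %2E%2E
print(decode_filename("%2E%2E"))      # ..
print(encode_filename("foo/../bar"))  # foo%2F..%2Fbar
```

This stays reversible because a literal `%` in the input is itself escaped to `%25`, so no other name can encode to `%2E%2E`. Each escaped byte takes three characters, so long or heavily non-ASCII names can still exceed the filesystem's filename length limit (commonly 255 bytes on Linux).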

Upvotes: 4
