How to encode blob names that end with a period?

Question

The Azure documentation on naming blobs says: "Avoid blob names that end with a dot (.), a forward slash (/), or a sequence or combination of the two."

I cannot avoid such names because of legacy S3 compatibility, so I must encode them.

How should I encode such names?

I don't want to use Base64, since that would make debugging very hard when looking at names in Azure's blob console.

Go has url.QueryEscape (https://golang.org/pkg/net/url/#QueryEscape), but it has this limitation:

Go's implementation of url.QueryEscape (specifically, the private shouldEscape function) escapes all characters except the following: alphabetic characters, decimal digits, '-', '_', '.', '~'. A trailing dot is therefore left unescaped.
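To make the limitation concrete, here is a minimal sketch. The helper name escapedName is made up for illustration; it only wraps url.QueryEscape.

```go
package main

import (
	"fmt"
	"net/url"
)

// escapedName is a hypothetical helper that simply wraps url.QueryEscape.
func escapedName(name string) string {
	return url.QueryEscape(name)
}

func main() {
	// The trailing dot survives, so the blob name still ends with a period.
	fmt.Println(escapedName("single.")) // single.
	// The space and the slash are escaped, but the dot is not.
	fmt.Println(escapedName("a b/c.")) // a+b%2Fc.
}
```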

Imre Pühvel · Accepted Answer

I don't think there's any universal solution for this outside your application's scope. Within your application's scope, you can use ANY encoding, so it comes down to personal preference about how you want your data laid out. There is no "right" way to do this.

Regardless, I believe you should go for these properties:

The conversion MUST be bidirectional and free of conflicts within your expected file name space.
DO keep file names that don't end in dots unencoded.
For files that end in dots, DO encode just the conflicting dots, keeping the rest of the original name readable.

This keeps most files (the non-conflicting ones) short, with their original and hopefully meaningful names. And should you ever be able to rename or phase out the conflicting files, you can just remove the conversion logic without restructuring all the stored data and its URLs.

I'll suggest two approaches. Let's say you have these files:

/someParent/normal.txt
/someParent/extensionless
/someParent/single.
/someParent/double..

Use special subcontainers

You could remove the N dots from the end of the file name and translate them into a subcontainer name: "dot", "dotdot", etc.

The resulting URLs would look like:

/someParent/normal.txt
/someParent/extensionless
/someParent/dot/single
/someParent/dotdot/double

When reading, you can remove the "dot"*N folder level and append N dots back to the file name. Obviously, this assumes you never need such "dot" folders as real data.

This is preferred if stored files can arrive with any extension but you can make some assumptions about the folder structure.
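A rough Go sketch of the subcontainer approach could look like this. The function names encodeDotDir and decodeDotDir are made up for illustration, and the sketch assumes that no real folder is ever named "dot", "dotdot", and so on. Names whose last segment consists only of dots are returned unchanged and would need separate handling.

```go
package main

import (
	"fmt"
	"strings"
)

// encodeDotDir (hypothetical name) moves the N trailing dots of the last
// path segment into a parent folder named "dot" repeated N times.
func encodeDotDir(name string) string {
	i := strings.LastIndex(name, "/")
	dir, base := name[:i+1], name[i+1:]
	trimmed := strings.TrimRight(base, ".")
	n := len(base) - len(trimmed)
	if n == 0 || trimmed == "" {
		// No trailing dots, or a dots-only segment (not handled here).
		return name
	}
	return dir + strings.Repeat("dot", n) + "/" + trimmed
}

// decodeDotDir (hypothetical name) reverses encodeDotDir: if the parent
// folder is exactly "dot" repeated N times, it drops that folder level
// and appends N dots to the file name.
func decodeDotDir(name string) string {
	i := strings.LastIndex(name, "/")
	if i < 0 {
		return name
	}
	parentPath, base := name[:i], name[i+1:]
	j := strings.LastIndex(parentPath, "/")
	grand, parent := parentPath[:j+1], parentPath[j+1:]
	n := strings.Count(parent, "dot")
	if n == 0 || base == "" || parent != strings.Repeat("dot", n) {
		return name
	}
	return grand + base + strings.Repeat(".", n)
}

func main() {
	for _, name := range []string{
		"/someParent/normal.txt",
		"/someParent/single.",
		"/someParent/double..",
	} {
		enc := encodeDotDir(name)
		fmt.Println(name, "->", enc, "->", decodeDotDir(enc))
	}
}
```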

Use discardable artificial extension

Since the conflict is at the end, you could just append a never-used dummy extension to the affected files. For example "endswithdots", but you could choose something more suitable depending on which extensions you expect:

/someParent/normal.txt
/someParent/extensionless
/someParent/single.endswithdots
/someParent/double..endswithdots

On reading, if the file extension is "endswithdots", you remove the "endswithdots" part from the end of the file name.

This is preferred if your data could have any container structure but you can make some assumptions about incoming extensions.
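The artificial-extension approach can be sketched in Go roughly as follows. The names dotSuffix, encodeDotSuffix and decodeDotSuffix are made up for illustration, and the sketch assumes no real file ever arrives with the extension "endswithdots".

```go
package main

import (
	"fmt"
	"strings"
)

// dotSuffix is the dummy extension; this sketch assumes it never occurs
// as a real extension in incoming data.
const dotSuffix = "endswithdots"

// encodeDotSuffix (hypothetical name) appends the dummy extension to
// names that end with a dot and leaves all other names unchanged.
func encodeDotSuffix(name string) string {
	if strings.HasSuffix(name, ".") {
		return name + dotSuffix
	}
	return name
}

// decodeDotSuffix (hypothetical name) strips the dummy extension again,
// keeping the dots that preceded it.
func decodeDotSuffix(name string) string {
	if strings.HasSuffix(name, "."+dotSuffix) {
		return strings.TrimSuffix(name, dotSuffix)
	}
	return name
}

func main() {
	for _, name := range []string{
		"/someParent/normal.txt",
		"/someParent/single.",
		"/someParent/double..",
	} {
		enc := encodeDotSuffix(name)
		fmt.Println(name, "->", enc, "->", decodeDotSuffix(enc))
	}
}
```

Because only the dummy extension is appended, the original name stays fully readable in the blob console.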

I would advise against Base64 or other full-name encodings, as they would make file names notably longer and lose any meaningful details the names may contain.
