Golo Roden

Reputation: 150902

How to shorten a UUID?

In a project I have to deal with UUIDs in v4 format, such as:

931d4657-2e07-477f-be0c-5dd02906a516

Basically, everything is fine with them; they are just too long to type manually. Hence I am thinking about ways to shorten them, but without losing the ability to get back to the original UUID. So just taking the first n bytes into account is not an option ;-)

My first idea was to represent it as Unicode characters instead of hex digits, but this leads to non-printable (and non-typable) characters. So that's not an option either.

Then I thought about Base64 (Base58, …) encodings, but they do not really make things noticeably shorter (I don't have a specific target length, I just want a relevant reduction in the number of characters, and saving 2 characters is not what I mean ;-)).

Is there a clever trick to do this, while keeping the option to get back to the UUID? Does anyone have an idea?

Upvotes: 7

Views: 8800

Answers (2)

StephenS

Reputation: 2154

UUIDs are 128-bit numbers; the hex form is just a representation for human use, and not a particularly dense one at 3.55 bits per character. Lose the dashes and it goes to 4 bits per character.

Use base64 instead of base16, and you'll get 6 bits per character, for 22 characters total. That is about as good as you can do while keeping it human-readable. You can actually get close to 7 bits per character, which would cut 1-2 more characters off, but that adds substantially more complexity (i.e. risk) that can't be justified for so small a gain.
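To make the arithmetic concrete, here is a small sketch that counts how many characters a 128-bit value needs for a few alphabet sizes. The helper name `chars_needed` is made up for this illustration, and base85 is only my assumption of what a "close to 7 bits" printable alphabet could look like.

```ruby
# Smallest number of characters that can represent every 128-bit value
# in an alphabet of the given size. Uses exact integer arithmetic.
UUID_BITS = 128

def chars_needed(alphabet_size)
  n = 0
  n += 1 while alphabet_size**n < 2**UUID_BITS
  n
end

# Labels are for illustration only; base85 is an assumed example alphabet.
{ 16 => "hex, no dashes", 58 => "base58", 64 => "base64", 85 => "base85" }.each do |size, label|
  puts format("%-16s %2d characters", label, chars_needed(size))
end
```

Going from 64 to 85 symbols only takes the length from 22 to 20 characters, which matches the "1-2 characters" estimate above.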

Mathematically, that's as far as you can go and still be able to round-trip. If you still need something shorter, you have to give up round-tripping, which probably has other implications for your overall design.
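A minimal round-trip sketch, assuming the URL-safe base64 alphabet with the padding stripped. `shorten_uuid` and `expand_uuid` are hypothetical helper names, not part of any library.

```ruby
# Hypothetical helpers: UUID text <-> 22-character URL-safe base64.

def shorten_uuid(uuid)
  raw = [uuid.delete("-")].pack("H*")          # 32 hex digits -> 16 raw bytes
  [raw].pack("m0").tr("+/", "-_").delete("=")  # base64, URL-safe alphabet, no padding
end

def expand_uuid(short)
  # Restore the standard alphabet and the two padding characters, then decode.
  raw = (short.tr("-_", "+/") + "==").unpack1("m0")
  hex = raw.unpack1("H*")
  [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join("-")
end

uuid  = "931d4657-2e07-477f-be0c-5dd02906a516"
short = shorten_uuid(uuid)
puts short.length                # 22
puts expand_uuid(short) == uuid  # true
```

The URL-safe alphabet is a design choice here: it avoids `+` and `/`, which tend to be awkward in URLs and file names.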

Upvotes: 2

tisba

Reputation: 151

UUIDs are 128 bits (16 bytes) long. Some of those bits could be removed if you drop the version and variant, but I don't think this is an option in your case (also, you can only safely remove 6 bits, see here).
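To illustrate where those 6 bits sit, here is a small sketch that reads the version (4 bits) and variant (2 bits) out of the text form. The method names are made up for this example. Dropping them leaves 122 bits, which in base64 still needs 21 characters, so the saving is a single character.

```ruby
# Hypothetical helpers that read the fixed bits of a UUID in canonical text form.

def uuid_version(uuid)
  uuid.delete("-")[12].to_i(16)        # first hex digit of the third group: 4 bits
end

def uuid_variant_bits(uuid)
  uuid.delete("-")[16].to_i(16) >> 2   # top 2 bits of the fourth group's first digit
end

uuid = "931d4657-2e07-477f-be0c-5dd02906a516"
puts uuid_version(uuid)                # 4
puts uuid_variant_bits(uuid).to_s(2)   # 10
```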

Using base64 encoding will shave off about 40% (36 characters down to 22):

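# example uses Ruby
require 'securerandom'

# urlsafe_base64 uses the "-" and "_" alphabet and omits padding by default
SecureRandom.urlsafe_base64(16) # => UBm-_zkz20ka6dOAA8dkMg
SecureRandom.uuid               # => 3754e815-87fe-4872-8d9b-ae529607c277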

In your comment you wrote that this is an identifier for your users. So maybe you can work with a shortened version in your UIs, much like git handles short SHAs. It depends on the number of entities that you want to handle, but you should be able to shorten the "handle" a lot and still have a very low likelihood of a collision. In case of a collision you can then ask your user to provide more of the identifier.
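A rough sketch of that lookup, assuming the full IDs are available in memory. `resolve_prefix` and its return shape are made up for illustration; a real system would probably run a prefix query against its database instead.

```ruby
# Hypothetical lookup: resolve a user-typed prefix against the known IDs.
# Returns [:ok, id], [:ambiguous, candidates] or [:not_found, nil].
def resolve_prefix(ids, prefix)
  matches = ids.select { |id| id.start_with?(prefix.downcase) }
  case matches.size
  when 0 then [:not_found, nil]
  when 1 then [:ok, matches.first]
  else        [:ambiguous, matches]
  end
end

ids = [
  "931d4657-2e07-477f-be0c-5dd02906a516",
  "3754e815-87fe-4872-8d9b-ae529607c277",
  "931dfa10-0000-4000-8000-000000000000" # made-up ID to force a collision
]

p resolve_prefix(ids, "37")     # unique match
p resolve_prefix(ids, "931d")   # ambiguous: ask the user for more characters
p resolve_prefix(ids, "931d4")  # unique again
```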

Upvotes: 2
