cyberixae

Reputation: 973

Abbreviating a UUID

What would be a good way to abbreviate a UUID for use in a button in a user interface when the id is all we know about the target?

GitHub seems to abbreviate commit ids by taking 7 characters from the beginning. For example, b1310ce6bc3cc932ce5cdbe552712b5a3bdcb9e5 would appear in a button as b1310ce. While not perfect, this shorter version is sufficient to look unique in the context where it is displayed. I'm looking for a similar solution that would work for UUIDs. I'm wondering if some part of the UUID is more random than another.

The most straightforward option would be splitting at the first dash and using the first part. The UUID 42e9992a-8324-471d-b7f3-109f6c7df99d would then be abbreviated as 42e9992a. All of the solutions I can come up with seem equally arbitrary. Perhaps there is some outside-the-box user interface design solution that I didn't think of.

Upvotes: 10

Views: 1627

Answers (4)

ptman

Reputation: 917

Showing only the first x chars isn't a good idea for UUIDv7 since it begins with a timestamp.

Structure

UUIDv7 looks like this when represented as a string:

0190163d-8694-739b-aea5-966c26f8ad91
└─timestamp─┘ │└─┤ │└───rand_b─────┘
             ver │var
              rand_a

The 128-bit value consists of several parts:

    timestamp (48 bits) is a Unix timestamp in milliseconds.
    ver (4 bits) is a UUID version (7).
    rand_a (12 bits) is randomly generated.
    var (2 bits) is equal to binary 10.
    rand_b (62 bits) is randomly generated.

https://antonz.org/uuidv7/
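
As a rough illustration of the layout above, here is a minimal sketch that slices a UUIDv7 string into its fields by bit position. It assumes Python's standard uuid module, which parses any 128-bit UUID string regardless of version; the helper name uuid7_parts is hypothetical. Since the first 8 hex characters are the top 32 bits of the millisecond timestamp, IDs created within the same window of about 65 seconds (2^16 ms) share that prefix. Taking characters from the end instead is one possible workaround, not something the specification prescribes.

```python
import uuid


def uuid7_parts(value: str) -> dict:
    """Split a UUIDv7 string into the fields listed above (hypothetical helper)."""
    n = uuid.UUID(value).int          # the whole UUID as a 128-bit integer
    return {
        "timestamp_ms": n >> 80,              # top 48 bits
        "ver": (n >> 76) & 0xF,               # next 4 bits
        "rand_a": (n >> 64) & 0xFFF,          # next 12 bits
        "var": (n >> 62) & 0b11,              # next 2 bits
        "rand_b": n & ((1 << 62) - 1),        # low 62 bits
    }


example = "0190163d-8694-739b-aea5-966c26f8ad91"
parts = uuid7_parts(example)
print(parts["ver"])       # 7
print(example[:8])        # 0190163d, taken entirely from the timestamp
print(example[-8:])       # 26f8ad91, taken entirely from rand_b
```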

Upvotes: 1

StephenS

Reputation: 2154

The entropy of a UUID is highest in the first bits for UUID V1 and V2, and evenly distributed for V3, V4 and V5. So the first N characters are no worse than any other N-character subset.

For N=8, i.e. the group before the first dash, the odds of a collision within a list you could reasonably display on a single GUI screen are vanishingly small.
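
To put a rough number on that, the sketch below computes the birthday-problem collision probability. It assumes every 8-character prefix is uniformly random over 16^8 values, which is reasonable for V4 but not for versions whose leading bits are structured. The function name and the list size of 100 are illustrative assumptions.

```python
import math


def collision_probability(items: int, hex_chars: int = 8) -> float:
    """Chance that at least two of `items` uniformly random prefixes coincide."""
    space = 16 ** hex_chars
    # P(no collision) = product over k of (1 - k / space); sum logs for accuracy.
    log_no_collision = sum(math.log1p(-k / space) for k in range(items))
    return -math.expm1(log_no_collision)   # 1 - exp(log_no_collision)


# For a list of 100 IDs this is on the order of one in a million.
print(collision_probability(100))
```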

Upvotes: 6

cyberixae

Reputation: 973

After thinking about this for a while, I realised that the short git commit hash is used as part of command-line commands. Since this requirement does not exist for UUIDs in graphical user interfaces, I simply decided to use an ellipsis for the abbreviation, like so: 42e9992...
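
A minimal sketch of that abbreviation, written in Python purely for illustration (the function name and its parameters are hypothetical, and the real UI code may be in any language):

```python
def abbreviate(identifier: str, keep: int = 7, marker: str = "...") -> str:
    """Keep the first `keep` characters and append an ellipsis marker."""
    if len(identifier) <= keep:
        return identifier              # already short enough, nothing to cut
    return identifier[:keep] + marker


print(abbreviate("42e9992a-8324-471d-b7f3-109f6c7df99d"))   # 42e9992...
```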

Upvotes: 0

Konrad

Reputation: 18595

The question is whether you want to show part of the UUID or only ensure that unique strings are presented as shorter unique strings. If you want to focus on the latter, which appears to be the goal you are suggesting in your opening paragraph:

(...) While not perfect this shorter version is sufficient to look unique in the context where it is displayed. (...)

you can make use of hashing.

Hashing:

Hashing is the transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string. Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value.

Hashing is very common and easy to use across many popular languages. A simple approach in Python:

import hashlib
import uuid
# Hash the 16 raw bytes of the UUID rather than its text form
encoded_str = uuid.UUID('42e9992a-8324-471d-b7f3-109f6c7df99d').bytes
hash_uuid = hashlib.sha1(encoded_str).hexdigest()
# Keep only the first 10 hex digits of the digest as the short label
hash_uuid[:10]
'b6e2a1c885'

As expected, a small change in the input results in a completely different hash, so distinct UUIDs still appear distinct.

# Second digit is replaced with 3, rest of the string remains untouched 
encoded_str_two = uuid.UUID('43e9992a-8324-471d-b7f3-109f6c7df99d').bytes
hash_uuid_two = hashlib.sha1(encoded_str_two).hexdigest()
hash_uuid_two[:10]
'406ec3f5ae'
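
Building on the snippets above, here is a hedged sketch of how the same idea could be applied to a whole list shown on one screen: it picks the shortest SHA-1 prefix that keeps every label distinct within that list. The function name short_labels and the minimum length of 7 are assumptions made for illustration, not part of any library.

```python
import hashlib
import uuid


def short_labels(ids: list[str], min_len: int = 7) -> dict[str, str]:
    """Map each UUID string to a SHA-1 prefix that is unique within `ids`."""
    digests = {i: hashlib.sha1(uuid.UUID(i).bytes).hexdigest() for i in ids}
    distinct = set(digests.values())
    # Grow the prefix until no two different digests share it.
    for n in range(min_len, 40):
        if len({d[:n] for d in distinct}) == len(distinct):
            return {i: d[:n] for i, d in digests.items()}
    return digests                     # fall back to the full 40-character digests


labels = short_labels([
    "42e9992a-8324-471d-b7f3-109f6c7df99d",
    "43e9992a-8324-471d-b7f3-109f6c7df99d",
])
for original, label in labels.items():
    print(original, "->", label)
```

Because the labels come from a hash, they stay well mixed even when the UUIDs themselves share a long common prefix, at the cost of no longer being a visible part of the original id.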

Upvotes: 2
