rb8680

Reputation: 279

Java UUID Compress & Decompress back

I would like to do the following...

a) Compress a generated UUID to a String of length 8.

b) Decompress the compressed UUID back to the original UUID.

The reason is that I have to send the UUID to a partner system that only accepts 8 characters for the UUID, and no, I cannot request a change to the partner system.

So what is left is to compress the UUID I have into an 8-character string, and then decompress it back to the original UUID when a message comes back from the partner system.

Any ideas?

Thanks.

Upvotes: 5

Views: 12326

Answers (3)

xjodoin

Reputation: 549

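A compact way to represent a UUID as text is to Base64-encode its 16 bytes. Note that this yields 24 characters (22 without padding), not 8. The standard alphabet used below also contains '+' and '/'; for URL-safe output, use Base64.getUrlEncoder() and Base64.getUrlDecoder() instead.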

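import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

public class UUIDUtils {

  // Packs the 128 bits into 16 bytes and Base64-encodes them (24 characters with padding).
  public static String compress(UUID uuid) {
    ByteBuffer bb = ByteBuffer.allocate(Long.BYTES * 2);
    bb.putLong(uuid.getMostSignificantBits());
    bb.putLong(uuid.getLeastSignificantBits());
    byte[] array = bb.array();
    return Base64.getEncoder().encodeToString(array);
  }

  // Reverses compress(): decodes the 16 bytes and reads them back as two longs.
  public static UUID decompress(String compressUUID) {
    ByteBuffer byteBuffer = ByteBuffer.wrap(Base64.getDecoder().decode(compressUUID));
    return new UUID(byteBuffer.getLong(), byteBuffer.getLong());
  }

}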

Result: 6227185c-b25b-4497-b821-ba4f8d1fb9a1 -> YicYXLJbRJe4IbpPjR+5oQ==
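If the output has to be URL-safe, a variant along these lines should work. It is only a sketch: the class name UrlSafeUuidCodec and its method names are made up for illustration, and it still produces 22 characters, so it does not meet the 8-character limit from the question.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Hypothetical helper, not part of the answer above.
final class UrlSafeUuidCodec {

  // Encodes the 16 UUID bytes with the URL-safe alphabet ('-' and '_') and no '=' padding.
  static String encode(UUID uuid) {
    ByteBuffer bb = ByteBuffer.allocate(Long.BYTES * 2);
    bb.putLong(uuid.getMostSignificantBits());
    bb.putLong(uuid.getLeastSignificantBits());
    return Base64.getUrlEncoder().withoutPadding().encodeToString(bb.array());
  }

  // The URL-safe decoder accepts input with or without padding.
  static UUID decode(String text) {
    ByteBuffer bb = ByteBuffer.wrap(Base64.getUrlDecoder().decode(text));
    return new UUID(bb.getLong(), bb.getLong());
  }
}
```

For the UUID shown in the result above, encode returns YicYXLJbRJe4IbpPjR-5oQ, which is the same text with '+' replaced by '-' and the padding dropped.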

Upvotes: 6

obataku

Reputation: 29646

You can pack the UUID into a String of exactly 8 chars, since each Java char is a 16-bit value and 8 x 16 = 128 bits, as follows.

static String encodeUuid(final UUID id) {
  final long hi = id.getMostSignificantBits();
  final long lo = id.getLeastSignificantBits();
  return new String(new char[] {
    (char) ((hi >>> 48) & 0xffff), (char) ((hi >>> 32) & 0xffff),
    (char) ((hi >>> 16) & 0xffff), (char) ((hi       ) & 0xffff),
    (char) ((lo >>> 48) & 0xffff), (char) ((lo >>> 32) & 0xffff),
    (char) ((lo >>> 16) & 0xffff), (char) ((lo       ) & 0xffff)
  });
}

static UUID decodeUuid(final String enc) {
  final char[] cs = enc.toCharArray();
  return new UUID(
    (long) cs[0] << 48 | (long) cs[1] << 32 | (long) cs[2] << 16 | (long) cs[3],
    (long) cs[4] << 48 | (long) cs[5] << 32 | (long) cs[6] << 16 | (long) cs[7]
  );
}

This code appears to work (try it yourself here), and the resulting string survives a round trip through both UTF-8 and UTF-16 most of the time:

static boolean validate(final UUID id, final Charset cs) {
  final ByteBuffer buf = cs.encode(encodeUuid(id));
  final UUID _id = decodeUuid(cs.decode(buf).toString());
  return id.equals(_id);
}

public static void main(final String[] argv) {
  final UUID id = UUID.randomUUID();
  assert validate(id, StandardCharsets.UTF_8)  : "failed using utf-8";
  assert validate(id, StandardCharsets.UTF_16) : "failed using utf-16";
}

C:\dev\scrap>javac UuidTest.java

C:\dev\scrap>java -ea UuidTest

However, there is a problem: the UTF-16 code units in the range 0xD800 to 0xDFFF are reserved as surrogates. When one of the 16-bit chunks falls in that range without forming a valid surrogate pair, the string is not well-formed UTF-16, the charset round trip replaces the offending char, and you will be unable to reconstruct the original UUID. Refer to Mechanical snail's answer for more information on that.
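The failure can be reproduced with a hand-picked UUID. The sketch below repeats the same 8-char packing as encodeUuid and decodeUuid (written as loops) inside a hypothetical SurrogateDemo class, and uses String.getBytes, which substitutes '?' for an unpaired surrogate when encoding to UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical demo class; the packing mirrors encodeUuid/decodeUuid.
final class SurrogateDemo {

  // Splits the 128 bits into eight 16-bit chunks, one per char.
  static String encodeUuid(UUID id) {
    long hi = id.getMostSignificantBits();
    long lo = id.getLeastSignificantBits();
    char[] cs = new char[8];
    for (int i = 0; i < 4; i++) {
      cs[i] = (char) (hi >>> (48 - 16 * i));      // the cast keeps the low 16 bits
      cs[i + 4] = (char) (lo >>> (48 - 16 * i));
    }
    return new String(cs);
  }

  // Reassembles the two longs from the eight chars.
  static UUID decodeUuid(String enc) {
    long hi = 0, lo = 0;
    for (int i = 0; i < 4; i++) {
      hi = (hi << 16) | enc.charAt(i);
      lo = (lo << 16) | enc.charAt(i + 4);
    }
    return new UUID(hi, lo);
  }

  // True if the packed string comes back unchanged after a UTF-8 round trip.
  static boolean survivesUtf8(UUID id) {
    byte[] wire = encodeUuid(id).getBytes(StandardCharsets.UTF_8);
    String back = new String(wire, StandardCharsets.UTF_8);
    return back.length() == 8 && id.equals(decodeUuid(back));
  }
}
```

With new UUID(0xD800000000000000L, 0L) the first char is an unpaired high surrogate, so survivesUtf8 returns false, whereas a UUID whose chunks are all ordinary characters, such as new UUID(0x0041004200430044L, 0x0045004600470048L), survives.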


The only data you can always remove from a UUID generated via UUID.randomUUID are the 2 bits used for the variant (always 2) and the 4 bits used for the version (always 4), which leaves 122 bits. The java.util.UUID documentation describes the layout:

There exist different variants of these global identifiers. The methods of this class are for manipulating the Leach-Salz variant, although the constructors allow the creation of any variant of UUID (described below).

The layout of a variant 2 (Leach-Salz) UUID is as follows: The most significant long consists of the following unsigned fields:

0xFFFFFFFF00000000 time_low
0x00000000FFFF0000 time_mid
0x000000000000F000 version
0x0000000000000FFF time_hi

The least significant long consists of the following unsigned fields:

0xC000000000000000 variant
0x3FFF000000000000 clock_seq
0x0000FFFFFFFFFFFF node

The variant field contains a value which identifies the layout of the UUID. The bit layout described above is valid only for a UUID with a variant value of 2, which indicates the Leach-Salz variant.

The version field holds a value that describes the type of this UUID. There are four different basic types of UUIDs: time-based, DCE security, name-based, and randomly generated UUIDs. These types have a version value of 1, 2, 3 and 4, respectively.
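Using the masks listed above, dropping the six fixed bits of a random (version 4, variant 2) UUID could look roughly like this. The class FixedBitsCodec and its methods are illustrative names only, and the sketch assumes the input really is a version 4, variant 2 UUID; anything else will not round-trip. The result is 128 - 6 = 122 bits, which is still far more than 8 characters can hold.

```java
import java.util.UUID;

// Hypothetical helper: strips and restores the version and variant bits.
final class FixedBitsCodec {

  // Returns {60 bits from the high long, 62 bits from the low long}.
  static long[] strip(UUID id) {
    long hi = id.getMostSignificantBits();
    long lo = id.getLeastSignificantBits();
    // Remove the version nibble (mask 0xF000): keep the upper 48 and lower 12 bits.
    long hi60 = ((hi >>> 16) << 12) | (hi & 0x0FFFL);
    // Remove the two variant bits (mask 0xC000000000000000).
    long lo62 = lo & 0x3FFFFFFFFFFFFFFFL;
    return new long[] {hi60, lo62};
  }

  // Re-inserts version 4 and variant 2 at their fixed positions.
  static UUID restore(long hi60, long lo62) {
    long hi = ((hi60 >>> 12) << 16) | 0x4000L | (hi60 & 0x0FFFL);
    long lo = lo62 | 0x8000000000000000L;
    return new UUID(hi, lo);
  }
}
```

For any id returned by UUID.randomUUID(), calling restore on the two values produced by strip should give back an equal UUID.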

Upvotes: 0

Mechanical snail

Reputation: 30637

What you ask is impossible for information-theoretic reasons.

UUIDs as specified by RFC 4122 are 128 bits, as are UUID objects in Java.

A Java String stores 16 bits per char, so an 8-char String holds at most 128 bits. However, not every sequence of 16-bit values is a valid UTF-16 string (unpaired surrogates are not), so 8 characters can carry fewer than 128 bits of information.

So if you compress a UUID to a valid 8-character string, you have lost information, and in general there is no way to decompress it to get the original UUID back.
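The counting argument can be checked numerically. A well-formed UTF-16 string either starts with one of the 65536 - 2048 = 63488 non-surrogate code units, or with one of the 1024 x 1024 high/low surrogate pairs, which gives a simple recurrence for the number of well-formed strings of n code units. The class Utf16Capacity below is a hypothetical name for this sketch.

```java
import java.math.BigInteger;

// Hypothetical helper that counts well-formed UTF-16 strings by length.
final class Utf16Capacity {

  // f(n) = 63488 * f(n-1) + 1048576 * f(n-2), with f(0) = 1 and f(1) = 63488.
  static BigInteger wellFormed(int n) {
    BigInteger single = BigInteger.valueOf(65536 - 2048);  // non-surrogate code units
    BigInteger pair = BigInteger.valueOf(1024L * 1024L);   // high + low surrogate pairs
    BigInteger prev = BigInteger.ONE;                      // f(0)
    if (n == 0) {
      return prev;
    }
    BigInteger cur = single;                               // f(1)
    for (int i = 2; i <= n; i++) {
      BigInteger next = single.multiply(cur).add(pair.multiply(prev));
      prev = cur;
      cur = next;
    }
    return cur;
  }

  // True if strings of n code units are numerous enough to label all 2^128 UUIDs.
  static boolean fitsEveryUuid(int n) {
    return wellFormed(n).compareTo(BigInteger.ONE.shiftLeft(128)) >= 0;
  }
}
```

fitsEveryUuid(8) is false, because the well-formed strings are a strict subset of all 65536^8 = 2^128 sequences, while fitsEveryUuid(9) is true.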

What you might have intended is to generate a shorter string to use as a unique identifier. If so, see Generating 8-character only UUIDs.
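One way to act on that suggestion is to issue a random 8-character ID and keep a lookup table back to the UUID on your side. The sketch below is only an illustration under several assumptions: ShortIdRegistry is a made-up class, the partner system is assumed to accept ASCII letters and digits, and the in-memory map stands in for whatever persistent storage a real system would need.

```java
import java.security.SecureRandom;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry mapping short IDs to the UUIDs they stand for.
final class ShortIdRegistry {
  private static final String ALPHABET =
      "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

  private final SecureRandom random = new SecureRandom();
  private final Map<String, UUID> byShortId = new ConcurrentHashMap<>();

  // Issues a fresh 8-character ID for the UUID and remembers the mapping.
  String register(UUID uuid) {
    while (true) {
      StringBuilder sb = new StringBuilder(8);
      for (int i = 0; i < 8; i++) {
        sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
      }
      String shortId = sb.toString();
      // putIfAbsent returns null only when the ID was not already taken.
      if (byShortId.putIfAbsent(shortId, uuid) == null) {
        return shortId;
      }
      // Otherwise the ID collided with an existing one; draw again.
    }
  }

  // Returns the original UUID, or null if the short ID is unknown.
  UUID resolve(String shortId) {
    return byShortId.get(shortId);
  }
}
```

The short ID carries no information about the UUID; the original is recovered only through the table, which is why this does not contradict the impossibility argument.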

Upvotes: 11
