DevDev

Reputation: 313

Service Fabric - How could we generate a partitionKey?

I have a stateful service with a range of partition keys going from
-9223372036854775808 to 9223372036854775807 (UniformInt64Partition).

How can I generate an adequate partition key when calling the service, in order to improve the distribution of workloads across all the partitions?

Thank you

Upvotes: 0

Views: 586

Answers (2)

Michael Meadows

Reputation: 28416

If you already use a GUID as a key to identify your data, this isn't hard to do. The thing to know is that GUIDs, while (practically) globally unique, are not even close to evenly distributed across a range. I use the SHA1 hashing algorithm to hash the GUID because, despite its shortcomings as a cryptographic algorithm, it does a good job of generating evenly distributed hashes without demanding too much of the server (in terms of compute and RAM).

As a side note, by going from GUID to long you lose data (a GUID is the equivalent of a 128-bit integer, a long is only 64 bits). Since the goal is to distribute data across partitions, this is okay... don't sweat the little things. You could, in fact, use a smaller range than Int64, but if you already have a GUID, then why bother?

See the code below for an extension method that creates a partition key from a GUID. My implementation collapses it to two lines, but I broke it out below so that I could annotate it.

using System;
using System.Security.Cryptography;
using Microsoft.ServiceFabric.Services.Client;

public static class GuidExtensions
{
    public static ServicePartitionKey ToPartitionKey(this Guid id)
    {
        // Hash algorithms need byte arrays, so we're converting the Guid here.
        byte[] guidBytes = id.ToByteArray();

        // SHA1 is lightweight and good at creating distribution across the range.
        // Do not use it for anything security-related!
        using (SHA1 hasher = SHA1.Create())
        {
            // Hash the Guid's bytes.
            byte[] hashedBytes = hasher.ComputeHash(guidBytes);

            // The hash is repeatable and evenly distributed; take its first 8 bytes as a long.
            long guidAsLong = BitConverter.ToInt64(hashedBytes, 0);

            // Return the partition key.
            return new ServicePartitionKey(guidAsLong);
        }
    }
}
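
If you want to sanity-check the spread yourself, here is a rough sketch that does not need the Service Fabric packages. It assumes the Int64 range is cut into contiguous, equally sized slices, one per partition. `PartitionSpreadCheck`, `ToKey`, `ToBucket` and `CountPerBucket` are names made up for this illustration, not part of any SDK, and the slice arithmetic is my own approximation rather than the runtime's actual code.

```csharp
using System;
using System.Security.Cryptography;

public static class PartitionSpreadCheck
{
    // Same hashing idea as the extension method above, but returning the raw long.
    public static long ToKey(Guid id)
    {
        using (SHA1 hasher = SHA1.Create())
        {
            return BitConverter.ToInt64(hasher.ComputeHash(id.ToByteArray()), 0);
        }
    }

    // Assumption: partitions are equal, contiguous slices of the full Int64 range.
    public static int ToBucket(long key, int partitionCount)
    {
        if (partitionCount <= 0) throw new ArgumentOutOfRangeException(nameof(partitionCount));
        if (partitionCount == 1) return 0;

        // Flip the sign bit so long.MinValue maps to 0 and long.MaxValue to ulong.MaxValue.
        ulong offset = unchecked((ulong)key) ^ 0x8000000000000000UL;

        // Slice width, rounded up so the top key still lands in the last bucket.
        ulong sliceSize = ulong.MaxValue / (ulong)partitionCount + 1;
        return (int)(offset / sliceSize);
    }

    // Hashes 'samples' fresh GUIDs and counts how many land in each slice.
    public static int[] CountPerBucket(int samples, int partitionCount)
    {
        var counts = new int[partitionCount];
        for (int i = 0; i < samples; i++)
        {
            counts[ToBucket(ToKey(Guid.NewGuid()), partitionCount)]++;
        }
        return counts;
    }
}
```

Calling `PartitionSpreadCheck.CountPerBucket(10000, 4)` should give four counts that are each reasonably close to 2500 if the hash is spreading keys the way this answer expects.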

Upvotes: 1

Diego Mendes

Reputation: 11351

For this large range of partition keys, the best approach is to apply a hashing algorithm to a field, or a collection of fields, to generate a key (a number) with as few collisions as possible.

Assume, as an example, that you are storing customer information. Hashing the customer name "John Smith" might produce the value 32. Any other customer named "John Smith" will produce the same hash, but as long as that is infrequent it isn't a problem: 32 is not an id, so it can repeat, and customers with the same hash are simply stored on the same partition.

If you really want to distribute these values as evenly as possible, you can concatenate another field, such as date of birth, to differentiate one "John Smith" from another. Unless both were born on the same date, each will almost certainly produce a different value.

In your case, because the range is very large, you have to use a hashing algorithm to hash these values to fit the range of -9223372036854775808 to 9223372036854775807.
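
The name plus date of birth idea could be sketched roughly like this. It is only an illustration: `CustomerKey` and `FromNameAndBirthDate` are hypothetical names, and SHA256 is just one reasonable choice of hash, not something the service requires.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public static class CustomerKey
{
    // Hypothetical helper: hashes name + date of birth into a key covering the Int64 range.
    public static long FromNameAndBirthDate(string name, DateTime dateOfBirth)
    {
        // Fixed date format and a separator so the composite string is stable and unambiguous.
        string composite = name + "|" +
            dateOfBirth.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);

        byte[] bytes = Encoding.UTF8.GetBytes(composite);

        using (SHA256 hasher = SHA256.Create())
        {
            byte[] hash = hasher.ComputeHash(bytes);

            // The first 8 bytes of the hash become the signed 64-bit key.
            return BitConverter.ToInt64(hash, 0);
        }
    }
}
```

Two customers named "John Smith" with different birth dates should get different keys here, while the same customer always gets the same key, so repeat calls go to the same partition.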

Do you need that many keys?

If your system does not expect a very high number of partitions, an easy way to manage this is to use a smaller range of natural numbers that closely matches the range of keys produced by your chosen hashing function. You can then choose a function with better performance, fewer collisions, or both.
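
As a minimal sketch of the smaller-range option, assume the service were declared with a low key of 0 and a high key of `keyCount - 1` (that configuration is my assumption, not something from the question). The example uses the 32-bit FNV-1a hash as a cheap, stable alternative to a cryptographic hash; `SmallRangeKey` and `FromString` are made-up names.

```csharp
using System;
using System.Text;

public static class SmallRangeKey
{
    // Hypothetical helper: maps a string to a key in 0..keyCount-1 using 32-bit FNV-1a.
    // string.GetHashCode() is avoided because it is not stable across processes on .NET Core.
    public static long FromString(string value, int keyCount)
    {
        if (keyCount <= 0) throw new ArgumentOutOfRangeException(nameof(keyCount));

        unchecked
        {
            uint hash = 2166136261;            // FNV-1a 32-bit offset basis
            foreach (byte b in Encoding.UTF8.GetBytes(value))
            {
                hash ^= b;
                hash *= 16777619;              // FNV 32-bit prime, wraps on overflow
            }

            // Fold the 32-bit hash into the small key range.
            return hash % (uint)keyCount;
        }
    }
}
```

The trade-off is the one described above: a narrower hash is cheaper to compute but collides more often, which only matters if the collisions pile up on a few keys.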

Upvotes: 2
