Reputation: 11
For example, consider a composite primary hash and range key table where the hash key represents a device ID, and where device ID "D17" is particularly heavily requested. To increase the read and write throughput for this "hot" hash key, concatenate it with a random number chosen from a fixed set (for example, 1 to 200), so you get D17.1, D17.2, through D17.200. Due to the randomization, writes for device ID "D17" are spread evenly across the multiple hash key values, yielding better parallelism and higher overall throughput.
This strategy greatly improves the write throughput, but reads for a specific item become harder, since you don't know which of the 200 keys contains the item. You can improve this strategy to get better read characteristics: instead of choosing a completely random number, choose a number that you can calculate from something intrinsic to the item. For example, if the item represents the person who has the device, calculate the hash key suffix from their name or user ID. This calculation should produce a number between 1 and 200 that is fairly evenly distributed for any set of names (or user IDs). A simple calculation generally suffices, such as the product of the ASCII values of the letters in the person's name, modulo 200, plus 1.

Now the writes are spread evenly across the hash keys (and thus partitions), and you can easily perform a get operation, because you can determine the hash key you need when you want to retrieve a specific "device owner" value. Query operations still need to run against all D17.x keys, and your application needs some client-side logic to merge the query results from each hash key (200 in this case). But the schema avoids having one "hot" hash key take all of the workload.
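As a minimal sketch of the suffix calculation the passage describes (the names `NUM_SHARDS`, `shard_suffix`, and `sharded_hash_key` are illustrative, not from the documentation):

```python
from functools import reduce

NUM_SHARDS = 200  # the fixed set 1..200 from the example


def shard_suffix(owner_name: str) -> int:
    """Product of the ASCII values of the name's characters, modulo 200, plus 1."""
    product = reduce(lambda acc, ch: acc * ord(ch), owner_name, 1)
    return product % NUM_SHARDS + 1


def sharded_hash_key(device_id: str, owner_name: str) -> str:
    """Build the sharded hash key, e.g. 'D17.42'."""
    return f"{device_id}.{shard_suffix(owner_name)}"


# Deterministic: the same owner always maps to the same shard,
# so a later read can recompute the exact hash key.
print(sharded_hash_key("D17", "Alice"))
```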
Can anyone please explain what they are saying in the above example?
Thanks in advance.
Al Amin
Upvotes: 0
Views: 159
Reputation: 71384
It is simply a strategy for optimizing read/write throughput for a particularly heavily used hash key. You are basically splitting up one hash key into (in this case) 200 different hash keys, in a way that allows you to both read and write the desired key based on the calculation of some sort of hash. Really, the hash is needed for reads, so that you can determine which key to request.
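A hypothetical sketch of the two read paths, assuming the `shard_suffix` helper above and a boto3-style DynamoDB `Table` object (the `DeviceId` attribute name is made up for illustration):

```python
def get_device_owner(table, device_id: str, owner_name: str):
    # Point read: recompute the suffix from the owner's name,
    # so only one hash key needs to be requested.
    key = f"{device_id}.{shard_suffix(owner_name)}"
    return table.get_item(Key={"DeviceId": key}).get("Item")


def query_all_shards(table, device_id: str):
    # Full query: fan out over all 200 sharded keys and
    # merge the results on the client side.
    items = []
    for n in range(1, NUM_SHARDS + 1):
        resp = table.query(
            KeyConditionExpression="DeviceId = :k",
            ExpressionAttributeValues={":k": f"{device_id}.{n}"},
        )
        items.extend(resp.get("Items", []))
    return items
```

So a point read stays a single request, while a query over the whole device still costs one request per shard.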
Upvotes: 1