Reputation: 16599
I have an aggregations table in DynamoDB with the following columns: id, sum, count, max, min, and hash. I will ALWAYS want to update sum and count, but I want to update min and max only when the incoming values are lower or greater, respectively, than the values already in the database. Also, I only want this operation to succeed when the stored hash is different from the one I am sending, to prevent reprocessing the same data.
I currently have these:
UpdateExpression: ADD sum :sum, count :count SET hash = :hash
ConditionExpression: attribute_not_exists(hash) OR hash <> :hash
The thing is that I need something like this for min and max:
SET min :min IF :min < min
and something similar for max. Of course, this doesn't currently work. I could not find a suitable update function that would perform this comparison in DynamoDB. What is the proper way to achieve this?
PS.: It was already suggested that I make multiple requests to DynamoDB and put the max/min checks in ConditionExpressions, but I want to avoid the multiple-request approach for data consistency reasons.
PS2.: Another way to express what I want, in a JavaScript-ish way, would be something like SET :min < min ? :min : min
Upvotes: 3
Views: 2703
Reputation: 9315
I had a similar situation where I needed to atomically update a min value, and ended up doing this:
Let each item have an attribute of type Set (NS) keeping the candidate values for the min value, and when you want to set a new value that might be the new min, just add it to the set. Then at read time, find the lowest number in the set on the client side.
This is atomic and requires no condition expression, but has the downside that the set grows over time, so I added a cleanup request that runs as needed, for example when the set has more than N values, or simply on every get. The cleanup might need a condition expression to be concurrency-safe, though, depending on whether you also remove values through other use cases. This does not solve all scenarios, but it worked for me. In my case the value was the timestamp of an event in the future, and I wanted to store when the next event occurs. I could then easily clean up by removing all values in the past.
Summary:
Set a new potentially minimum value: ADD #values :value.
Read the minimum value: GetItem followed by finding the lowest value in values client-side. If needed, this can be combined with a cleanup that finds all obsolete values and then calls UpdateItem with DELETE #values [x, y, z...]
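A minimal Python sketch of the summary above, assuming the low-level attribute-value format used by the boto3 DynamoDB client. The table name, the string key id, the integer candidate values, and the helper names (build_add_candidate, read_min, build_cleanup) are all assumptions for illustration; each returned dict is meant to be passed as keyword arguments to update_item.

```python
from typing import Any, Dict, Iterable, Optional


def build_add_candidate(table: str, item_id: str, value: int) -> Dict[str, Any]:
    """Parameters for an UpdateItem call that adds one candidate to the set."""
    return {
        "TableName": table,
        "Key": {"id": {"S": item_id}},  # assumed key schema: string attribute "id"
        "UpdateExpression": "ADD #values :value",
        "ExpressionAttributeNames": {"#values": "values"},
        # Number sets are sent as lists of strings in the low-level format.
        "ExpressionAttributeValues": {":value": {"NS": [str(value)]}},
    }


def read_min(item: Dict[str, Any]) -> Optional[int]:
    """Find the lowest candidate client-side in the Item returned by GetItem."""
    candidates = item.get("values", {}).get("NS", [])
    return min((int(v) for v in candidates), default=None)


def build_cleanup(table: str, item_id: str,
                  obsolete: Iterable[int]) -> Optional[Dict[str, Any]]:
    """Parameters for an UpdateItem call that removes obsolete candidates."""
    values = sorted(set(obsolete))
    if not values:
        return None  # nothing to remove, so no request is needed
    return {
        "TableName": table,
        "Key": {"id": {"S": item_id}},
        "UpdateExpression": "DELETE #values :obsolete",
        "ExpressionAttributeNames": {"#values": "values"},
        "ExpressionAttributeValues": {":obsolete": {"NS": [str(v) for v in values]}},
    }
```

The attribute name goes through the #values placeholder because VALUES is a DynamoDB reserved word. The sketch leaves out the optional condition expression on the cleanup request.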
Upvotes: 0
Reputation: 16599
I arrived at a solution to this problem by realizing that what I wanted was just not possible. There can be only one condition for the entire update, and since there is no such thing as SET min = minimum(:min, min), I had to accept my fate and make more than one UpdateItem request to DynamoDB.
The nice thing is that the order of execution of these updates doesn't matter. The hard part is making sure that each update is executed exactly once. Because we fire a lot of requests (with occasional peaks), there is a real chance of some updates failing due to ProvisionedThroughputExceededException or some other rate limiting from AWS.
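To illustrate why the order doesn't matter, here is a small in-memory Python model (not DynamoDB itself). The item is a plain dict and the three function names are made up for this sketch; each function mimics one of the updates, with the min/max ones acting as conditional writes that become no-ops when the condition fails.

```python
from itertools import permutations
from typing import Callable, Dict, Iterable, Tuple

Item = Dict[str, float]
Operation = Tuple[Callable[[Item, float], None], float]


def add_sum_count(item: Item, value: float) -> None:
    """Unconditional update: always accumulate sum and count."""
    item["sum"] = item.get("sum", 0) + value
    item["count"] = item.get("count", 0) + 1


def set_max_if_greater(item: Item, value: float) -> None:
    """Conditional update: write max only if missing or smaller."""
    if "max" not in item or item["max"] < value:
        item["max"] = value


def set_min_if_lower(item: Item, value: float) -> None:
    """Conditional update: write min only if missing or larger."""
    if "min" not in item or item["min"] > value:
        item["min"] = value


def aggregate(operations: Iterable[Operation]) -> Item:
    """Apply every operation exactly once, in the given order."""
    item: Item = {}
    for operation, value in operations:
        operation(item, value)
    return item


# Three updates per incoming value, for the values 5 and 2.
ops = []
for v in (5, 2):
    ops += [(add_sum_count, v), (set_max_if_greater, v), (set_min_if_lower, v)]

# Every ordering of the six updates ends in the same aggregate.
results = [aggregate(order) for order in permutations(ops)]
print(all(r == results[0] for r in results))
```

The model only holds when each update is applied exactly once: replaying add_sum_count would inflate sum and count, which is the hard part mentioned above.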
So here is my final solution:
For each record {id, sum, count, min, max}, issue three independent updateItem requests:
{UpdateExpression: 'ADD #SUM :sum, #COUNT :count'}
{ConditionExpression: '#MAX < :max OR attribute_not_exists(#MAX)', UpdateExpression: 'SET #MAX = :max'}
{ConditionExpression: '#MIN > :min OR attribute_not_exists(#MIN)', UpdateExpression: 'SET #MIN = :min'}
At the end of this process, I was able to do real-time aggregations for thousands of records :)
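As a rough Python sketch of the three expressions above, the parameters could be built like this for a boto3 DynamoDB client. The table name, the string key id, and the helper names build_updates and send_updates are assumptions, not part of the original solution; retry handling for throttled requests is left out.

```python
from typing import Any, Dict, List


def _num(value: Any) -> Dict[str, str]:
    """Encode a number in the low-level attribute-value format."""
    return {"N": str(value)}


def build_updates(table: str, record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build the three independent UpdateItem requests for one record."""
    key = {"id": {"S": record["id"]}}  # assumed key schema: string attribute "id"
    return [
        {   # always applied: accumulate sum and count
            "TableName": table,
            "Key": key,
            "UpdateExpression": "ADD #SUM :sum, #COUNT :count",
            "ExpressionAttributeNames": {"#SUM": "sum", "#COUNT": "count"},
            "ExpressionAttributeValues": {
                ":sum": _num(record["sum"]),
                ":count": _num(record["count"]),
            },
        },
        {   # applied only if the stored max is missing or smaller
            "TableName": table,
            "Key": key,
            "ConditionExpression": "#MAX < :max OR attribute_not_exists(#MAX)",
            "UpdateExpression": "SET #MAX = :max",
            "ExpressionAttributeNames": {"#MAX": "max"},
            "ExpressionAttributeValues": {":max": _num(record["max"])},
        },
        {   # applied only if the stored min is missing or larger
            "TableName": table,
            "Key": key,
            "ConditionExpression": "#MIN > :min OR attribute_not_exists(#MIN)",
            "UpdateExpression": "SET #MIN = :min",
            "ExpressionAttributeNames": {"#MIN": "min"},
            "ExpressionAttributeValues": {":min": _num(record["min"])},
        },
    ]


def send_updates(client: Any, requests: List[Dict[str, Any]]) -> int:
    """Send each request once; a failed condition just means no change was needed."""
    applied = 0
    for request in requests:
        try:
            client.update_item(**request)
            applied += 1
        except client.exceptions.ConditionalCheckFailedException:
            pass  # stored min/max is already at least as extreme
    return applied
```

With a real client this would be called as send_updates(boto3.client("dynamodb"), build_updates("aggregations", record)). A ConditionalCheckFailedException is treated as success because it only signals that the stored min or max did not need to change.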
PS.: Got rid of the hash column
Upvotes: 4
Reputation: 3594
It is not possible to do this in a single update, since UpdateExpression doesn't support functions like max() and min(). The supported operations and functions are listed in the DynamoDB documentation for update expressions.
The best way to achieve the same effect is to add a field called latest, or something similar, which stores the latest value. You will need to change your update expression to something like the following.
UpdateExpression: SET hash = :hash, latest = :latest, sum = sum + :latest, count = count + :num
Where :hash is of course your update hash to guard against replays, :latest is the latest value, and :num is 1 or whatever your increment is.
Then you can use DynamoDB Streams with a Lambda that looks at each update and checks whether latest is less than min or greater than max. If not, ignore the update; otherwise perform a second update to set min or max to the latest value accordingly.
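A hedged Python sketch of the decision such a Lambda could make for one stream record, assuming the stream is configured to include the new image. The function name plan_min_max_updates, the table name, and the extra ConditionExpression guarding against concurrent writers are assumptions added for illustration; the returned dicts are meant as keyword arguments for a boto3 client's update_item.

```python
from decimal import Decimal
from typing import Any, Dict, List


def plan_min_max_updates(record: Dict[str, Any], table: str) -> List[Dict[str, Any]]:
    """Return the follow-up UpdateItem requests (zero, one, or two) for one record."""
    if record.get("eventName") not in ("INSERT", "MODIFY"):
        return []  # deletions carry no new value to compare
    image = record["dynamodb"].get("NewImage", {})
    if "latest" not in image:
        return []
    latest = Decimal(image["latest"]["N"])
    key = record["dynamodb"]["Keys"]

    def request(name: str, placeholder: str, comparison: str) -> Dict[str, Any]:
        return {
            "TableName": table,
            "Key": key,
            # Re-check at write time in case another writer got there first.
            "ConditionExpression": (
                f"attribute_not_exists({placeholder}) "
                f"OR {placeholder} {comparison} :latest"
            ),
            "UpdateExpression": f"SET {placeholder} = :latest",
            "ExpressionAttributeNames": {placeholder: name},
            "ExpressionAttributeValues": {":latest": image["latest"]},
        }

    plans = []
    if "min" not in image or latest < Decimal(image["min"]["N"]):
        plans.append(request("min", "#MIN", ">"))
    if "max" not in image or latest > Decimal(image["max"]["N"]):
        plans.append(request("max", "#MAX", "<"))
    return plans
```

The follow-up update produces another stream record, but in that record latest no longer falls outside min or max, so the function returns an empty list and the loop ends.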
The main drawback to this approach is that there will be a small window where latest might be outside the range of min or max; however, this can be normalized easily in your application code when you read the records.
You should also consider the additional cost that will result from the DynamoDB Stream and the Lambda invocations.
Upvotes: 0