Renato Gama

Reputation: 16599

How to conditionally execute a SET operation in DynamoDB

I have an aggregations table in DynamoDB with the following attributes: id, sum, count, max, min, and hash. I ALWAYS want to update sum and count, but I want to update min and max only when the incoming value is lower than the stored min or greater than the stored max. Also, I only want this operation to succeed when the stored hash is different from the one I am sending, to prevent reprocessing the same data.

I currently have these:

UpdateExpression: ADD sum :sum, count :count SET hash = :hash
ConditionExpression: attribute_not_exists(hash) OR hash <> :hash

The thing is that I need something like this for min and max: SET min :min IF :min < min, and something similar for max. Of course, this doesn't currently work. I could not find a suitable update function that would perform this comparison in DynamoDB. What is the proper way to achieve this?

PS.: It was already suggested that I make multiple requests to DynamoDB and put the max/min checks in condition expressions, but I want to avoid the multiple-request approach for data consistency reasons.

PS2.: Another way to express what I want, in a JavaScript-ish way, would be something like SET :min < min ? :min : min

Upvotes: 3

Views: 2703

Answers (3)

JHH

Reputation: 9315

I had a similar situation where I needed to atomically update a min value, and ended up doing this:

Let each item have an attribute of type Number Set (NS) that keeps the candidate values for the min value, and when you want to set a new value that might be the new min, just add it to the set. Then, at read time, find the lowest number in the set on the client side.

This is atomic and requires no condition expression, but it has the downside that the set grows over time, so I added a cleanup request that runs as needed, for example when the set has more than N values, or simply on every get. The cleanup might need a condition expression to be safe under concurrency, depending on whether you also remove values through other use cases. This does not solve all scenarios, but it worked for me. In my case the value was the timestamp of a future event, and I wanted to store when the next event occurs. I could then easily clean up by removing all values in the past.

Summary: Set new potentially minimum value: ADD #values :value.

Read minimum value: GetItem, followed by finding the lowest value in values client-side. If needed, this could be combined with a cleanup that finds all obsolete values and then calls UpdateItem with DELETE #values [x, y, z...]
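
To make the summary concrete, here is a minimal Python sketch of the three pieces. It only builds the UpdateItem parameters and reads an item in the low-level attribute-value format, so nothing here talks to AWS. The table name `events` and the function names are hypothetical, and the set attribute is assumed to be called `values`, as in the expressions above.

```python
from decimal import Decimal

TABLE = "events"  # hypothetical table name


def add_candidate_params(key, value):
    """UpdateItem parameters that add one candidate to the number set."""
    return {
        "TableName": TABLE,
        "Key": key,
        "UpdateExpression": "ADD #values :value",
        "ExpressionAttributeNames": {"#values": "values"},
        "ExpressionAttributeValues": {":value": {"NS": [str(value)]}},
    }


def read_minimum(item):
    """Lowest candidate in an item as returned by GetItem, or None if absent."""
    candidates = item.get("values", {}).get("NS", [])
    return min((Decimal(v) for v in candidates), default=None)


def cleanup_params(key, obsolete):
    """UpdateItem parameters that remove obsolete candidates, or None."""
    unique = sorted(set(obsolete))
    if not unique:
        return None  # nothing to delete; an empty set is not a valid value
    return {
        "TableName": TABLE,
        "Key": key,
        "UpdateExpression": "DELETE #values :obsolete",
        "ExpressionAttributeNames": {"#values": "values"},
        "ExpressionAttributeValues": {":obsolete": {"NS": [str(v) for v in unique]}},
    }
```

Each returned dictionary is shaped so it could be passed to an SDK call such as boto3's `client.update_item(**params)`. The minimum is always computed on the client, so a reader never depends on the cleanup having run.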

Upvotes: 0

Renato Gama

Reputation: 16599

I got to a solution to this problem by realizing that what I wanted was just not possible. There must be just one condition to the entire update and since there is no such thing as SET min = minimum(:min, min) I had to accept my fate and make more than one UpdateItem request to DynamoDB.

The nice thing is that the order of execution of these updates doesn't matter. The hard part is making sure that each update is executed exactly once. Because we fire a lot of requests (with occasional peaks), there is a real chance of some updates failing due to ProvisionedThroughputExceededException or some other rate limiting from AWS.

So here is my final solution:

  1. Lambda function receives payload with hundreds of data points.
  2. Lambda function aggregates these data points in memory and produces an intermediate aggregation object of the form {id, sum, count, min, max}.
  3. Lambda function generates 3 update objects per aggregation object, of the forms (these updates are referring to the same record):
    1. {UpdateExpression: 'ADD #SUM :sum, #COUNT :count'}
    2. {ConditionExpression: '#MAX < :max OR attribute_not_exists(#MAX)', UpdateExpression: 'SET #MAX = :max'}
    3. {ConditionExpression: '#MIN > :min OR attribute_not_exists(#MIN)', UpdateExpression: 'SET #MIN = :min'}
  4. Because we need to be 100% sure that these updates will always be processed successfully, the Lambda function sends them to a FIFO SQS queue (as 3 separate messages). I am using a FIFO queue here not because I want the order to be preserved but because I want the guarantee of exactly-once delivery.
  5. A consumer keeps polling the queue and, whenever there are messages, passes each one to DynamoDB as the parameter of .updateItem.
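
Steps 2 and 3 above can be sketched in Python as follows. This is an illustration only, not the code I actually ran: the table name `aggregations`, the helper names, and the assumption that `id` is a string key are all hypothetical, and the values use the low-level attribute-value format.

```python
TABLE = "aggregations"  # hypothetical table name


def aggregate(record_id, points):
    """Step 2: fold a non-empty list of data points into one aggregation object."""
    return {
        "id": record_id,
        "sum": sum(points),
        "count": len(points),
        "min": min(points),
        "max": max(points),
    }


def num(value):
    """Encode a number in DynamoDB's low-level attribute-value format."""
    return {"N": str(value)}


def build_updates(agg):
    """Step 3: three independent UpdateItem parameter sets for the same record."""
    base = {"TableName": TABLE, "Key": {"id": {"S": agg["id"]}}}  # assumes a string key
    return [
        {**base,
         "UpdateExpression": "ADD #SUM :sum, #COUNT :count",
         "ExpressionAttributeNames": {"#SUM": "sum", "#COUNT": "count"},
         "ExpressionAttributeValues": {":sum": num(agg["sum"]), ":count": num(agg["count"])}},
        {**base,
         "ConditionExpression": "#MAX < :max OR attribute_not_exists(#MAX)",
         "UpdateExpression": "SET #MAX = :max",
         "ExpressionAttributeNames": {"#MAX": "max"},
         "ExpressionAttributeValues": {":max": num(agg["max"])}},
        {**base,
         "ConditionExpression": "#MIN > :min OR attribute_not_exists(#MIN)",
         "UpdateExpression": "SET #MIN = :min",
         "ExpressionAttributeNames": {"#MIN": "min"},
         "ExpressionAttributeValues": {":min": num(agg["min"])}},
    ]
```

When the consumer sends the second or third update and the condition does not hold, DynamoDB rejects it with a ConditionalCheckFailedException. For these two updates that only means the stored value was already more extreme, so the consumer can treat it as success and delete the message instead of retrying.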

At the end of this process, I was able to do real-time aggregations for thousands of records :)

PS.: Got rid of the hash column

Upvotes: 4

tleef

Reputation: 3594

It is not possible to do this in a single update, since UpdateExpression doesn't support functions like max() and min(). The supported operations and functions are listed in the DynamoDB documentation on update expressions.

The best way to achieve the same effect is to add a field called latest or something similar which stores the latest value. You will need to change your update expression to be something like the following.

UpdateExpression: SET hash = :hash, latest = :latest, sum = sum + :latest, count = count + :num

Where :hash is of course your update hash to guard against replays, :latest is the latest value, and :num is 1 or whatever your increment is.

Then you can use DynamoDB Streams with a Lambda that looks at each update and checks whether latest is less than min or greater than max. If it is neither, ignore the update; otherwise, perform a second update to set min or max to the latest value accordingly.
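
As a rough sketch of that Lambda's decision logic in Python: the function below takes one stream record and returns the follow-up UpdateItem parameters, if any. It assumes the stream is configured to include the new image, the table name `aggregations` and the function name are hypothetical, and the ConditionExpression is an extra guard I am assuming here so that a concurrent writer cannot be overwritten with a less extreme value.

```python
from decimal import Decimal

TABLE = "aggregations"  # hypothetical table name


def follow_up_updates(record):
    """Return UpdateItem parameter sets needed after one stream record."""
    image = record["dynamodb"].get("NewImage")
    if image is None or "latest" not in image:
        return []  # deletes, or items without a latest value, need no follow-up

    latest = Decimal(image["latest"]["N"])
    # (attribute, placeholder, comparison used in the guard condition)
    checks = (("min", "#MIN", ">"), ("max", "#MAX", "<"))
    updates = []
    for attr, name, op in checks:
        current = image.get(attr)
        if current is not None:
            stored = Decimal(current["N"])
            improves = latest < stored if attr == "min" else latest > stored
            if not improves:
                continue  # stored value is already at least as extreme
        updates.append({
            "TableName": TABLE,
            "Key": record["dynamodb"]["Keys"],
            "UpdateExpression": f"SET {name} = :latest",
            # Assumed guard: only write if still an improvement at write time.
            "ConditionExpression": f"attribute_not_exists({name}) OR {name} {op} :latest",
            "ExpressionAttributeNames": {name: attr},
            "ExpressionAttributeValues": {":latest": image["latest"]},
        })
    return updates
```

The follow-up write produces a stream record of its own, but in that record latest equals the new min or max, so the function returns an empty list and the chain stops.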

The main drawback to this approach is that there will be a small window where latest might be outside the range of min and max. However, this can easily be normalized in your application code when you read the records.
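
For example, the read-side normalization could be as small as the following Python sketch. It assumes the item has already been deserialized into plain numbers, and the function name is hypothetical.

```python
def normalized_range(item):
    """Return (min, max) widened so that the range always contains latest."""
    latest = item["latest"]
    low = min(item.get("min", latest), latest)
    high = max(item.get("max", latest), latest)
    return low, high
```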

You should also consider the additional cost that will result from the DynamoDB Stream and the Lambda invocations.

Upvotes: 0
