Gabriel Cunha
Gabriel Cunha

Reputation: 463

Why DynamoDB scan with Limit and FilterExpression not return the items that match the filter requirements?

I need make a scan with limit and a condition on DynamoDB.

The docs says:

In a response, DynamoDB returns all the matching results within the scope of the Limit value. For example, if you issue a Query or a Scan request with a Limit value of 6 and without a filter expression, DynamoDB returns the first six items in the table that match the specified key conditions in the request (or just the first six items in the case of a Scan with no filter). If you also supply a FilterExpression value, DynamoDB will return the items in the first six that also match the filter requirements (the number of results returned will be less than or equal to 6).


The code (NODEJS):

var params = {
    ExpressionAttributeNames: {"#user": "User"},
    ExpressionAttributeValues: {":user": parseInt(user.id)},
    FilterExpression: "#user = :user and attribute_not_exists(Removed)",
    Limit: 2,
    TableName: "XXXX"
};

DynamoDB.scan(params, function(err, data) {
    if (err) {
        dataToSend.message = "Unable to query. Error: " + err.message;
    } else if (data.Items.length == 0) {
        dataToSend.message = "No results were found.";
    } else {
        dataToSend.data = data.Items;
        console.log(dataToSend);
    }
});



Table XXXX definitions:


In code above, if I remove the Limit parameter, DynamoDB will return the items that match the filter requirements. So, the conditions are ok. But when I scan with Limit parameter, the result is empty.

The XXXX table, has 5 items. Only the 2 firsts have the Removed attribute. When I scan without Limit parameter, DynamoDB returns the 3 items without Removed attribute.

What i'm doing wrong?

Upvotes: 20

Views: 42090

Answers (4)

Naween Niroshan
Naween Niroshan

Reputation: 598

Small hack - Iterate till you get the results

lastEvaluatedKey = null;
do {

    if(lastEvaluatedKey != null) {
        // query or scan data with last evaluated key 
    } else {
        // query or scan data WITHOUT last evaluated key 
    }        

    lastEvaluatedKey == key of last item retrieved

} while(lastEvaluatedKey != null && retrievedResultSize == 0); // == 0 or < yourLimit

If the number of items retrieved is 0 and lastEvaluatedKey is not null that means it has scanned or queried the number of rows which match to your limit. (and result size is zero because they didn't match the filter expression)

Upvotes: -1

Andy Brand
Andy Brand

Reputation: 11

You might be able to get what you need by using a secondary index. Using the classic RDB example, customer - order example: you have one table for customers and one for orders. The Orders table has a Key consisting of Customer - HASH, Order - RANGE. So if you wanted to get the latest 10 orders, there would be no way to do it without a scan

But if you create a Global Secondary Index on orders of "Some Constant" -- HASH, Date RANGE, and queried against that index, they query would do what you want and only charge you for the RCUs involved with the records returned. No expensive scan needed. Note, writes will be more expensive, but in most cases, there are many more reads than writes.

Now you have your original problem if you want to get the 10 biggest orders for a day larger than $1000. The query would return the last 10 orders, and then filter out those less than $1000.

In this case, you could create a computed key of Date-OrderAmount, and queries against that index would return what you want.

It's not as simple as SQL, but you need to think about access patterns in SQL too. If if you have a lot of data, you need to create Indexes in SQL or the DB will happily to table scans on your behalf, which will impair performance and raise your costs.

Note that everything I proposed is normalized in the sense that there is only one source of truth. You are not duplicating data -- you are merely recasting views of it to get what you need from DynamoDB.

Bear in mind that the CONSTANT as a HASH s subject to the 10GB per partition limit, so you would need to design around it if you had a lot of active data. For example, depending on your expected access pattern, you could use Customer and not a constant as a HASH. Or use STreams to organize the data (or subsets) in other ways.

Upvotes: 1

Samuel Okpapi
Samuel Okpapi

Reputation: 51

Also ran into this issue, i guess you will just have to scan the whole table to a max of 1 MB

Scan The result set from a Scan is limited to 1 MB per call. You can use the LastEvaluatedKey from the scan response to retrieve more results.

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html

Upvotes: 5

Mark B
Mark B

Reputation: 201138

From the docs that you quoted:

If you also supply a FilterExpression value, DynamoDB will return the items in the first six that also match the filter requirements

By combining Limit and FilterExpression you have told DynamoDB to only look at the first two items in the table, and evaluate the FilterExpression against those items. Limit in DynamoDB can be confusing because it works differently from limit in a SQL expression in a RDBMS.

Upvotes: 28

Related Questions