rafał stempowski

Reputation: 15

Why doesn't scanning a GSI on DynamoDB work as fast as expected when using CONTAINS?

I'm new to DynamoDB, so it would be great if someone could explain what I'm doing wrong or what I'm missing in my understanding. I'm trying to find the most efficient way of searching for a row that contains some value. I'm playing around with some test data to see how it works and how to design everything.

I have a table with about 1700 rows. Some rows hold quite a lot of data. There is a PK, Id, and some other attributes like Name, Nationality, Description, etc. I also added a GSI on 'Name' with projection type 'KEYS_ONLY'.

Now, my scenario is to find a person whose name contains a given string. Let's say the Name is 'Pablo Picasso', and I want to find any 'Picasso'. My assumption was that scanning the GSI should be pretty fast. I understand that a Scan can only go through 1 MB of data per call, but I assumed that my GSI looked something like this:

Name         Id
A Hopper     2
Timoty c     3
Donald Duck  14

Having that in mind, I was sure it would find my row on the first scan. Unfortunately, my first scan only went through about 340 rows. I was able to find my row after 4 calls to DynamoDB. When I ran a similar scan on the table itself rather than on the GSI, it took 5 calls, which doesn't seem that different.

Am I doing something wrong? Or did I misunderstand something?

For testing purposes I'm using C# code like this:

// Scan the GSI; the filter is applied only after each page (up to 1 MB) has been read.
var result = await _dynamoDb.ScanAsync(new ScanRequest(DynamoConstants.ArtistsTableName)
{
    IndexName = "NameIndex",
    FilterExpression = "contains(#Name, :name)",
    ExpressionAttributeNames = new Dictionary<string, string>() { { "#Name", "name" } },
    ExpressionAttributeValues = new Dictionary<string, AttributeValue>()
        { { ":name", new AttributeValue("Picasso") } }
});
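
Since a single Scan call stops after reading about 1 MB, finding the row generally means following LastEvaluatedKey until it comes back empty. Here is a minimal sketch of that loop, assuming the AWS SDK for .NET (the AWSSDK.DynamoDBv2 package); the class name ArtistSearch and the method name FindByNameFragmentAsync are made up for illustration and are not part of my actual code:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

// Hypothetical helper: pages through the GSI until every page has been scanned.
public static class ArtistSearch
{
    public static async Task<List<Dictionary<string, AttributeValue>>> FindByNameFragmentAsync(
        IAmazonDynamoDB dynamoDb, string tableName, string fragment)
    {
        var matches = new List<Dictionary<string, AttributeValue>>();
        Dictionary<string, AttributeValue> startKey = null; // null means "start from the beginning"

        do
        {
            var response = await dynamoDb.ScanAsync(new ScanRequest
            {
                TableName = tableName,
                IndexName = "NameIndex",
                FilterExpression = "contains(#Name, :name)",
                ExpressionAttributeNames = new Dictionary<string, string> { { "#Name", "name" } },
                ExpressionAttributeValues = new Dictionary<string, AttributeValue>
                    { { ":name", new AttributeValue { S = fragment } } },
                ExclusiveStartKey = startKey
            });

            // Items holds only the rows of this page that passed the filter.
            matches.AddRange(response.Items);

            // An empty (or null) LastEvaluatedKey means there are no more pages.
            startKey = response.LastEvaluatedKey;
        } while (startKey != null && startKey.Count > 0);

        return matches;
    }
}
```

The filter does not reduce how much data each call reads; it only trims what is returned, so the number of calls depends on the size of whatever is being scanned. Note also that contains is case-sensitive.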

My index looks like this:

var nameIndex = new GlobalSecondaryIndex
{
    IndexName = "NameIndex",
    ProvisionedThroughput = new ProvisionedThroughput
    {
        ReadCapacityUnits = 5,
        WriteCapacityUnits = 5
    },
    Projection = new Projection { ProjectionType = "KEYS_ONLY" },
    KeySchema = new List<KeySchemaElement> {
        new() { AttributeName = "name", KeyType = "HASH" }
    }
};

EDIT: I did some more digging and found out that the GSI size is in fact the same as the size of the whole table.

...
        "TableSizeBytes": 5435537,
        "ItemCount": 1792,
        "TableArn": "arn:aws:dynamodb:ddblocal:000000000000:table/artists",
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "NameIndex",
                "KeySchema": [
                    {
                        "AttributeName": "name",
                        "KeyType": "HASH"
                    }
                ],
                "Projection": {
                    "ProjectionType": "KEYS_ONLY"
                },
                "IndexStatus": "ACTIVE",
                "ProvisionedThroughput": {
                    "ReadCapacityUnits": 5,
                    "WriteCapacityUnits": 5
                },
                "IndexSizeBytes": 5435537,
                "ItemCount": 1792,
       .....

But why? Is there anything wrong with my Index creation?
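
As a quick sanity check on these numbers, here is a small sketch of the arithmetic, assuming the 1 MB Scan page limit means 1,048,576 bytes and that items are roughly equal in size (both are simplifications; the class name ScanPageEstimate is made up for illustration):

```csharp
using System;

// Hypothetical helper for back-of-the-envelope Scan page estimates.
public static class ScanPageEstimate
{
    // Assumed page limit: 1 MB read per Scan call, before any filter is applied.
    public const long ScanPageLimitBytes = 1024 * 1024;

    // Number of Scan calls needed to read the whole table or index (ceiling division).
    public static long PagesToScan(long sizeBytes) =>
        (sizeBytes + ScanPageLimitBytes - 1) / ScanPageLimitBytes;

    // Rough number of items read per call, based on the average item size.
    public static long ItemsPerPage(long sizeBytes, long itemCount) =>
        ScanPageLimitBytes / (sizeBytes / itemCount);
}

public static class Program
{
    public static void Main()
    {
        const long indexSizeBytes = 5435537; // IndexSizeBytes reported above
        const long itemCount = 1792;         // ItemCount reported above

        Console.WriteLine(ScanPageEstimate.PagesToScan(indexSizeBytes));             // prints 6
        Console.WriteLine(ScanPageEstimate.ItemsPerPage(indexSizeBytes, itemCount)); // prints 345
    }
}
```

About 345 items per call lines up with the roughly 340 rows my first scan went through, so the index really is being scanned as if it held the full items.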

Upvotes: 1

Views: 57

Answers (1)

rafał stempowski

Reputation: 15

OK, I know what the issue was. The local version of DynamoDB apparently doesn't reproduce how GSIs work on the real service.

I created the same table with the same indexes on the actual DynamoDB on AWS, and there is a proper size difference between the table and the index, exactly as I assumed there would be.

Upvotes: 0
