scafrs
scafrs

Reputation: 453

Fetching min records for PK without using scan

I am new developer in Dynamo and I am trying to build a query in dynamo to get each registry with the min value of the number of retries for the PK. For example, in the following data set, I want to get the registry with sk 87bee31f-03da-42f3-92ec-5b8e5e1e726f and registry with sk 87bee31f-03da-42f3-92ec-5b8e5e1e746f.I understand that it is not possible to aggregate in Dynamo and that running the scan is expensive and the performance is not the best. However, for this query I don't know PK's and I haven't found any other way to do this. What is the best way to build this query? It is posssible without scan?

pk=1, sk=87bee31f-03da-42f3-92ec-5b8e5e1e726f, retries=0
pk=1, sk=87bee31f-03da-42f3-92ec-5b8e5e1e727f, retries=2
pk=2, sk=87bee31f-03da-42f3-92ec-5b8e5e1e718f, retries=3
pk=2, sk=87bee31f-03da-42f3-92ec-5b8e5e1e739f, retries=4
pk=2, sk=87bee31f-03da-42f3-92ec-5b8e5e1e746f, retries=1
pk=2, sk=87bee31f-03da-42f3-92ec-5b8e5e1e757f, retries=2
pk=1, sk=87bee31f-03da-42f3-92ec-5b8e5e1e728f, retries=3
pk=1, sk=87bee31f-03da-42f3-92ec-5b8e5e1e729f, retries=4

Upvotes: 0

Views: 58

Answers (2)

Mazen Al Khatib
Mazen Al Khatib

Reputation: 308

In general, you use DynamoDB when you know your access patterns and your access patterns play an important role to decide what are your PKs and SKs and whether you need GSI or LSI keeping in mind the cost that comes with that and the possibility to create the index after the table is already created.

That being said, I am not sure I got what you mean by you don't know the PK (Like where it's coming from when storing the data?) but the rule is:

You want to do query operation? You MUST have the PK otherwise, you do scan.

In your example I would either create a GSI and the PK would be the data that I have (I guess in your case 87bee31f-03da-42f3-92ec-5b8e5e1e726f) and I will do multiple queries for each registry.

Or if you have a low number of max retries, let's say it's 5 then my GSI will have the retries as PK and I will do maximum of 5 queries in this case. Now this will may get you a lot of data, but you can filter with this data on hand in your application without risking the cost and capacity units of the DynamoDB

Upvotes: 0

Leeroy Hannigan
Leeroy Hannigan

Reputation: 19893

I would suggest creating a Global Secondary Index like the following:

SK retries
87bee31f-03da-42f3-92ec-5b8e5e1e727f 2
87bee31f-03da-42f3-92ec-5b8e5e1e746f 1
87bee31f-03da-42f3-92ec-5b8e5e1e757f 2

Now you can really quickly and efficiently obtain the registries for you items. Do obtain it in a single request use PartiQL ExecuteStatement which allows up to 50 keys per request:

SELECT * FROM mutable.myindex 
WHERE SK IN ['87bee31f-03da-42f3-92ec-5b8e5e1e727f','87bee31f-03da-42f3-92ec-5b8e5e1e757f']

You don't require a sort key for your index, or perhaps you could use a timestamp of you're only interested in items that are time relevant. I used retries as your sort key just as an example.

Upvotes: 0

Related Questions