Reputation: 1634
What is the most efficient way to query lots of rows based on tags in Azure Table Storage?
I need to find all the rows that have been tagged with at least the tags "A" AND "B". The tags will be normalized before insert, e.g. all lowercase and alphanumeric only. I also need to easily pull out the latest items, hence my current RowKey contains an inverted DateTime, formatted as string.Format("{0:D19}", DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks).
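A minimal sketch of how such a key behaves, written as C# 9+ top-level statements. `InvertedTicksKey` is a hypothetical helper name, not part of any SDK, and the two timestamps are made up. Because every key is zero-padded to 19 digits, ordinal string order matches numeric order, so the newer timestamp produces the smaller key.

```csharp
using System;

// Hypothetical helper: builds the inverted-ticks RowKey from a UTC timestamp.
static string InvertedTicksKey(DateTime utc) =>
    string.Format("{0:D19}", DateTime.MaxValue.Ticks - utc.Ticks);

// Two made-up timestamps, five minutes apart.
var older = new DateTime(2015, 1, 1, 0, 0, 0, DateTimeKind.Utc);
var newer = older.AddMinutes(5);

string olderKey = InvertedTicksKey(older);
string newerKey = InvertedTicksKey(newer);

// Table Storage returns rows in ascending key order within a partition,
// so the newer item (smaller key) comes back first.
bool newerFirst = string.CompareOrdinal(newerKey, olderKey) < 0;
Console.WriteLine($"{newerKey} sorts before {olderKey}: {newerFirst}");
```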
All my rows are partitioned by UserId.
My current solution is to add an extra row for each tag (I am thinking about putting them in a separate table), so that I can fire off multiple queries and join the results on my server before returning them to the client. When inserting an entity that has two tags (A, B), it will be stored like this:
[
  {
    "partitionKey": "123", // UserId
    "rowKey": "2519633682044852820"
  },
  {
    "partitionKey": "123_tags", // UserId + tags constant, this is not the actual tags - this is to put the tagged entities in a different partition
    "rowKey": "a" // normalized tag
  },
  {
    "partitionKey": "123_tags",
    "rowKey": "b"
  }
]
The queries I need would be something like this:
top 1000 latest entities where userId = "X"
all entities where userId = "X" and tags contains "A" AND "B"
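As a rough sketch, these two queries could map onto Table service `$filter` expressions as shown below, following the partition layout described above. `LatestFilter` and `TagFilter` are hypothetical helpers that only build strings; nothing here calls the storage SDK, and the values are assumed not to contain single quotes.

```csharp
using System;
using System.Linq;

// Hypothetical helper: filter for everything in one user's partition.
static string LatestFilter(string userId) =>
    $"PartitionKey eq '{userId}'";

// Hypothetical helper: filter for one normalized tag in the user's tag partition.
static string TagFilter(string userId, string tag) =>
    $"PartitionKey eq '{userId}_tags' and RowKey eq '{tag.ToLowerInvariant()}'";

// Query 1: a single partition read. With the inverted-ticks RowKey the rows
// already come back newest first, so asking for the top 1000 is enough.
string latest = LatestFilter("123");

// Query 2: one filter per required tag; the results are joined in application code.
string[] tagFilters = new[] { "A", "B" }.Select(t => TagFilter("123", t)).ToArray();

Console.WriteLine(latest);
foreach (var f in tagFilters) Console.WriteLine(f);
```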
Upvotes: 3
Views: 644
Reputation: 6467
Your current design looks good to me, and it complies with the guidelines in the Azure Storage Table Design Guide. Both queries you mentioned can be run efficiently against the table you designed. Do you have further questions?
Upvotes: 1
Reputation: 4936
It depends on how much data you will have for a single user. Your first query is straightforward, as the partition key is the user id. When it comes to the second query, there is no "contains" functionality in Table Storage, so I would say you have the following options:
Either way, the Table Storage architecture doesn't make this use case easy, and you will need to implement some logic in code.
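To illustrate the kind of logic meant here, below is a minimal in-memory sketch of the "A AND B" join. It assumes that each per-tag query yields the RowKeys of the entities carrying that tag, which the layout in the question does not show explicitly, so the tag rows would need to store that key somewhere. `IntersectAll` is a hypothetical helper and the keys are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: keeps only the entity keys present in every tag's result.
static List<string> IntersectAll(IEnumerable<IEnumerable<string>> perTagKeys)
{
    var lists = perTagKeys.ToList();
    if (lists.Count == 0) return new List<string>();

    var common = new HashSet<string>(lists[0], StringComparer.Ordinal);
    foreach (var keys in lists.Skip(1)) common.IntersectWith(keys);

    // Ordinal sort preserves the newest-first order of inverted-ticks keys.
    return common.OrderBy(k => k, StringComparer.Ordinal).ToList();
}

// Stand-ins for the results of the two per-tag queries (made-up keys).
var taggedA = new[] { "2519633682044852820", "2519633682044852999", "2519633682044853500" };
var taggedB = new[] { "2519633682044852999", "2519633682044852820", "2519633682044854000" };

List<string> both = IntersectAll(new[] { taggedA, taggedB });
Console.WriteLine(string.Join(", ", both));
```

The matching keys can then be used for point lookups in the user's main partition. How well this scales depends on how many rows each tag matches, since every per-tag result has to be pulled to the server first.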
On a side note, IIRC you can get the latest items using the default Timestamp property.
Upvotes: 1