Martin
Martin

Reputation: 1634

Query by tags in Azure Table Storage

What is the most efficient way to query lots of rows based on tags in Azure Table Storage?

I need to find all the rows that has been tagged with at least "A" AND "B" tags. The tags will be normalized before insert, eg. all lowercase and alphanumeric only. I also need to easily pull out the latest items, hence my current RowKeycontains an inverted DateTime as string.Format("{0:D19}", DateTime.MaxValue.Ticks-DateTime.UtcNow.Ticks)

All my rows are partitioned by a UserId

My current solution is adding a extra row for each tag (thinking about putting them in a separate table), then I can fire off multiple queries and join them on my server before returning them to the client. So when inserting a entity that have two tags (A, B) it will be stored like this:

[
    {
        "partitionKey": "123", // UserId
        "rowKey": "2519633682044852820"
    },
    {
        "partitionKey": "123_tags", // UserId + tags constant, this is not the actual tags - this is to put the tagged entities in a different partition
        "rowKey": "a" // normalized tag
    },
    {
        "partitionKey": "123_tags",
        "rowKey": "b"
    }
]

The queries I need, would be something like this:

Upvotes: 3

Views: 644

Answers (2)

Zhaoxing Lu
Zhaoxing Lu

Reputation: 6467

Your current design looks good to me, and it complies with the guidelines in Azure Storage Table Design Guide. The 2 queries you mentioned can be done efficiently in the table designed by you. Do you have further questions?

Upvotes: 1

alek kowalczyk
alek kowalczyk

Reputation: 4936

It depends on how much data you will have for a single user. Your first query is straightforward as the partition key is the user id. When it comes to the second query, as there is no "contains" functionality in table storage, I would say you have following options:

  • Get all items for the user, and filter out unnecessary records later in code (if there aren't that much records for each user it shouldn't be a problem)
  • If to much records for a single user, you can partition records for a user based on some tag range, i.e. the partition key can look like "UserX_TagsA-K", "UserX_TagsK-O", "UserX_TagsO-Z" and then get only from the needed tange and filter out later on in code
  • Make redundand records for each tag, then the partition key can look like "UserX_Tag123" for one record, and "UserX_Tag324" for the same record, but indicating that it has two tags. If the data needs to be updated, the row key can be a composite one, i.e. in following format "GroupId:UniqueId" and by using "StartsWith" clause on the rowkey, you can get all redundand records to update them

Either way, the Table Storage architecture doesn't make such use case easy and you will need to implement some logic in code.

On a side note, IIRC you can get the latest items using the default TimeStamp field.

Upvotes: 1

Related Questions