Azure Table Storage - RowKey design for ordered data

Question

I store large quantities of GPS data in storage tables. Each partition can reach up to 1 million entities. Each GPS entity that is inserted into the table is actually sequential to the previous data inserted, so order matters.

Sometimes I need to perform the following query:

Get the previous/next 3 GPS entities from the current entity (within the same partition).

Options for RowKey design:

Create an incrementing integer. But how do I track what the current size of the table is? There is no way to get table row count, or to get the last inserted row.
Use DateTime Ticks. But how do get the previous/next entity using ticks?

I'm using the SDK version 2.0 in C#.

Igorek · Accepted Answer

This is a super cool problem to solve.

If your job was to only find entries that are EITHER after or before the given entry, this would be simple, as you could just take the DateTime.Now.Ticks or (DateTime.Max.Ticks - DateTime.Now.Ticks) as the RowKey and use "Take" command on your queries to find the nearest X records.

However, since you need to find nearest locations that are either after or before a given location's ID, here is one design pattern I think may work for your RowKeys:

You will want save two entities for every GPS location. Entities' data should be the same except that the row keys would be different
One entity will have a RowKey of: DateTime.UtcNow.Ticks and have a prefix say: "A" (ascending)
The other entity will have a RowKey of: DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks and have a prefix character, say: "D" (descending)

For example, you have 7 locations, Location1..thru..Location7 I've given them random Tick values from 01 thru 99 (for simplicity's sake). Imagine that MaxTicks is 100. This would make our table contain the following data:

Rowkey, Entity Data

A----01, Location1
A----50, Location2
A----55, Location3
A----66, Location4
A----67, Location5
A----90, Location6
A----99, Location7
D----01, Location7
D----10, Location6
D----33, Location5
D----34, Location4
D----45, Location3
D----50, Location2
D----99, Location1

Now, for every entity, you can easily calculate its "opposite" Rowkey entity. (Simply subtract its RowKey from DateTime.MaxValue.Ticks and flip the prefix from A to D or from D to A)

Thus, if you need 2 entities that are before Location3, simply issue the query to take 2 entities from the table where RowKey is greater than "D----45" and smaller than "D----99" (max). And if you need to take 2 rows after Location3, simply issue a Take 2 where RowKey is greater than "A----55" and less than "A-----99" (max). Issuing the "less than" criteria is important, so that you dont accidentally run into the "D"s when you're querying against "A"s.

Update/Insert both sets of entities within batch transactions to guarantee that both or none make it into the table and "voila".

The downside to this approach is that you will have to pay for twice the storage.

Hope this helps and is not too confusing

Azure Table Storage - RowKey design for ordered data

Answers (1)

Related Questions