Reputation: 901
So I had a request to run a report based off Azure Table Storage, and after a long process of refining the report I got the data. However, something didn't sit right with me when I finished refactoring the console app. I haven't worked with Azure for very long, but I understand the basic concept: what you choose as the PartitionKey and RowKey will either make or break a table (eventually).
The query I run uses Timestamp (there are constraints on why I am using this field) as the filter to pull back a day's worth of data, because the PartitionKey and RowKey are unknown. To my understanding, a query without a PK and RK has to run through the entire table (correct me if I am wrong), leading to a very poor fetch time.
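For context, this is roughly the shape of the fetch (a minimal sketch using the azure-data-tables Python SDK; the connection string, table name, and date are placeholders, and the real console app differs, but the query is the same):

```python
from datetime import datetime, timedelta, timezone

from azure.data.tables import TableClient

table = TableClient.from_connection_string(
    conn_str="<connection-string>", table_name="Events"  # placeholders
)

# One day's worth of data, filtered on the system Timestamp property.
# Neither PartitionKey nor RowKey appears in the filter, so the service
# cannot target a partition and has to scan the whole table.
start = datetime(2023, 5, 1, tzinfo=timezone.utc)
end = start + timedelta(days=1)

entities = table.query_entities(
    query_filter="Timestamp ge @start and Timestamp lt @end",
    parameters={"start": start, "end": end},
)
for entity in entities:
    print(entity["RowKey"])  # stand-in for the actual report logic
```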
It made me nervous to use Timestamp because that property belongs to the table and is updated every time something changes on that entry. With that in mind, the report can take hours to run, which leads to my main question.
What happens if, in the middle of my query, a series of entries is changed mid-fetch?
Take this scenario for example:
At the time I access the 50th entry, entries 1-20 are updated and entries 80-100 change.
What kind of data do I get back? (I would believe I get the updated entries for 80-100 but still retain the old data from 1-20.)
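For reference, here is roughly how the fetch is paged in the same sketch as above (`by_page` exposes the per-request pages that are chained by continuation tokens; names are placeholders). My worry is about what happens between those page requests:

```python
from datetime import datetime, timedelta, timezone

from azure.data.tables import TableClient

table = TableClient.from_connection_string(
    conn_str="<connection-string>", table_name="Events"  # placeholders
)
start = datetime(2023, 5, 1, tzinfo=timezone.utc)
end = start + timedelta(days=1)

# Each page is a separate HTTP request, linked to the previous one by a
# continuation token, so the table can change between pages.
for page in table.query_entities(
    query_filter="Timestamp ge @start and Timestamp lt @end",
    parameters={"start": start, "end": end},
).by_page():
    for entity in page:
        print(entity["RowKey"])  # values read here are as of this page's request
```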
Upvotes: 0
Views: 1187
Reputation: 2847
Correct me if I am wrong, but running a query without a PK and RK would lead to a very poor fetch time for the query.
This is a serious anti-pattern. The most efficient query is a point query on PK and RK. Providing a PK at least confines the query to one partition (and therefore one compute node); providing neither guarantees a full table scan. As with many NoSQL stores, it is crucial to design the data model around your query patterns. With control over PK and RK, you could have injected the timestamp into these, while remaining aware of another anti-pattern: append-only writes into a single partition. This happens, for example, if you base the PK on a daily or hourly bucket and only ever insert data into the current bucket.
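For example, here is a minimal sketch (Python; the key scheme, names, and spread factor are all hypothetical) of injecting the timestamp into the keys while spreading writes so inserts don't all land in one hot partition:

```python
import hashlib
from datetime import datetime, timezone

def make_keys(event_id: str, occurred_at: datetime, spread: int = 16):
    """Hypothetical key scheme: hourly time bucket plus a small hash suffix.

    The hour bucket confines a day's data to 24 * spread partitions, so a
    daily report becomes a bounded set of partition queries instead of a
    full table scan. The hash suffix spreads concurrent inserts across
    partitions instead of appending everything to a single hot one.
    """
    bucket = occurred_at.strftime("%Y%m%d%H")
    suffix = int(hashlib.md5(event_id.encode()).hexdigest(), 16) % spread
    partition_key = f"{bucket}-{suffix:02d}"
    # RowKey only needs to be unique within the partition; the event id
    # works here, though reversed ticks are another common choice when
    # you want newest-first ordering.
    return partition_key, event_id

pk, rk = make_keys("order-42", datetime(2023, 5, 1, 13, 30, tzinfo=timezone.utc))
# pk -> '2023050113-NN' (NN depends on the hash), rk -> 'order-42'
```

Reading a day back is then 24 * spread partition queries rather than a scan, and because these keys are set at insert time they don't move when the entity is later updated, unlike the Timestamp property.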
Upvotes: 3