Nithin Mohan
Nithin Mohan

Reputation: 770

What is the best way of creating Partition key in azure table storage for storing sensor output data?

I searched best practice for storing sensor output data in azure table storage but didn't get best answer. I am currently working on a project that consists of storing sensor data to azure table storage. Currently I am using partition key as Sensor ID . Every second I am storing the sensor outputs. About 100 sensors are currently using. So imagine large data is storing every day. So I am getting slow performance in my web application when i searched particular sensor data by date wise. Is there a better way to improve the performance of the web app? How about changing sensor id to date as partition key? Code is not important here. I need a logical solution.. May be this question will help lot of developers who are working on such scenario..

UPDATE

Each sensor provides a 10 different outputs and date which is the output datetime. So they are in a same row of each sensor id. And I am taking sensor data using Date range and Sensor id

Partition key - sensor id , RowKey - datetime , 10 output columns and output date

here is my code

var query = TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, sensorID);
var dateFilter = TableQuery.CombineFilters(
            TableQuery.GenerateFilterConditionForDate("outputdate", QueryComparisons.GreaterThanOrEqual, Convert.ToDateTime(from)),
            TableOperators.And,
            TableQuery.GenerateFilterConditionForDate("outputdate", QueryComparisons.LessThanOrEqual, Convert.ToDateTime(to))
            );
            query = TableQuery.CombineFilters(query, TableOperators.And, dateFilter);
var rangeQuery = new TableQuery<TotalizerTableEntity>().Where(query);

        var entitys = table.ExecuteQuery(rangeQuery).OrderBy(j => j.date).ToList();

outputdate indicates output generated time. This is getting as datetime. All output have same output time.

Upvotes: 3

Views: 1733

Answers (1)

Gaurav Mantri
Gaurav Mantri

Reputation: 136196

First, I would highly recommend that you read Azure Storage Table Design Guide: Designing Scalable and Performant Tables. This will give you a lot of ideas about how to structure your data.

Now coming to your current implementation. What I am noticing is that you're including PartitionKey in your query (which is very good BTW) but then adding a non-indexed attribute (outputdate) in your query as well. This will result in what is known is Partition Scan. For larger tables, this will create a problem because your query will be scanning the entire partition for matching outputdate attribute.

You mentioned that you're storing datetime value is RowKey. Assuming the RowKey value matches with the value of output date, I would recommend using RowKey in your query instead of this non-indexed attribute. RowKey (along with PartitionKey) are the only two attributes that are indexed in a table, so the query will be comparatively much faster.

When saving date/time as RowKey, I would recommend converting it into ticks (DateTime.Ticks) and then saving that instead of simply converting the date/time value to string. If you're going with this approach, I would suggest prepending 0 in front of this ticks so that all values are of same length (doing something like DateTime.Ticks.ToString("d19")).

You can also save the RowKey as Reverse Ticks i.e. (DateTime.MaxValue.Ticks - DateTime.Ticks).ToString("d20"). This will ensure that all the latest entries get added to the top of the table instead of at the bottom. This will help in scenario where you are more interested in querying the latest records.

If you will always query for a particular sensor, it may not hurt to save data for each sensor in a separate table i.e. each sensor gets a separate table. This will free up one key for you. You can use date/time value (which you're currently storing as RowKey) as PartitionKey and can use some other value as RowKey. Furthermore, it will allow you to scale across storage accounts - data for some sensors will go in one storage account while the data for other sensors will go in other storage account. Somewhere you just need to save this relationship so that data reaches correct storage account/table.

Upvotes: 3

Related Questions