Oldrich Svec

Windows Azure: data storage with append

I plan to run a numerical simulation on Windows Azure. The simulation can take days or weeks. Every second or so, the simulation produces a set of numbers, such as temperature: double, pressure: double, velocity: double[], etc., which I would like to store.

The requirements are:

  1. To save all the data produced every second immediately and preferably in one request.
  2. To be able to read any of the stored data (using e.g. JavaScript) even during the numerical simulation runtime.
  3. To have temperature, pressure, velocity etc. separate. I would like to read e.g. all the pressures in one call without reading velocities etc.
  4. On a global level, the storage should be split into projects and the projects should contain temperature "files", pressure "files" etc. and each "file" should contain a sequence of numbers.
  5. It should be cheap.
  6. I do not need any advanced features; it should behave more or less like files in a file system.

Which storage should I use? Can you point me to a tutorial that discusses such a use case?

Answers (1)

Gaurav Mantri

My recommendation would be to use Azure Table Storage for your project. It's dirt cheap and capable of storing massive amounts of data.

Coming to your specific requirements:

To save all the data produced every second immediately and preferably in one request.

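You could use Entity Group Transactions to store the data in one request. There are some limitations: all entities in a transaction must be in the same table and share the same PartitionKey, a transaction can contain at most 100 entities, and its total payload must not exceed 4 MB. I would recommend that you read up on these before relying on them.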
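
If you buffer a few seconds of samples before writing, these rules decide how many requests you need. The following Python sketch (the entity dictionaries and the make_batches helper are hypothetical, not part of any Azure SDK) groups entities by PartitionKey and splits each group into chunks of at most 100, so that each chunk is eligible for one entity group transaction. It does not check the 4 MB payload limit or duplicate RowKeys.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

# Table service limit: at most 100 entities per entity group transaction.
MAX_ENTITIES_PER_BATCH = 100

Entity = Dict[str, object]


def make_batches(entities: Iterable[Entity],
                 max_size: int = MAX_ENTITIES_PER_BATCH) -> Iterator[List[Entity]]:
    """Yield lists that share one PartitionKey and hold at most max_size entities.

    Each yielded list satisfies the "same PartitionKey, <= 100 entities" rules;
    the 4 MB payload limit and RowKey uniqueness are NOT checked here.
    """
    by_partition: Dict[str, List[Entity]] = defaultdict(list)
    for entity in entities:
        by_partition[str(entity["PartitionKey"])].append(entity)
    for group in by_partition.values():
        for start in range(0, len(group), max_size):
            yield group[start:start + max_size]


if __name__ == "__main__":
    # Hypothetical data: 250 buffered pressure samples for one project.
    samples = [{"PartitionKey": "project-1", "RowKey": f"{step:010d}", "Value": 1.0 + step}
               for step in range(250)]
    print([len(batch) for batch in make_batches(samples)])  # [100, 100, 50]
```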

To be able to read any of the stored data (using e.g. JavaScript) even during the numerical simulation runtime.

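Since Windows Azure Table Storage is a REST-based service, you could fetch the data using JavaScript as well. However, I would recommend querying the data with Shared Access Signatures (SAS), as it is much more secure: a SAS grants limited, time-bound access to a table, so you don't have to embed your storage account key in client-side code.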
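
Whatever client you use, a query is an HTTP GET against the table's endpoint with an OData $filter, and a SAS token is appended as extra query parameters. A minimal Python sketch that only builds the URL, assuming the SAS token was issued elsewhere (for example by a small trusted server) and is passed in without a leading "?"; the function name and the account and table names are placeholders:

```python
from urllib.parse import quote


def build_query_url(account: str, table: str, odata_filter: str, sas_token: str) -> str:
    """Return a Table service query URL with an OData $filter and a SAS token.

    sas_token is assumed to be a ready-made query string such as "sv=...&sig=..."
    (hypothetical values), generated outside this function.
    """
    base = f"https://{account}.table.core.windows.net/{table}()"
    # Percent-encode the filter (spaces, single quotes) so it is safe in a URL.
    encoded_filter = quote(odata_filter, safe="")
    return f"{base}?$filter={encoded_filter}&{sas_token}"
```

A single response returns at most 1,000 entities; if there are more, the service sends x-ms-continuation-NextPartitionKey and x-ms-continuation-NextRowKey headers, whose values you pass back as NextPartitionKey and NextRowKey query parameters to get the next page. So "all the pressures in one call" may in practice take several round trips.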

To have temperature, pressure, velocity etc. separate. I would like to read e.g. all the pressures in one call without reading velocities etc.

On a global level, the storage should be split into projects and the projects should contain temperature "files", pressure "files" etc. and each "file" should contain a sequence of numbers.

This is where things get interesting. Basically, what you're looking to do is denormalize the data, and Azure Table Storage is well suited for that. What you call a "file", I would call a "table". So there will be a "temperature" table, a "pressure" table, and so on. The approach I would recommend is to save the data as a message in a Windows Azure Queue when you first collect it, and then have another process (perhaps a worker role) pull this message, transform the data as required for each table, and write it to the different tables. Because an entity group transaction cannot span multiple tables, this keeps the simulation itself down to a single request (the queue message) per time step.
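
To illustrate the queue/worker split, here is a Python sketch in which an in-memory queue.Queue and plain dictionaries stand in for the Windows Azure Queue and the tables, so it runs without an Azure account. The function names, the JSON message layout and the table names are assumptions for illustration. The simulation enqueues one message per time step, and the worker turns each message into one entity per table, keyed by project and time step.

```python
import json
import queue
from typing import Dict, List

# Hypothetical table names: one table per quantity, as suggested above.
QUANTITIES = ("temperature", "pressure", "velocity")

Entity = Dict[str, object]


def enqueue_sample(q: "queue.Queue[str]", project: str, step: int,
                   sample: Dict[str, object]) -> None:
    """Simulation side: a single message per time step carries every quantity."""
    q.put(json.dumps({"project": project, "step": step, "sample": sample}))


def fan_out(message: str) -> Dict[str, Entity]:
    """Worker side: split one message into one entity per table."""
    data = json.loads(message)
    row_key = f"{data['step']:010d}"  # zero-padded so string order matches step order
    return {
        table: {"PartitionKey": data["project"], "RowKey": row_key,
                "Value": data["sample"][table]}  # velocity stays a list here
        for table in QUANTITIES
    }


def drain(q: "queue.Queue[str]", tables: Dict[str, List[Entity]]) -> None:
    """Process every pending message; the dict of lists stands in for real tables."""
    while True:
        try:
            message = q.get_nowait()
        except queue.Empty:
            return
        for table, entity in fan_out(message).items():
            tables.setdefault(table, []).append(entity)
```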

It should be cheap.

Windows Azure Table Storage is cheap. You basically pay for the amount of data you store, the number of transactions you perform against the service, and the data that flows out of the data center. Please visit the Windows Azure Pricing page for more details.
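
To get a feel for the numbers, you can do the arithmetic up front. The Python sketch below counts transactions and the final data volume for a run; all inputs are assumptions you supply, and no prices are hard-coded because they change. With the queue design above, one sample might cost up to one queue put, one get and one delete plus one insert per table, i.e. 6 requests for three tables (fewer if the worker fetches messages in batches); that figure depends on how you implement the worker.

```python
from typing import NamedTuple

SECONDS_PER_DAY = 24 * 60 * 60


class Usage(NamedTuple):
    transactions: int
    stored_gb: float


def estimate_usage(days: float, requests_per_sample: int, bytes_per_sample: int,
                   samples_per_second: float = 1.0) -> Usage:
    """Transactions for the whole run and the data volume at the end of it."""
    samples = int(days * SECONDS_PER_DAY * samples_per_second)
    return Usage(transactions=samples * requests_per_sample,
                 stored_gb=samples * bytes_per_sample / 1024 ** 3)


def estimate_cost(usage: Usage, price_per_transaction: float, price_per_gb: float) -> float:
    """Prices are inputs; take them from the current pricing page."""
    return usage.transactions * price_per_transaction + usage.stored_gb * price_per_gb
```

For example, a 14-day run at one sample per second with 6 requests per sample is 14 × 86,400 × 6 = 7,257,600 transactions. stored_gb is the size at the end of the run, so it is an upper bound on the volume held at any point during it.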

I do not need any advanced features; it should behave more or less like files in a file system.

Azure Table Storage is essentially a key/value store, so it's relatively easy to use.
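
One thing that does not map directly onto files: each entity is a set of typed properties (string, binary, double, 32- or 64-bit integer, Boolean, DateTime, GUID), and there is no array type, so a value like velocity: double[] has to be serialized, for example into a binary property, which is limited to 64 KB. A minimal Python sketch, assuming a little-endian layout of IEEE 754 doubles (a layout you choose yourself; a JavaScript reader would have to decode the same bytes); the helper names are hypothetical:

```python
import struct
from typing import List, Sequence

MAX_BINARY_PROPERTY_BYTES = 64 * 1024  # size limit of a single binary property


def pack_doubles(values: Sequence[float]) -> bytes:
    """Serialize a double[] as little-endian IEEE 754 doubles (8 bytes each)."""
    data = struct.pack(f"<{len(values)}d", *values)
    if len(data) > MAX_BINARY_PROPERTY_BYTES:
        raise ValueError(f"{len(values)} doubles do not fit in one 64 KB binary property")
    return data


def unpack_doubles(data: bytes) -> List[float]:
    """Inverse of pack_doubles."""
    if len(data) % 8:
        raise ValueError("byte length is not a multiple of 8")
    return list(struct.unpack(f"<{len(data) // 8}d", data))
```

With 8 bytes per value, one property holds at most 8,192 doubles; longer vectors would need to be split across several properties or entities.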

Word of Caution

Azure Table Storage is a bit different from regular SQL tables in that you can't create additional indexes (called secondary indexes) on a table. You get only a single index, on PartitionKey and RowKey. It's therefore very important to choose PartitionKey and RowKey values wisely, taking into account how you're going to read the data back from the table.
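
As an example of choosing keys around the read pattern, one option (an assumption, not the only valid design) is PartitionKey = project name and RowKey = the time step zero-padded to a fixed width. RowKey is a string and entities come back sorted by PartitionKey and then RowKey in lexical order, so padding turns "steps 0 to 59 of project X" into a contiguous range on the single index. A Python sketch of the key and filter construction, with hypothetical helper names:

```python
ROW_KEY_WIDTH = 10  # hypothetical width: 10**10 one-second steps is about 317 years


def row_key(step: int) -> str:
    """Zero-pad the step so that lexical (string) order equals numeric order."""
    if not 0 <= step < 10 ** ROW_KEY_WIDTH:
        raise ValueError("step does not fit in the chosen RowKey width")
    return str(step).zfill(ROW_KEY_WIDTH)


def odata_string(value: str) -> str:
    """Quote a value as an OData string literal (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def range_filter(project: str, first_step: int, last_step: int) -> str:
    """$filter for one project's steps first_step..last_step, inclusive."""
    return (f"PartitionKey eq {odata_string(project)} and "
            f"RowKey ge {odata_string(row_key(first_step))} and "
            f"RowKey le {odata_string(row_key(last_step))}")
```

Without padding, "10" sorts before "9" and a range query would return the wrong rows. At one sample per second, a single partition per project is a modest write rate; a project that needed far more throughput could be spread over several partitions at the cost of more complex queries.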

You may find these links useful:

http://blogs.msdn.com/b/windowsazurestorage/archive/2012/11/04/windows-azure-s-flat-network-storage-and-2012-scalability-targets.aspx

http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-out-of-windows-azure-tables.aspx

http://channel9.msdn.com/Events/Build/2012/4-004

Design of Partitioning for Azure Table Storage
