Reputation: 20445
We've been using a hybrid architecture on Windows Azure, storing most entities in a SQL Azure database, but throwing anything that's likely to require significant amounts of storage space into Azure Table Storage.
With this architecture, though, we're running into all sorts of problems with Azure Table Storage, which strikes me as an immature and incomplete product at best. The biggest limitation is that, for all practical purposes, it's a write-only data store. The consensus is that its write capabilities scale very, very well, but its querying and indexing capabilities are so astonishingly limited (despite years of users complaining and Microsoft promising) that I've come to the conclusion you should basically only ever try to retrieve data out of ATS in an emergency. Getting data out of it for a complex, realtime, transactional production app is way more difficult than it should be. There are workarounds, of course, like maintaining multiple copies of data, with different indexing strategies for each copy, or splitting up your queries and running them in parallel, but that's adding complexity when the whole point of a cloud service is to minimize it.
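To make the "multiple copies with different indexing strategies" workaround concrete, here is a minimal sketch. It uses plain in-memory dictionaries as a stand-in for tables that can only be looked up efficiently by partition key and row key; it does not call the Azure SDK. The names `KeyedTable`, `OrderStore`, and the order fields are hypothetical, invented for illustration.

```python
from collections import defaultdict


class KeyedTable:
    """In-memory stand-in for a table addressable only by (partition key, row key)."""

    def __init__(self, partition_of, row_of):
        self._partition_of = partition_of  # function: entity -> partition key
        self._row_of = row_of              # function: entity -> row key
        self._partitions = defaultdict(dict)

    def upsert(self, entity):
        pk = self._partition_of(entity)
        rk = self._row_of(entity)
        self._partitions[pk][rk] = dict(entity)  # store a copy

    def query_partition(self, partition_key):
        # The only "cheap" query: everything in one partition, ordered by row key.
        rows = self._partitions.get(partition_key, {})
        return [rows[rk] for rk in sorted(rows)]


class OrderStore:
    """Keeps two copies of every order, each keyed for a different query."""

    def __init__(self):
        self.by_customer = KeyedTable(lambda e: e["customer"], lambda e: e["order_id"])
        self.by_day = KeyedTable(lambda e: e["day"], lambda e: e["order_id"])

    def save(self, order):
        # Every write goes to both copies; keeping them consistent is the
        # extra complexity described above.
        self.by_customer.upsert(order)
        self.by_day.upsert(order)


store = OrderStore()
store.save({"order_id": "0001", "customer": "alice", "day": "2014-08-21", "total": 30})
store.save({"order_id": "0002", "customer": "bob", "day": "2014-08-21", "total": 12})
store.save({"order_id": "0003", "customer": "alice", "day": "2014-08-22", "total": 7})

print([o["order_id"] for o in store.by_customer.query_partition("alice")])
print([o["order_id"] for o in store.by_day.query_partition("2014-08-21")])
```

In a real store the two writes are separate operations and can fail independently, which is exactly where the added complexity comes from.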
That said, we're committed to Azure for now, and I would like to have a good sense of what the alternatives and pitfalls are, preferably from folks who have actually been down this road in production.
I'm well aware that there are lots of NoSQL options out there (e.g., all the ones listed in this question: What NoSQL solutions are out there for .NET?) that I can run either on a VM or in some other cloud. But I'm specifically interested in knowing whether there are any that fit well into Azure's PaaS model. In other words, if I'm on Azure, and don't want to manage my own VMs, and want something as close as possible to the almost automatic and nearly infinite scalability promised (though never quite delivered) by ATS, what options have people found valuable? Is the MongoDB/Azure wrapper a simple and viable alternative? Or should I just bite the bullet and spin up my own VMs? Or switch over to AWS? Or stick with Azure SQL?
(To give you a sense of our size requirements: we're thinking we'll be needing to store upwards of a billion rows. Not huge, but not negligible either.)
Upvotes: 6
Views: 4486
Reputation: 1715
Azure has made good strides in NoSQL since this was posted. You can now spin up Raven and MongoDB as add-ons from within Azure, and they recently announced "Azure DocumentDB", their own offering. It's in public preview; the announcement blog post is here: http://azure.microsoft.com/blog/2014/08/21/new-azure-services-and-updates-expand-openness-choice-and-flexibility/
Further information and documentation is available here: http://azure.microsoft.com/en-gb/services/documentdb/
As the others have mentioned, Lucene is a possible search/index solution. I have a website on Azure Websites that uses a Lucene index for search, and I have been able to store and query the index directly on the website's webspace, so I didn't need a dedicated VM or have to worry about how to expose the index across the wire. Obviously this can get tricky if you want to maintain multiple web boxes (when scaling), but it may be worth knowing about as a possibility. My web instance came with 50GB of disk space, of which only a small portion is used by the website, so the Lucene index puts the rest to use. I've never heard of this being an official strategy, so YMMV.
Upvotes: 1
Reputation: 410
Maybe a little bit off topic.
There are several use cases where ATS is a great tool.
One case is storing metadata that you would usually store as XML (JSON) serialized objects within your regular RDBMS. This is data that is structured but doesn't need to be indexed, for example all client metadata. The reason to use ATS rather than SQL is that ATS lets you add or remove columns of such data on the fly. So whenever you change the metadata structure, you don't need to loop through all client records, deserialize the XML (JSON), recreate the data tree, serialize it back into XML (JSON), and store it back into the table. This is perfect. The downside is that you must keep the metadata in a flat structure instead of the tree structure you can achieve with classic XML (JSON) serialization.
The second case is moving data out of your RDBMS when there is too much of it and you don't actively need it. An example is the list of transactions older than 5 years in a banking system. This is data you have to keep, but not in active form. It would slow down your joins and WHERE conditions, and you don't need it on a daily basis. You can still get the data back, or move it into another RDBMS for an offline analysis done once a year. Storing the data in ATS is also much cheaper than leaving it inside the RDBMS.
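As a rough illustration of the flat-structure constraint, the sketch below flattens a nested metadata tree into single-level column names and rebuilds the tree afterwards. It is plain Python with no storage calls; the function names, the `_` separator, and the sample client record are assumptions made for the example, and the round trip only works if no original key contains the separator.

```python
def flatten(metadata, prefix="", sep="_"):
    """Turn a nested dict into a single-level dict with joined key names."""
    flat = {}
    for key, value in metadata.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))  # recurse into subtrees
        else:
            flat[name] = value
    return flat


def unflatten(flat, sep="_"):
    """Rebuild the tree; assumes original keys never contain the separator."""
    tree = {}
    for name, value in flat.items():
        *parents, leaf = name.split(sep)
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


client = {"name": "Acme", "address": {"city": "Prague", "zip": "11000"}}
row = flatten(client)
print(row)  # {'name': 'Acme', 'address_city': 'Prague', 'address_zip': '11000'}
print(unflatten(row) == client)  # True
```

Adding a new metadata field then just means writing one more flat column on new rows; existing rows do not have to be rewritten.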
Upvotes: 1
Reputation: 251
Although Azure table storage does not support secondary indexes, and does not have the feature set of SQL, it is not trying to solve the same problem.
I would avoid SQL Azure (or whatever it's called now) and focus on building a data layer that uses what Azure is good at (blobs, tables, and queues).
We have found table storage to be more than adequate for a large production solution. It has gotten a lot better over the last 18 months or so. The v2 of the .NET client library is much better than v1.
As with most applications, a direct port of the architecture onto a cloud platform is rarely a good idea. Rethinking the way that you have solved previous business problems with a solid understanding of what's available in the cloud is the only path to success.
I agree with a previous post that something like Lucene could be good if you need to index a lot of data. We find that by using tables and blobs well we are able to make do without it, but it's definitely an option in your toolbox.
Upvotes: 3
Reputation: 1008
We have gone through a similar situation and have researched several options, covering both what Azure offers and NoSQL alternatives.
The approach we settled on is to use Azure Blob Storage and Lucene.Net. We serialize our objects to JSON and then save them as Azure blobs.
We use Lucene.Net to create the indexes; a Lucene.Net search returns the information we need to fetch the blobs that contain the data we are looking for. We do not have this combination running in production yet, but in the tests we have done it is working very well.
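The general shape of that pattern (serialized objects in a blob store, plus a separate index that maps search terms back to blob names) can be sketched as follows. This is not Lucene.Net and not the Azure SDK: a dictionary stands in for blob storage and a very small inverted index stands in for the search index. The class name `BlobIndexStore`, its methods, and the sample documents are hypothetical.

```python
import json
import re
from collections import defaultdict


class BlobIndexStore:
    def __init__(self):
        self._blobs = {}                # blob name -> JSON text (stand-in for blob storage)
        self._index = defaultdict(set)  # term -> blob names (stand-in for the search index)

    @staticmethod
    def _terms(text):
        # Lowercased alphanumeric tokens; a real analyzer does much more.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def save(self, name, obj, indexed_fields):
        # Note: re-saving an existing name does not remove its old index entries.
        self._blobs[name] = json.dumps(obj)
        for field in indexed_fields:
            for term in self._terms(str(obj.get(field, ""))):
                self._index[term].add(name)

    def search(self, query):
        terms = self._terms(query)
        if not terms:
            return []
        # The index only yields blob names; the data itself comes from the blobs.
        names = set.intersection(*(self._index.get(t, set()) for t in terms))
        return [json.loads(self._blobs[n]) for n in sorted(names)]


store = BlobIndexStore()
store.save("doc-1", {"title": "Azure table storage", "size": 10}, ["title"])
store.save("doc-2", {"title": "Azure blob storage", "size": 20}, ["title"])

print([d["size"] for d in store.search("blob storage")])  # [20]
print([d["size"] for d in store.search("azure")])         # [10, 20]
```

The point of the split is that the index stays small (terms and blob names only) while the bulky serialized objects live in cheap storage.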
Upvotes: 2