iefpw

Reputation: 7042

Disk-based, document-based storage

Is there a highly scalable, disk-based NoSQL storage system that is freely available? The nice thing about SQL Server is that it scales, but migrating my project into SQL tables would be a nightmare, since everything in it is an object.

The options are:

  1. Run from memory
  2. Serialize the document
  3. Convert to SQL
  4. Use large NoSQL data storage
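
For reference, option 2 (serializing the documents yourself) could look roughly like the sketch below. It assumes the objects can be represented as JSON and stores one file per document. The class name `FileDocumentStore`, the `demo` helper and the file layout are made up for illustration and do not come from any library.

```python
import json
import tempfile
from pathlib import Path


class FileDocumentStore:
    """Toy store (hypothetical): one JSON file per document, named after its id."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, doc_id):
        return self.root / f"{doc_id}.json"

    def save(self, doc_id, document):
        # Write to a temporary file first and then rename it, so a crash
        # mid-write is less likely to leave a half-written document behind.
        tmp = self._path(doc_id).with_suffix(".tmp")
        tmp.write_text(json.dumps(document), encoding="utf-8")
        tmp.replace(self._path(doc_id))

    def load(self, doc_id):
        return json.loads(self._path(doc_id).read_text(encoding="utf-8"))


def demo():
    # Round-trip one document through a throwaway directory.
    with tempfile.TemporaryDirectory() as root:
        store = FileDocumentStore(root)
        store.save("order-1", {"customer": "alice", "items": ["a", "b"]})
        return store.load("order-1")
```

This only covers lookup by id. Querying on anything else means scanning every file, which is where a real document store starts to pay off.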

Upvotes: 0

Views: 398

Answers (3)

atlaste

Reputation: 31106

In the past few weeks I've been dealing with the same question; here are my observations:

  1. SQL Server works, but doesn't scale well. We've tested a SQL Server database of roughly 600GB of documents and let's just say things get very slow.
  2. More or less the same holds for MySQL... both are not really made for documents...
  3. Hadoop/HDFS doesn't appear to be mature on Windows. While Microsoft has an HDFS implementation available, it's still in the RC phase. On the positive side, if you're developing for the future (let's say 1 year to production), this seems to be a good choice.
  4. Apache Cassandra seems mature. On the positive side, the implementation is very simple; that is: it's basically just a plain distributed key-value store with one partitioner, where both the key and value are a byte[]. However, the simplicity of the implementation also means you need to work around all kinds of issues. If you've worked with it, you know that it's brilliant if you need to implement Twitter, but too simple for just about anything else. It scales well, but to be honest I'm not too impressed with the performance. Further, I've encountered a couple of data inconsistencies/corruptions, which doesn't really warm my heart... If you use Cassandra, I would personally use Aquiles as the client (because you will run into low-level stuff quite easily) - but FluentCassandra is a fine client too.
  5. MongoDB is quite mature as well. On the positive side, it's active and has a very good and easy to use (unlike Cassandra) C# client library. Further, although the shard server has crashed a couple of times on my cluster, recovery always did the trick (and I'm not too polite with restarts :-) and all the issues I encountered seem to be already solved in the development branch - so I'm not feeling uncomfortable about this. The most important thing that MongoDB has and Cassandra lacks is support for secondary indexes.

All these solutions are disk based (i.e. they persist data on disk).
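
To make the secondary-index point from item 5 a bit more concrete: with a plain key-value store where keys and values are just bytes, looking documents up by anything other than the key means maintaining your own index next to the data. Below is a minimal sketch of that workaround. It is an in-memory toy (persistence is left out), and `KeyValueStore` and `IndexedDocuments` are hypothetical names, not the Cassandra or MongoDB API.

```python
import json
from collections import defaultdict


class KeyValueStore:
    """Stand-in for a plain key-value store: bytes in, bytes out, lookup by key only."""

    def __init__(self):
        self._data = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes):
        # Returns None when the key is unknown.
        return self._data.get(key)


class IndexedDocuments:
    """Documents stored by id, plus a hand-maintained index on one field."""

    def __init__(self, indexed_field: str):
        self.store = KeyValueStore()
        self.indexed_field = indexed_field
        self.index = defaultdict(set)  # field value -> set of document ids

    def load(self, doc_id: str):
        raw = self.store.get(doc_id.encode("utf-8"))
        return None if raw is None else json.loads(raw.decode("utf-8"))

    def save(self, doc_id: str, document: dict) -> None:
        # When overwriting, remove the stale index entry first; forgetting
        # this step is exactly the kind of bug a built-in index avoids.
        old = self.load(doc_id)
        if old is not None:
            self.index[old[self.indexed_field]].discard(doc_id)
        self.store.put(doc_id.encode("utf-8"), json.dumps(document).encode("utf-8"))
        self.index[document[self.indexed_field]].add(doc_id)

    def find_by_field(self, value):
        # Use .get so that querying an unknown value does not grow the index.
        return sorted(self.index.get(value, ()))
```

Every write now touches two structures that must stay in sync. A store with built-in secondary indexes does this bookkeeping for you.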

I looked at the code of options 3-5, and about 6 years ago I implemented my own NoSQL solution, which we've been using for data storage ever since. To be honest, MongoDB is how I would have implemented it myself.

For completeness: the only thing that I haven't tried yet is CouchDB... but frankly I'm so happy with MongoDB that I won't even bother.

-Stefan.

Upvotes: 1

Derick

Reputation: 36774

MongoDB is disk based, but of course it will benefit from (lots of) memory. It's open source and free, and it scales from one machine to thousands using sharding and replication.
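
As a rough illustration of the idea behind sharding (not of how MongoDB implements it), the sketch below routes each document to one of several shards based on a stable hash of its key. `shard_for` and `ShardedStore` are hypothetical names, and each shard is just a dict standing in for a separate machine.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a document key to a shard number using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Toy router: each shard is a dict standing in for a separate server."""

    def __init__(self, shard_count: int):
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, key: str, document: dict) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = document

    def get(self, key: str):
        # The same hash sends the read to the shard that received the write.
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

With plain modulo routing like this, changing the number of shards moves most keys to a different shard, which is one reason real systems use more elaborate schemes and rebalance data in the background.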

You can download it and run it locally, or you can use one of the free hosted solutions.

Upvotes: 3

Davin Tryon

Reputation: 67296

There are a lot of NoSQL options available under an open source license (GPL or Apache). While searching I came across this listing, which goes a fair way toward giving a feature comparison of some of the options.

If you need a supported C# client, your choices will be a bit more limited, but I would look into MongoDB and Redis because I've used them in the past with good results.

Upvotes: 0
