Reputation: 1178

Architecture - Selecting a NoSQL for .NET application

This question is about selecting the "right" type of NoSQL database, and I hope to discuss specific products and why they fit, based on the requirements and use cases listed below, together with the traditional RDBMS solution that is currently in place. It is a little long, but I think a discussion on this topic can be really beneficial to people who are trying to learn the new paradigm(s). There are many discussions about NoSQL, but from what I have seen most of them are high level and don't give enough insight for newbies.

So, here it comes:

I've been developing against traditional RDBMS/SQL systems for most of my programming career (15 years) and have good experience with them. Lately, there is a big buzz about NoSQL and how useful it is, so I am interested in understanding how it can be beneficial. The system I describe is a bit more complex than the average TODO or Calendar example that I've seen and thus can make for a good discussion.

The system is related to cellular networks, which are relatively complex: there are about 300 "classes" in such a network (a "full deployment" can combine several networks and grow to 1000 or more classes), with a varying number of instances for each (from tens up to 100,000). These are loaded each day (sometimes a few times a day) into a database in order to drive the system. The relations between the classes are either containment or "usage". The domain changes relatively fast: there are about 3 months between software updates of the network, and each update usually means adding parameters to existing classes and adding a few (10-20) new classes.

The usage (use cases) of the system was as follows:

0. Parse the data (into a hierarchy of data containers) and load it into a relational database (usually from XML files of around 2GB).
1. View properties (like "select field1, field2 from table1 where ids in ()") in a table format.
2. Track changes (what changed between today and yesterday: parameters whose value has changed and added/removed instances).
3. Check business rules:
   - it can be simple ("SELECT idField1...idFieldN, paramValue FROM table where paramValue<>default")
   - or more complex, checking on relations (e.g. the number of children of type x etc.)
4. Retrieve the whole hierarchy of a class: select specific class instance(s), their children and sometimes the classes that are used by the instance or its children.
5. Make changes to the class instances and push them back into the network (then check that they were indeed executed, i.e. validate the changes). This usually required generating an XML file based on the hierarchy of the classes.
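To make the change-tracking use case concrete, here is a minimal sketch of a snapshot diff. It assumes each daily load can be reduced to a mapping from an instance's Distinct Name to its parameter values; the `Snapshot` type, the `diff_snapshots` helper, and the sample parameters are hypothetical. It is written in Python for brevity, and the same logic should carry over to .NET dictionaries.

```python
from typing import Any, Dict

# Assumed shape: distinct name -> {parameter name: value}
Snapshot = Dict[str, Dict[str, Any]]


def diff_snapshots(old: Snapshot, new: Snapshot) -> dict:
    """Report added instances, removed instances and changed parameters."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {}
    for dn in old.keys() & new.keys():
        params = {}
        # Compare the union of parameter names so that parameters added
        # or dropped by a software update also show up as changes.
        for name in old[dn].keys() | new[dn].keys():
            before, after = old[dn].get(name), new[dn].get(name)
            if before != after:
                params[name] = (before, after)
        if params:
            changed[dn] = params
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical sample data for two consecutive daily loads.
yesterday = {
    "Node=ER222,Subrack=3,Slot=5": {"power": 10, "state": "up"},
    "Node=ER222,Subrack=3,Slot=6": {"power": 12},
}
today = {
    "Node=ER222,Subrack=3,Slot=5": {"power": 11, "state": "up"},
    "Node=ER222,Subrack=3,Slot=7": {"power": 9},
}
result = diff_snapshots(yesterday, today)
```

With this shape, the diff does not depend on how the snapshots are stored, so it could sit on top of relational tables or a document store alike.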

In the RDBMS solution, to meet these requirements I mapped the data into relational tables (one table per class) and kept a metadata and relations dictionary. In addition, for data retrieval tasks I created a general data container (class type name + key-value (or values)) or used DataTables that could be merged into views or files.

This architecture (platform) meant that on an upgrade all I had to do was update/create the tables (alter/create table) and update the metadata and relations; the rest of the code was "generic" and driven by the metadata. The only exception was (4) above, which sometimes required me to hard-code (add children to the data retrieval hierarchy), though eventually I generalized this process as well (hierarchical data retrieval: get child elements based on the id of the parent, and so on down the hierarchy).
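The metadata-driven hierarchical retrieval described above might look roughly like the following sketch. It uses an in-memory SQLite database, and everything about the schema is an assumption made for illustration: the `RELATIONS` dictionary, the `id`/`name`/`parent_id` columns, and the `fetch_hierarchy` helper are hypothetical, not the actual system.

```python
import sqlite3

# Hypothetical relations dictionary: parent class -> contained child classes.
RELATIONS = {"Node": ["Subrack"], "Subrack": ["Slot"], "Slot": []}


def fetch_hierarchy(conn, class_name, instance_id):
    """Recursively load one instance and all of its contained children."""
    # Table names come from trusted metadata, never from user input.
    row = conn.execute(
        f'SELECT id, name FROM "{class_name}" WHERE id = ?', (instance_id,)
    ).fetchone()
    if row is None:
        return None
    node = {"class": class_name, "id": row[0], "name": row[1], "children": []}
    for child_class in RELATIONS.get(class_name, []):
        child_ids = conn.execute(
            f'SELECT id FROM "{child_class}" WHERE parent_id = ? ORDER BY id',
            (instance_id,),
        ).fetchall()
        for (child_id,) in child_ids:
            node["children"].append(fetch_hierarchy(conn, child_class, child_id))
    return node


# Assumed schema: one table per class, each with id, name and parent_id.
conn = sqlite3.connect(":memory:")
for table in RELATIONS:
    conn.execute(
        f'CREATE TABLE "{table}" (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER)'
    )
conn.execute('INSERT INTO "Node" VALUES (1, ?, NULL)', ("ER222",))
conn.execute('INSERT INTO "Subrack" VALUES (10, ?, 1)', ("3",))
conn.executemany('INSERT INTO "Slot" VALUES (?, ?, 10)', [(100, "5"), (101, "6")])

tree = fetch_hierarchy(conn, "Node", 1)
```

Note that this naive version issues at least one query per instance, so on deep or wide hierarchies the number of round trips grows quickly. That may be worth checking when hierarchy retrieval is slow, before concluding that the storage engine is at fault.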

The system works well in most cases but was sometimes too slow (especially in 4). The slowness was related to the retrieval of data from the DB, but only in some deployments, and it may be related to poor maintenance or insufficient hardware (or bad programming, but then why does it work well in other deployments? ;-)

I will add that since the domain is a network, each instance has a Distinct Name, usually consisting of its hierarchy (the instance and its parents, e.g. "Node=ER222,Subrack=3,Slot=5" or "Node=ER222,Equipment=1,Sector=2,Carrier=C2"), and the hierarchy of each class is generally the same (though some classes can appear in several hierarchies, i.e. have different ancestors).
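Because the Distinct Name already encodes the containment path, some hierarchy questions can be answered from the name alone. The sketch below shows that idea under the assumption that component values never contain commas or equals signs; the helper names are hypothetical.

```python
def parse_dn(dn):
    """Split a Distinct Name into (class, instance name) pairs, root first."""
    parts = []
    for component in dn.split(","):
        cls, _, name = component.partition("=")
        parts.append((cls.strip(), name.strip()))
    return parts


def parent_dn(dn):
    """Return the Distinct Name of the containing instance, or None at the root."""
    head, sep, _ = dn.rpartition(",")
    return head if sep else None


def is_descendant(dn, ancestor):
    """True if dn lies strictly below ancestor in the containment hierarchy."""
    # The trailing comma stops "Node=ER2222" from matching "Node=ER222".
    return dn.startswith(ancestor + ",")


slot = "Node=ER222,Subrack=3,Slot=5"
parsed = parse_dn(slot)
```

If the Distinct Name is stored as an indexed key, a prefix match of this kind is one possible way to fetch a whole subtree in a single lookup, in a relational database as well as in a key-value or document store.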

Usually there is not much load on the system - maybe up to as much as 50 active users but usually much less. In a larger network this can maybe grow up to 300-400 users.

Now I want to develop a system with similar requirements and am considering what advantages NoSQL may give:

I read that for dynamic-schema or schema-less data, NoSQL is a natural choice.
I read that graph databases are good for modeling a "network" (or network-like data), so maybe that could be a solution (node = class, edge = containment or usage, with attributes on the edge).
Maybe use some document database and keep the XML only partially parsed and access it by the hierarchy?
- How do I go about selecting specific fields from specific classes - do I have to generate gruesome XPath queries for that?
Maybe an object database?
- but then, do I have to keep a (bloated) model of 1000 or more POCOs? How easy will it be to serialize/deserialize?
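To illustrate the graph option from the list above, here is a minimal in-memory sketch in which nodes are instances identified by Distinct Name and edges are typed as containment or usage. The edge data and the `build_adjacency` and `reachable` helpers are hypothetical; a real graph database would express the same traversal in its own query language.

```python
from collections import deque

# Hypothetical edges: (source, kind, target)
EDGES = [
    ("Node=ER222", "contains", "Node=ER222,Equipment=1"),
    ("Node=ER222,Equipment=1", "contains", "Node=ER222,Equipment=1,Sector=2"),
    ("Node=ER222,Equipment=1,Sector=2", "contains",
     "Node=ER222,Equipment=1,Sector=2,Carrier=C2"),
    ("Node=ER222,Equipment=1,Sector=2", "uses", "Node=ER222,Subrack=3,Slot=5"),
]


def build_adjacency(edges):
    """Index edges by source node for fast neighbour lookup."""
    adjacency = {}
    for source, kind, target in edges:
        adjacency.setdefault(source, []).append((kind, target))
    return adjacency


def reachable(adjacency, start, kinds):
    """Breadth-first traversal that follows only the given edge kinds."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for kind, target in adjacency.get(current, []):
            if kind in kinds and target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


adjacency = build_adjacency(EDGES)
contained = reachable(adjacency, "Node=ER222", {"contains"})
with_usage = reachable(adjacency, "Node=ER222", {"contains", "uses"})
```

Here `contained` corresponds to "the instance and its children", while `with_usage` also pulls in the instances that are used by them, which is the kind of retrieval described in use case (4).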

In addition to the above, I am developing with .NET technologies, so if anyone has specific ideas, I would prefer ones that fit into this ecosystem or at least can be used from .NET (e.g. a REST/Thrift interface and a matching .NET API).

If you read this far, I appreciate it a lot, and if you care to join in, even more so ;-)

Upvotes: 3

Answers (2)

Arnon Rotem-Gal-Oz

Reputation: 25909

As Chris said, you should keep in mind that a lot of things you take for granted in the RDBMS world are often missing in NoSQL databases. Another thing you should keep in mind is that NoSQL is a very wide term covering a lot of technologies, so in that sense your question lacks focus.

You develop in .NET, so NoSQL databases with good integration are not abundant. A document database you can consider is RavenDB. It is written in .NET (you can write indexes and queries with LINQ), it is transactional (as far as updating data goes, though indexes are eventually consistent) and it is document oriented (i.e. schema-less).

You can see how you can handle relations in RavenDB here, but note that if most of your queries are graph traversals you may need a graph database instead.

Upvotes: 1

Chris Travers

Reputation: 26454

Ok, so this is just my humble opinion here, but generally, RDBMS's are tools that have capabilities that people take for granted right up until they move off them and then hate the NoSQL product they switched to because they never should have switched in the first place. In general, it is always a mistake to switch based on hype. Also keep in mind that NoSQL db's are generally quite limited and specialized compared to RDBMS's, and therefore you tend to be giving up more than you are getting. Sorry, that's the way it is. Finally, relational database management systems tend to be so good at optimizing things that intermittent performance issues can be very difficult to track down, but at least you aren't doing all the optimization yourself.

So having read all that you may think I am arguing you should rule out NoSQL, but I am not. What I am saying is you should be cautious about it. NoSQL db's generally are very well optimized for very small niches and therefore tend to do poorly on general purpose tasks. On the other hand, that optimization makes them useful sometimes.

Instead of replacing your relational db with a NoSQL db, the question may be whether you can use some of the NoSQL db's as a secondary engine for storing/caching/preprocessing and thus avoid some of the issues you currently have. In this view, NoSQL db's belong as adjuncts to traditional relational processing systems. I would be looking at both graph and document databases here, as preprocessing for the relational db.

Upvotes: 2
