Kaushik Lele
Kaushik Lele

Reputation: 6637

How can denormalization be attribute of NoSQL DB

While discussing NoSQL DBs against traditional RDBMS, many articles say that, in NoSQL-DB all related data is kept together so joins are avoided. Thus retrieving data is faster. In short data is denormalized. There are downsides of denormalization as well. e.g. redundancy, extra space, need to update data at multiple places etc.

But irrespective of Pros-and-cons of denormalization; it is a DB design attribute. How can it be attributed to particular DB-type ? If in a given case, it is ok to denormalize data then same can be achieved in RDBMs also.

So why is denormalization discussed as attribute of NoSQL db ?

Upvotes: 6

Views: 1235

Answers (2)

Zim-Zam O'Pootertoot
Zim-Zam O'Pootertoot

Reputation: 18148

Seconding John Saunders that you can denormalize data in an RDBMS as well - denormalization is an attribute of most NoSQL databases ("most" meaning "excluding graph databases") because in many cases you MUST denormalize in order to get decent performance.

Continuing with his example, let's say that I've got a Person record, which has a foreign key to a Car record (one car per person in this example to simplify matters), which has a foreign key to a Manufacturer record. For a given person I want the record for that person, for their car, and for their car's manufacturer.

In an RDBMS I can normalize this data and retrieve it all in one query using a join, or I can denormalize this data - the denormalized read is going to be a bit cheaper than the normalized read because joins aren't free, but in this case the difference in read performance probably won't be significant.

My NoSQL database probably doesn't support joins so if I normalize this data then I'll have to do three separate lookups for it, e.g. using a key-value database I'd first retrieve the Person which contains a Car key, then I'd retrieve the Car which contains a Manufacturer key, then I'd retrieve the Manufacturer; if this data were denormalized then I'd only need one lookup, so the performance improvement would be significant. In the rare case that the NoSQL database does support joins then it is almost certainly location agnostic, so the Person, Car, and Manufacturer records might be on different servers or even in different data centers making for a very expensive join.

So an overly simplistic breakdown of your options are:

  1. Traditional RDBMS, good with normalized data but difficult to scale out
  2. NoSQL database, relatively easy to scale out but a bit crap with normalized data
  3. Distributed OLAP database (e.g. Aster, Greenplum), relatively easy to scale out and good with normalized data but very expensive

Upvotes: 5

John Saunders
John Saunders

Reputation: 161773

You seem to be reading hype, instead of database design articles. You can denormalize any database. Yes, NoSQL is for cases where denormalized data is a good thing, for instance, in storing documents, where subdocuments are used instead of joins to another table. This works best when the subdocuments are not duplicated. Of course, if they are duplicated, then you have the usual problems of denormalized data.

Example: Person uses Car. In a relational database, you would have a Persons table and a Cars table and a junction table, perhaps "CarsUsedByPerson". In a NoSQL system, you might have a "car" document embedded within a "person" document.

Of course, if two people use the same car, then you have the same data in multiple places, and you'll need to update it in all such places, or it will be inconsistent.

NoSQL is for cases where you need the performance more than you need the consistency.

Upvotes: 5

Related Questions