Reputation: 2665

DDL to define denormalization rules?

I've been doing the relational database thing for years now, but lately have moved into Cassandra/Redis territory. NoSQL makes sense for what we're doing, so that's fine.

As I was working through defining Cassandra column families today a question occurred to me: In relational databases, why doesn't DDL let us define denormalization rules in such a way that the database engine itself could manage the resulting consistency issues natively. In other words, when a relational database programmer denormalizes to achieve performance goals... why is he/she then left to maintain consistency via purpose-written SQL?

Maybe there's something obvious that I'm missing? Is there some reason why such a suggestion is silly, because it seems to me like having this capability might be awfully useful.

EDIT:

Appreciate the feedback so far. I still feel like I have an unanswered (perhaps because it's been poorly articulated) question on my hands. I understand that materialized views attempt to offer engine-managed consistency for denormalized data. However, my understanding is that they aren't updated immediately with changes to the underlying tables. If this is true, it means the engine really isn't managing the consistency issues resulting from the denormalization... at least not at write-time. What I'm getting at is that a normalized data structure without true, feature-rich, engine-managed denormalization hamstrings relational database engines when it comes time to scale a system with heavy read load against complex relational models. I suppose it's true that adjusting materialized view refresh rates equates to tunable "eventual consistency" offered by NoSQL engines like Cassandra. I need to read up on how efficiently engines are able to sync their materialized views. In order to be considered viable relative to NoSQL options, the time it takes to sync a view would need to increase linearly with the number of added/updated rows.

Anyway, I'll think about this some more and re-edit. Hopefully with some representative examples of imagined DDL.

Upvotes: 2

Answers (2)

gbn

Reputation: 432200

Denormalisation in an RDBMS is a special case: not the standard. One only does this when you have a proven case. If you design in denormalised data up front, you've already lost.

Given each case is by definition "special", then how can there be standard SQL constructs to maintain the denormalised data.

An RDBMS differs from NoSQL in that it is designed to work with normalised designs. IMHO, you can't compare RDBMS and NoSQL like this

Upvotes: 1

Quassnoi

Reputation: 425261

Some relational database systems are able to maintain consistency of denormalized data to some extent (if I understand right what you mean).

In Oracle, this is called materialized views, in SQL Server — indexed views.

Basically, this means that you can create a self-maintaned denormalized table as a result of an SQL query and index it:

CREATE VIEW a_b
WITH SCHEMABINDING
AS
SELECT  b.id AS id, b.value, b.a_id, a.property
FROM    dbo.b b
JOIN    dbo.a a
ON      a.id = b.a_id

The resulting view, a_b, were it a real table, would violate 2NF since property is functionally dependent on a_id which is not a candidate key. However, the database system maintains this functional dependency and you can create a composite index on, say, (value, property).

Even MySQL and PostgreSQL which don't support materialized views natively are capable of maintaining some kind of denormalized tables.

For instance, when you create a FULLTEXT index on a column or a set of columns in MySQL, you get two indexes at once: first one contains one entry for each distinct word in each record (with a reference to the original record id), the second one contains one record per each word in the whole table, with the total word count. This allows searching for the words fast and ordering by relevance.

The total word count table is of course dependent on the individual words table and hence violates 5NF, but, again, the systems maintains this dependency.

Similar things are done for GIN and GIST indexes in PostgreSQL.

Of course not all possible denormalizations can be maintained, that means that you cannot materialize and index just any query in real time: some are too expensive to maintain, some are theoretically possible but not implemented in actual systems, etc.

However, you may maintain them using your own logic in triggers, stored procedures or whatever, that's exactly what they are there for.

Upvotes: 2

DDL to define denormalization rules?

Answers (2)

Related Questions