Reputation: 7415
Several colleagues and I are faced with an architectural decision that has serious performance implications: our product includes a UI-driven schema builder that lets non-programmers build their own data types for a web app. Currently, it builds properly normalized schemas behind the scenes and includes some complex logic to alter the schema and migrate legacy data automagically if the admins make changes to the data types.
The normalized schemas have hit performance bottlenecks in the past, and a major refactoring has been scheduled. One of the groups of developers wants to store every property of the data types in a separate table, so that changes to the data types will never require schema altering. (A single property could be turned into a 1:n relationship, for example, just by changing application logic.)
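To make the proposal concrete, here is a rough sketch of the shape I understand them to be proposing; the table and column names are mine for illustration, not our actual schema. It amounts to an entity-attribute-value style layout:

```sql
-- Rough sketch (illustrative names) of "every property in its own table":
-- each user-defined data type becomes rows, not columns, so redefining a
-- type never requires ALTER TABLE.
CREATE TABLE entity (
    entity_id BIGINT PRIMARY KEY,
    type_name VARCHAR(100) NOT NULL         -- the user-defined data type
);

CREATE TABLE entity_property (
    entity_id  BIGINT       NOT NULL REFERENCES entity(entity_id),
    prop_name  VARCHAR(100) NOT NULL,       -- the user-defined property
    prop_value TEXT,                        -- every value stored generically
    PRIMARY KEY (entity_id, prop_name)      -- relax this key and a property
                                            -- becomes 1:n with no DDL change
);
```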
Because early benchmarking indicates this will exact a huge performance penalty, they have built a caching layer in application code that maintains denormalized versions of each data type. While it does speed up the queries, I'm skeptical about the complexity that the application layer will be taking on, but I'm hoping for feedback - am I being pessimistic? Have others deployed this type of solution successfully? Should I stick to my guns, or is moving the complexity from "Schema modification tools" to "Schema mirroring tools" a good thing?
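For what it's worth, the caching layer amounts to something like this (again, illustrative names only): one flattened mirror table per user-defined type, rebuilt by application code whenever the underlying property rows change.

```sql
-- Illustrative shape of the "schema mirroring" cache: a flattened table per
-- user-defined data type, e.g. for a type the users have named "customer".
CREATE TABLE customer_mirror (
    entity_id BIGINT PRIMARY KEY,
    name      TEXT,
    email     TEXT,
    city      TEXT
);
-- Application code pivots entity_property rows into customer_mirror and must
-- keep the two in sync on every write; that sync logic is the complexity I'm
-- worried about taking on.
```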
Upvotes: 3
Views: 3163
Reputation: 18950
Strictly speaking, there is no such thing as "hyper normalization". If a fully normalized schema is refactored into another equivalent fully normalized schema, they are both equally normalized. What is often called "hyper normalization" is actually table decomposition in pursuit of goals other than normalization.
If I read you right, I don't think that's what's going on in the situation you outlined in your original post. It seems as though your group is debating between a normalized schema and a somewhat denormalized schema.
When this is done manually by an intelligent designer, there are trade-offs between the performance advantages obtained with certain kinds of denormalized designs and the update problems that follow, willy-nilly, from denormalization. I expect that your product, which does the same thing automatically, suffers from the same difficulties that intelligent human schema designers do, and then some.
I'm guessing that the real problem at your site is not really a performance problem at all. Rather, I think it's the conflict between the requirement to let users define new data types at will and the desire to obtain the same results one gets from a well designed database that's derived from a static set of information requirements.
These two goals are, in my opinion, irreconcilable. The ultimate layer of data management in your database is being done by your community of users. And they probably don't even meet in committee to decide which new definitions will become standard and which ones will be discarded. Unless I miss my guess, the schema is probably full of synonymous data types, where different users invented the same data type, but gave it a different name. It probably also contains some homonymous data types, where different users invented different data types, but gave them the same name.
Every time this has been attempted, it has resulted in chaos. The first visible sign of that chaos is an intractable performance problem, but that's just the tip of the iceberg. Sooner or later, you are going to discover that what you have in your database is unmanaged data. And you are going to discover that it's impossible to deliver the same bang for the buck that one obtains with managed data.
If I'm wrong about this, I'd love to learn about the example that proves me wrong. It would represent a fundamental advance in the state of the art.
Upvotes: 2
Reputation: 86492
Our gracious SO team has tackled this subject: Maybe Normalizing Isn't Normal.
Conclusion -
As the old adage goes, normalize until it hurts, denormalize until it works.
Upvotes: 0
Reputation: 37655
What you describe doesn't resemble what I call normalization. It's more like hyperabstraction - trying to find some abstraction level from which everything else can be derived. Like "Object" in JavaScript. If you take it to its logical conclusion, you could get by with two tables: one table for every Object, with columns for an ObjectTypeCode and an ObjectId; and another table of Associations, having two ObjectId columns, a third for uniqueness, and a fourth for a Value.
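Spelled out as DDL (the structure described above, with column names of my own choosing), that end state looks something like this:

```sql
-- The "two tables for everything" end state: illustrative names only.
CREATE TABLE Objects (
    ObjectId       BIGINT PRIMARY KEY,
    ObjectTypeCode INT NOT NULL             -- what kind of thing this row is
);

CREATE TABLE Associations (
    ParentObjectId BIGINT NOT NULL REFERENCES Objects(ObjectId),
    ChildObjectId  BIGINT NOT NULL REFERENCES Objects(ObjectId),
    Ordinal        INT    NOT NULL,         -- the "third column for uniqueness"
    Value          TEXT,                    -- the "fourth column for Value"
    PRIMARY KEY (ParentObjectId, ChildObjectId, Ordinal)
);
-- At this point the database enforces almost nothing; every meaningful rule
-- about the data has moved into application code.
```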
I suggest you need to revisit your domain model. The one you describe sounds scary (but unfortunately, eerily familiar) to me. I had a guy who worked for me who invented a table called "Objects". There were two child tables, ObjectAttributes and ObjectProperties. It was difficult to get a clear explanation of the difference between the two. The table (and he, fortunately) didn't last long.
Upvotes: 5
Reputation: 24311
The normalized schemas have hit performance bottlenecks in the past, and a major refactoring has been scheduled. One of the groups of developers wants to store every property of the data types in a separate table, so that changes to the data types will never require schema altering. (A single property could be turned into a 1:n relationship, for example, just by changing application logic.)
This sounds like a bad idea to me.
Upvotes: 11