Reputation: 7219
I have recently been told by one of my stakeholders to add a set of "generic" tables to my data warehouse. These tables will each be of the same granularity as one of my fact tables, and will contain a set of columns named after their data types (e.g., Int1, String1, Date1, Decimal1, Int2, String2, Date2, Decimal2 ...
).
The stakeholder intends to have a set of tables in one of the source systems that users can access and load with whatever they desire, then let the ETL insert it into the dynamic dimension and show it to them within SSAS cubes, reports, and whatever else they want.
I've seen this kind of extensible design a few times, usually under the hood of some sort of customizable application, but I am cringing at the idea of putting something like this in my nice neat warehouse.
Am I correct in thinking that this is a poor design decision? What problems/flaws/faults are there with this approach that I could run into down the road? Or, if I'm wrong, what advantages are there that I am not seeing?
Upvotes: 0
Views: 118
Reputation: 18408
Such tables are, in principle, a mis-application of the relational model of data itself. The relational model of data was devised around the core idea that (the rows in) tables represent (instantiations of) predicates. With a one-to-one correspondence between tables and predicates (this is needed for facilitating interpretation).
You let your user write up a predicate in which his int1, date1, ... are the free places and such that his predicate always gives the correct interpretation for "whatever he ends up putting into it". That is impossible to do, by definition of the stated purpose "want to put in it whatever pleases me".
Iow, such tables are the result of a refusal to do upfront data/information analysis. Without such analysis, however, it is impossible to know post factum what the data means / how it is to be interpreted. But knowing what the data means is inevitable if you want to manipulate the data, and that is why this often leads to "inner platform effect" : some machinery is needed to facilitate management of "what the data means". The RM was devised with the idea that the DBMS would BE that machinery, but a refusal to use the DBMS for its intended purposes leads to having to implement those purposes in some other way.
Upvotes: 0
Reputation: 18940
The phrase "whatever they desire" can mean a multitude of different things.
What it often means is that the data has never been subjected to a thorough analysis and has never been integrated into a unified whole. Different users may be talking about different entities and relationships, or different attributes, even if they all refer to the same underlying subject matter.
Sometimes, this is the best you can do, because there is no unified conceptual model for the entire body of data, or its structure evolves over time in erratic and unpredictable fashion. Other times, an analysis would have been possible, but was never done, just because it's easier to just let each user do their own thing.
The trouble usually comes when some of the stakeholders begin to want the kinds of outputs (reports and extracts) out of this mess that would be easily delivered from a unified database. At that point, the entire job of integrating and reconciling data that is mostly disjointed and haphazard lies between you and any meaningful result. It takes a long time, it costs a lot of money, and the results can't be predicted in advance.
But management thinks it should be easy because "after all, the data is in a database, isn't it"? You would think this mistaken expectation could be avoided. In practice it is often not avoided.
This isn't the only pitfall, but it's the big one.
Upvotes: 2