Reputation: 91
This is a theoretical question which I ask due to a request that has come my way recently. I own the support of a master operational data store which maintains a set of data tables (with the master data), along with a set of lookup tables (which contain a list of reference codes along with their descriptions). There has been recently a push from the downstream applications to unite the two structures (data and lookup values) logically in the presentation layer so that it is easier for them to find out if there have been updates in the overall data. While the request is understandable, my first thought is that it should be implemented at the interface level rather than at the source. Combining the two tables logically (last_update_date) at ODS level is almost similar to the de-normalization of data and seems contrary to the idea of keeping lookups and data separate. That said, I cannot think of any reason of why it should not be done at ODS level apart from the fact that it does not "seem" to be right... Does anyone have any thoughts around why such an approach should or should not be followed?
I am listing an example here for simplicity's sake.
Data table
ID Name Emp_typ_cd Last_update_date
1 X E1 2014-08-01
2 Y E2 2014-08-01
Code table
Emp_typ_cd Emp_typ_desc Last_Update_date
E1 Employee_1 2014-08-23
E2 Employee_2 2013-09-01
The downstream request is to represent the data as
Data view
ID Name Emp_typ_cd Last_update_date
1 X E1 2014-08-23
2 Y E2 2014-08-01
or
Data view
ID Name Emp_typ_cd Emp_typ_desc Last_update_date
1 X E1 Employee_1 2014-08-23
2 Y E2 Employee_2 2014-08-01
Upvotes: 2
Views: 2707
Reputation: 1970
You are correct, it is demoralizing the database because someone wants to see the data in a certain way. The side effects, as you know, are that you are duplicating data, reducing flexibility, increasing table size, storing dissimilar objects together, etc. You are also correct that their problem should be solved somewhere or somehow else. They won’t get what they want if they change the database the way they want to change it. If they want to make it “easier for them to find out if there have been updates in the overall data” but they duplicate massive amounts of it, they’re just opening themselves up to errors. In your example the Emp_typ_cd Updated value must be updated for all employees with that emp type code. A good update statement will do that, but still, instead of updating a single row in the lookup table you’re updating every single employee that has the emp type.
We use lookup tables all the time. We can add a new value to a lookup table, add employees to the database with a fk to that new attribute, and any report that joins on that table now has the ID, Value, Sort Order, etc. Let’s say we add ‘Veteran’ to the lu_Work_Experience. We add an employee with the veteran fk_Id and now any existing query that joins on lu_Work_Experience has that value. They sort Work Experience alphabetically or by our pre-defined sort.
There is a valid reason for flattening your data structure though, and that is speed. If you’re running a very large report it will be faster with now joins (and good indexing). If the business knows it’s going to run a very large report many times and is worried about end user wait times, then it is a good idea to build a single table for that one report. We do it all the time for calculated measures. If we know that a certain analytic report will have a ton of aggregation and joins we pre-aggregate the data into the data store. That being said, we don’t do that very often in SQL because we use cubes for analytics.
So why use lookup tables in a database? Logical separation of data. An employee has a employee code, but it does NOT have a date of when an employee code was updated. Reduce duplicate data. Minimize design complexity. To avoid building a table for a specific report and then having to build a different table for a different report even if it has similar data.
Anyway, the rest of my argument would be comprised of facts from the Database Normalization wikipedia page so I’ll skip it.
Upvotes: 1