Reputation: 3814
I'm sure this would have been asked before, but I'm very confused!
Say I have a SQL Server db, which contains the following tables
and data...
INSERT [dbo].[Organisation] ([id], [name]) VALUES (1, N'ABC Ltd')
INSERT [dbo].[Organisation] ([id], [name]) VALUES (2, N'XYZ Ltd')
INSERT [dbo].[Employee] ([id], [name], [organisationId]) VALUES (1, N'Dave', 1)
INSERT [dbo].[Message] ([id], [text], [employeeId], [created]) VALUES (1, 'My 1st message', 1, '2012 01-01 00:00:00')
INSERT [dbo].[Message] ([id], [text], [employeeId], [created]) VALUES (2, 'My 2nd message', 1, '2012 01-02 00:00:00')
INSERT [dbo].[Message] ([id], [text], [employeeId], [created]) VALUES (3, 'My 3rd message', 1, '2012 01-03 00:00:00')
So, we can see that Dave, a man that works for ABC Ltd, has created 3 messages on 3 consecutive days. All is well in the world.
If it turns out that Dave has never worked for ABC Ltd, but in fact works for XYZ Ltd that's fine, we changes the organisation Id and that's that.
What, however, should I be doing if he did work for ABC, but then changed to XYZ Ltd on 2012-01-02.
Any report that ask for how many messages were raised by each organisation would, if run the day before we change Daves organisationId, show 100% for ABC, and 100% for XYZ if run the day after. Wrong, wrong, wrong!!
My question isn't for someone to solve this conundrum, but point me in the direction of subjects that I could be looking at that could help me.
I've been on searching the following terms today "Data warehousing", "Time based Systems", and "Temporal Databases" and have read some very confusing articles (confusing for me, I'm sure they are great articles).
So, can anyone out there help me out by nudging me in the right direction? I'm sure you can gather from this message that I need a "for dummies" guide to the subject.....whatever that subject is!!!
Cheers.
Upvotes: 2
Views: 215
Reputation: 52107
Here is a very straightforward way to model this situation:
Essentially, you are tying the message not to the employee (the person), but to the particular period of employment. This works fine assuming:
Message.Created
should fall inside the corresponding Employment
's StartDate
and TerminationDate
, yet the database itself will not enforce it (at least not declaratively).Upvotes: 0
Reputation: 13077
I've seen this same issue come up multiple times over the years. This is a failure to recognize that "This is where Dave works right now (or the last time we checked)" is a different relationship that "This is part of Dave's work history". The work history relationship is stateful because each association has a start date and nullable end date. The first time I saw this design pattern was with a health club membership system.
Clearly you don't want to query message data using the "where Dave works right now" relationship. I can think of 2 ways to solve the immediate problem: either associate the message directly with the company, or follow the work history in order to derive the company. In practice, I've seen this latter approach get overly complicated; if you decide to go that route, make sure you're getting something from it in terms of data you care about. Definitely you should consider taking the simple solution and just capture the direct message/company relationship that you know you care about. This would also take care of the case where Dave is moonlighting.
Upvotes: 0
Reputation: 96572
OK when you are going to need to report by time, then you need to store the data that way. So instead of think of these just as normalized tables, what you need to do is think of them as lookup tables and then store the values needed in the the message table.
This is NOT denormalization as the data changes over time. So for instance if I have a message that I will need to know which organziation sent it and which employee sent it, then I need to store both in the messages table and not reply on the connection between employee and organization (which I probably will need to know for other things.)
For some things you don't even want to store the id field in the final table but the actual text data. So you might want to store the employee name and organization name as well as the ids in the message table if you need to report on the name of the oprganization (or person) as it was at the time the message was sent. This seems less likely with messages than some other things, so let me give you an example, you have a order application. In the order details table, you don't want to store just the part_number, but you also want to store the name of the part (this might change over time, but when the customer has questions, he will be looking at the paperwork you sent him at the time) and the price (which will almost certainly change over time) and perhaps other details. You may also want to store the PK to the part so that you can look it up easily right now and see how much a replacement might cost for instance.
Upvotes: 0
Reputation: 51445
What, however, should I be doing if he did work for ABC, but then changed to XYZ Ltd on 2012-01-02.
You've just defined a many to many relationship. An employee can work for more than one organization, and an organization has more than one employee.
Start with this Wikipedia article on Data Normalization. Search Google Images for "many to many relationship". The images will lead you to some good explanations.
Upvotes: 1