Reputation: 7735
I have a MySQL table, people
, that looks like this:
id | object_id | name | sex | published
----------------------------------------------
1 | 1 | fred | male | [timestamp]
2 | 2 | john | male | [timestamp]
The reason I have two id
s is that in my CRUD app the user might edit an existing object, in which case it becomes a draft, so that I have two rows (the draft record and the already-existing record) with the same object_id
, something like this:
id | object_id | name | sex | published
----------------------------------------------
2 | 2 | john | male | [timestamp]
3 | 2 | john | female | NULL
This allows me to keep track of records' drafts and publication status. When the row with id
of 3 is published, its published
field will be stamped and the already published row deleted.
Each person also has a job history, so I have a table history
:
id | person_object_id | job
----------------------------------
1 | 2 | dev
2 | 2 | accountant
This is John's job history. I refer to John's object_id
in the person_object_id
field, because if I refered to his id
I'd risk delinking the two tables if I deleted one of the John rows as in my example above.
So my question is: is it not inefficient to refer to a table, as I do above, using a non-primary key (object_id
instead of id
)? How can I refer to a primary key when I require a non-unique id to keep track of drafts/published rows?
Upvotes: 2
Views: 122
Reputation: 5636
It looks like you want to keep versions of your data and you've come across the age-old problem of how to maintain foreign key pointers to versioned data. The solution is actually easy and it turns out that it is a special case of second normal form.
Take the following employee data:
EmpNo FirstName LastName Birthdate HireDate Payrate DeptNo
Now you are tasked with maintaining versions of the data as it changes. You could then add a date field which shows when the data changed:
EmpNo EffDate FirstName LastName Birthdate HireDate Payrate DeptNo
The Effective Date field shows the date each particular row took effect.
But the problem is that EmpNo, which was a perfect primary key for the table, can no longer serve that purpose. Now there can be many entries for each employee and, unless we want to assign a new employee number every time an employee's data is updated, we have to find another key field or fields.
One obvious solution is to make the combination of EmpNo and the new EffDate field be the primary key.
Ok, that solves the PK problem, but now what about any foreign keys in other tables that refer to specific employees? Can we add the EffDate field to those tables, also?
Well, sure, we can. But that means that the foreign keys, instead of referring to one specific employee, are now referring to one specific version of one specific employee. Not, as they say, nominal.
Many schemes have been implemented to solve this problem (see the Wikipedia entry for "Slowly Changing Dimension" for a list of a few of the more popular).
Here's a simple solution that allows you to version your data and leave foreign key references alone.
First, we realize that not all data is ever going to change and so will never be updated. In our example tuple, this static data is EmpNo, FirstName, Birthdate, HireDate. The data that is liable to change then, is LastName, Payrate, DeptNo.
But this means that the static data, like FirstName is dependent on EmpNo -- the original PK. Changeable or dynamic data, like LastName (which can change due to marriage or adoption) is dependent on EmpNo and EffDate. Our tuple is no longer in second normal form!
So we normalize. We know how to do this, right? With our eyes closed. The point is, when we are finished, we have a main entity table with one and only one row for each entity definition. All the foreign keys can refer to this table to the one specific employee -- the same as when we've normalized for any other reason. But now we also have a version table with all the data that is liable to change from time to time.
Now we have two tuples (at least two -- there could have been other normalization processes performed) to represent our employee entity.
EmpNo(PK) FirstName Birthdate HireDate
===== ========= ========== ==========
1001 Fred 1990-01-01 2010-01-01
EmpNo(PK) EffDate(PK) LastName Payrate DeptNo
===== ======== ======== ======= ======
1001 2010-01-01 Smith 15.00 Shipping
1001 2010-07-01 Smith 16.00 IT
The query to reconstruct the original tuple with all the versioned data is simple:
select e.EmpNo, e.FirstName, v.LastName, e.Birthdate, e.Hiredate, v.Payrate, v.DeptNo
from Employees e
join Emp_Versions v
on v.EmpNo = e.EmpNo;
The query to reconstruct the original tuple with only the most current data is not terribly complicated:
select e.EmpNo, e.FirstName, v.LastName, e.Birthdate, e.Hiredate, v.Payrate, v.DeptNo
from Employees e
join Emp_Versions v
on v.EmpNo = e.EmpNo
and v.EffDate =(
select Max( EffDate )
from Emp_Versions
where EmpNo = v.EmpNo );
Don't let the subquery scare you. A careful examination shows that it locates the desired version row with an index seek instead of the scan that most other methods will generate. Try it -- it's fast (though, of course, mileage may vary across different DBMSs).
But here's where it gets really good. Suppose you wanted to see what the data looked like on a particular date. What would that query look like? Just take the query above and make a small addition:
select e.EmpNo, e.FirstName, v.LastName, e.Birthdate, e.Hiredate, v.Payrate, v.DeptNo
from Employees e
join Emp_Versions v
on v.EmpNo = e.EmpNo
and v.EffDate =(
select Max( EffDate )
from Emp_Versions
where EmpNo = v.EmpNo
and EffDate <= :DateOfInterest ); --> Just this difference
That last line makes it possible to "go back in time" to see what the data looked like at any specific time in the past. And, if DateOfInterest is the current system time, it returns the current data. This means that the query to see current data and the query to see past data are, in fact, the same query.
Upvotes: 1
Reputation: 181
It doesn't really matter as long as you have an index on that column (not-unique index). Than it would be almost as fast
Upvotes: 0