Reputation: 12043
Consider a database that maintains a list of persons and their contact information, including addresses and such.
Sometimes, the contact information changes. Instead of simply updating the single person record to the new values, I like to keep a history of the changes.
I like to keep the history in a way that when I look at a person's record, I can quickly determine that there are older recordings of that person's data as well. However, I also like to avoid having to build very complicated SQL queries for retrieving only the latest version of each person's records (while this may be easy with a single table, it quickly gets difficult once the table is connected to other tables).
I've come up with a few ways, which I'll add below as answers, but I wonder if there are better ways (While I'm a seasoned code writer, I'm rather new to DB design, so I lack the experience and already ran into a few dead ends).
Which DB? I am currently using sqlite but plan to move to a server based DB engine eventually, probably Postgres. However, I meant this question asked in a more general form, not specific to any particular engine, though suggestions how to solve this in certain engines are appreciated, too, in the general interest.
Upvotes: 6
Views: 7674
Reputation: 61969
Must you keep structured history information?
Quite often, the history of changes does not have to be structured, because the history is needed for auditing purposes only, and there is no actual need to be able to perform queries against the historical data.
If you must keep structured history information:
If you need to able to execute queries against historical data, then you must keep the historical data in the database. Some people recommend separate historical tables; I consider this misguided. Instead, I recommend using views.
Upvotes: 10
Reputation: 807
We use a history integer column. New rows are always inserted with a history of 0, and any previous rows for that entry have the history incremented by 1.
Depending on how often the historical data is to be used, it might be wise to store history rows in a separate table. A simple view could be used if the combined data is desired, and it should speed things up if you usually just need the current rows.
Upvotes: 0
Reputation: 13701
We use what we call Verity-Block pattern.
The verity contains the periodicity, the block contains immutable data.
In the case of personal data we have the Identity
verity that has a validity period, and the IdentificationBlock
that contains the data such as Name
, LastName
, BirthDate
Block are immutable, so whenever we change something the application makes sure to create a new block.
So in case your last name changes on 01/01/2015 from Smits to Johnson then we have a verity Identity valid from [mindate] to 31/12/2014 that is linked to an IdentificationBlock where Lastname = Smits and an Identity that is valid from 01/01/2014 to [maxdate] linked to an IdentificationBlock where LastName = Johnson.
So in the database we have tables:
Identification
ID_Identification [PK]
Identity
ID_Identity [PK]
ID_Identification [FK]
ID_IdentificationBlock [FK]
ValidFrom
ValidTo
IdentificationBlock
ID_IdentificationBlock [PK]
ID_Identification [FK]
FirstName
LastName
BirthDate
A typical query to get the current name would be
Select idb.Name, idb.LastName from IdentificationBlock idb
join Identity i on idb.ID_Identification = i.ID_Identification
where getDate() between i.ValidFrom and i.ValidTo
Upvotes: 1
Reputation: 115691
This is generally referred to as Slowly Changing Dimension and linked Wikipedia page offers several approaches to make this thing work.
Martin Fowler has a list of Temporal Patterns that are not exactly DB-specific, but offer a good starting point.
And finally, Microsoft SQL Server offers Change Data Capture and Change Tracking.
Upvotes: 11
Reputation: 12043
Move older versions into a separate "history" table.
By using SQL triggers the old data is automatically written to the "history" table.
Pros:
Queries that ask for only the latest data remain simple.
By using triggers, updating data doesn't need to be concerned with maintaining the history.
Maintainig a FTS (Full Text Search) table in sqlite only for the most recent versions of data is easy because the triggers would be attached only to the "current" (non-history) table, thereby avoiding storing of obsolete data.
Cons:
Detection of history entries requires parsing a separate table (that's not a big issue, though). This may also be alleviated by adding a backlink column as a foreign key to the history table.
Every table that shall maintain a history needs a duplicate table for the history. Makes writing the schema tedious unless program code is written to create such "history" tables dynamically.
Upvotes: 0
Reputation: 12043
Add an "active" flag or add a "version" number.
Using a flag requires adding a condition such as active=1
to every query's WHERE clause involving the table.
Using a version number requires adding a subquery such as:
version = (SELECT MAX(version) FROM MyTable t2 WHERE MyTable.id = t2.id)
Pros:
Keeps the database design simple.
Detection of history entries is easy - just remove the extra condition from the queries.
Cons:
Updating data requires setting the active or version values accordingly. (Though this might be handled with SQL triggers, I guess.)
Complicates queries. While this may not affect the performance, it's getting more difficult to write and maintain such queries by hand the more complex the queries get, especially when involving joined queries.
Foreign keys into this table cannot use the rowid to refer to a person because updates to the person create a new entry in the table, thereby effectively changing the rowid of the latest data for the person.
Maintainig a FTS (Full Text Search) table in sqlite only for the most recent versions of data is slightly more difficult due to the triggers for automatic updates to the FTS need to take the active or version values into account in order to make sure that only the latest data is stored, while outdated data gets removed.
Upvotes: 0