Reputation:
As in many databases, i am designing a database that should keep record of previous versions of the rows changed in each table.
The standard solution to this problem is to keep a history table for each data table, and whenever a row needs to be updated in the data table, a copy of the current row gets inserted to the history table and than the row in the data table gets updated.
the disadvantages of this solution for me:
I am considering a different solution and would like to know if it's ok. for each table, we add the column IS_LAST
assume that in my case, rows are updated at an average of 10 times. also assume that at least 90% of the actions performed by the application happens only on the recent version of the rows.
my database is an Oracle 10g so to keep the "active" table slim, we can split the table to 2 partitions: the IS_LAST=1 partition, and the IS_LAST=0 partition.
Is partitioning a good way of solving the history data keeping problem?
Does this solution limits other partition potential to these tables?
thanks!
Upvotes: 9
Views: 4923
Reputation: 6860
As with others, I use an ORM (Propel) with a Base Object containing custom save & delete methods. These methods override the standard save & delete that come with the ORM. They check to see which columns have changed, and create 1 row in the change table for each changed column.
Schema for change
table:
change_pk, user_fk, user_name, session_id, ip_address, method, table_name, row_fk, field_name, field_value, most_recent, date_time
Example: 1, 4232, 'Gnarls Barkley', 'f2ff3f8822ff23', '234.432.324.694', 'UPDATE', 'User', 4232, 'first_name', 'Gnarles', 'Y', '2009-08-20 10:10:10';
Upvotes: 0
Reputation: 3135
It all depends on what you have:
Upvotes: 0
Reputation: 13572
Since you are using Oracle, you could check Oracle Flashback Technology. It records changes of all changes in database, both data and structure. It also records time stamp and user name.
I didn't use it, but it looks capable.
Upvotes: 1
Reputation: 99
Would tracking it based on time help you achieve the effect you are looking for on a daily basis and at the end of the business or at mid night depending on the lowest transaction volume time if you executed a procedure to move trailing data into history table then would it help ?? that way all your updates would be inserts and no locking is required as well. Regards, Andy
Upvotes: 0
Reputation:
If I have 1 or 2 tables of history to keep I would do it exactly as Tuinstoel has suggested. But if you had dozens of tables to do this on I would move more toward a solution described by zendar. The reason is this.
How do you answer questions like,
What changed since yesterday when everything was fine?
Has user SMITHG made any changes?
Those questions require a one query per table, whether it's a separate _hist table or a partition inside the table. No matter, it's some huge list of queries. If you have a central table that looks like this, then it's a piece of pie.
table_name, Column_name, PK, Before_value, After_value, User, timestamp
Inserts have only after values,
Deletes have only before values,
Update have both but only for the columns which changed.
Some variations
You can include a column for I/U/D if you prefer You can exclude column values for Inserts and just record the PK and I since the correct values are still in the table.
Since this is Oracle you could partition on table_name, so in essence you actually do have one hist "table" per real table.
You can easily answer the above questions, which I believe are, quite simply, the most often asked questions. And it handles every question you can answer with partitions or _hist tables.
Upvotes: 1
Reputation: 13572
First question should be: what would you do with that data? If you don't have clear business requirement, don't do it.
I did something similar and after 3 years of running there is about 20% of "valid data" and rest is "previous versions". And it is 10 million + 40 million records. In last three years we had 2 (two) requests to investigate history of changes and both times requests were silly - we record time stamp of record change and we were asked to check if persons worked overtime (after 5pm).
Now, we are stuck with oversized database that contains 80% of data that nobody needs.
EDIT:
Since you asked for possible solutions, I'll describe what we did. It's a bit different than solution you are considering.
We record all changes in single archive table with following columns:
Thing works this way:
Pro:
Cons:
LIKE
operator on string.So, again, check the requirements on archive. It is not trivial task, but gains and use can be minimal.
Upvotes: 6
Reputation: 7316
I would use the IS_LAST=1
partition, and the IS_LAST=0
partition system. Because it is partitioned it will be fast (partition pruning) and you will never have to query a union of the normal table and the history table.
I would use IS_LAST='Y'/'N' and not 1/0. 1 and 0 are meaningless.
There is a special trick that can help guarrantee that there is max one row with IS_LAST='Y'
per entity: You can create a unique function based index with a function that returns null when IS_LAST='N'
and return the id when IS_LAST='Y'
. It is explained here: http://www.akadia.com/services/ora_function_based_index_1.html
Upvotes: 0
Reputation: 51927
How will you define primary keys? There will be many rows with the same primary key due to keaping the history rows in the same table.
Also you don't seem to have a way to know the order of your history rows when a single "real" row gets changed more the once.
(One project I worked on, we generated all the history tables and triggers using codesmith, this worked very well.)
Upvotes: 0
Reputation: 3732
The main limitation that comes to my mind is that a substantial portion of your table will be history data, which means indexing concerns and potentially introducing additional complexity into your CRUD queries.
Is there some particular reason you don't want to use what seems to be the usual solution to this situation?
Upvotes: 0
Reputation: 18933
I'd create two tables: one for IsLast kind of values and one for historical ones. Then I'd setup a trigger that inserts value into the historical table every time the isLast is updated.
Upvotes: 2