Edward
Edward

Reputation: 9778

Is there a MySQL option/feature to track history of changes to records?

I've been asked if I can keep track of the changes to the records in a MySQL database. So when a field has been changed, the old vs new is available and the date this took place. Is there a feature or common technique to do this?

If so, I was thinking of doing something like this. Create a table called changes. It would contain the same fields as the master table but prefixed with old and new, but only for those fields which were actually changed and a TIMESTAMP for it. It would be indexed with an ID. This way, a SELECT report could be run to show the history of each record. Is this a good method? Thanks!

Upvotes: 180

Views: 242274

Answers (9)

midenok
midenok

Reputation: 1215

MariaDB supports System Versioning since 10.3 which is the standard SQL feature that does exactly what you want: it stores history of table records and provides access to it via SELECT queries. MariaDB is an open-development fork of MySQL.

Example

1. Create a table storing company IDs & names
CREATE TABLE company(
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR (255) CHARSET utf8
) WITH SYSTEM VERSIONING;

The above automatically adds ROW_START and ROW_END pseudo-columns of type TIMESTAMP(6). They are invisible on SELECT *, but can be accessed with SELECT *, ROW_START, ROW_END. More on that later.

2. Insert initial companies "Foo" and "Bar"
INSERT INTO company(name) VALUES("Foo"), ("Bar");
3. Rename company "Bar" to "Baz"
UPDATE company SET name = "Baz" WHERE id = 2;
4. Delete company "Foo"
DELETE FROM company WHERE id = 1;
5. Display current data (no surprises)
SELECT * FROM company;
+----+------+
| id | name |
+----+------+
|  2 | Baz  |
+----+------+
6. Check changes of a single day
SELECT *, ROW_START, ROW_END
FROM company
FOR SYSTEM_TIME FROM '2024-10-18' TO '2024-10-19';
+----+------+----------------------------+----------------------------+
| id | name | ROW_START                  | ROW_END                    |
+----+------+----------------------------+----------------------------+
|  1 | Foo  | 2024-10-18 14:41:59.908731 | 2024-10-18 14:49:50.044427 |
|  2 | Bar  | 2024-10-18 14:41:59.908731 | 2024-10-18 14:42:17.556945 |
|  2 | Baz  | 2024-10-18 14:42:17.556945 | 2038-01-19 04:14:07.999999 |
+----+------+----------------------------+----------------------------+

You can see that ROW_START represents entry date, while ROW_END represents the modification or deletion date. For the current data, ROW_END is represented by the maximum value of type TIMESTAMP(6) (2038-01-19 04:14:07.999999).

7. Check data at a specific time
SELECT *
FROM company
FOR SYSTEM_TIME AS OF TIMESTAMP '2024-10-18 14:45';
+----+------+
| id | name |
+----+------+
|  1 | Foo  |
|  2 | Baz  |
+----+------+

See how "Bar" is renamed to "Baz", while "Foo" is not yet deleted.

There are some exciting options (like partitioning history data in a separate table for performance). You can find more on System Versioning via this link:

https://mariadb.com/kb/en/library/system-versioned-tables/

Upvotes: 15

Neville Kuyt
Neville Kuyt

Reputation: 29629

It's subtle.

If the business requirement is "I want to audit the changes to the data - who did what and when?", you can usually use audit tables (as per the trigger example Keethanjan posted). I'm not a huge fan of triggers, but it has the great benefit of being relatively painless to implement - your existing code doesn't need to know about the triggers and audit stuff.

If the business requirement is "show me what the state of the data was on a given date in the past", it means that the aspect of change over time has entered your solution. Whilst you can, just about, reconstruct the state of the database just by looking at audit tables, it's hard and error prone, and for any complicated database logic, it becomes unwieldy. For instance, if the business wants to know "find the addresses of the letters we should have sent to customers who had outstanding, unpaid invoices on the first day of the month", you likely have to trawl half a dozen audit tables.

Instead, you can bake the concept of change over time into your schema design (this is the second option Keethanjan suggests). This is a change to your application, definitely at the business logic and persistence level, so it's not trivial.

For example, if you have a table like this:

CUSTOMER
---------
CUSTOMER_ID PK
CUSTOMER_NAME
CUSTOMER_ADDRESS

and you wanted to keep track over time, you would amend it as follows:

CUSTOMER
------------
CUSTOMER_ID            PK
CUSTOMER_VALID_FROM    PK
CUSTOMER_VALID_UNTIL   PK
CUSTOMER_STATUS
CUSTOMER_USER
CUSTOMER_NAME
CUSTOMER_ADDRESS

Every time you want to change a customer record, instead of updating the record, you set the VALID_UNTIL on the current record to NOW(), and insert a new record with a VALID_FROM (now) and a null VALID_UNTIL. You set the "CUSTOMER_USER" status to the login ID of the current user (if you need to keep that). If the customer needs to be deleted, you use the CUSTOMER_STATUS flag to indicate this - you may never delete records from this table.

That way, you can always find what the status of the customer table was for a given date - what was the address? Have they changed name? By joining to other tables with similar valid_from and valid_until dates, you can reconstruct the entire picture historically. To find the current status, you search for records with a null VALID_UNTIL date.

It's unwieldy (strictly speaking, you don't need the valid_from, but it makes the queries a little easier). It complicates your design and your database access. But it makes reconstructing the world a lot easier.

Upvotes: 103

transient closure
transient closure

Reputation: 2611

Here's a straightforward way to do this:

First, create a history table for each data table you want to track (example query below). This table will have an entry for each insert, update, and delete query performed on each row in the data table.

The structure of the history table will be the same as the data table it tracks except for three additional columns: a column to store the operation that occured (let's call it 'action'), the date and time of the operation, and a column to store a sequence number ('revision'), which increments per operation and is grouped by the primary key column of the data table.

To do this sequencing behavior a two column (composite) index is created on the primary key column and revision column. Note that you can only do sequencing in this fashion if the engine used by the history table is MyISAM (See 'MyISAM Notes' on this page)

The history table is fairly easy to create. In the ALTER TABLE query below (and in the trigger queries below that), replace 'primary_key_column' with the actual name of that column in your data table.

CREATE TABLE MyDB.data_history LIKE MyDB.data;

ALTER TABLE MyDB.data_history MODIFY COLUMN primary_key_column int(11) NOT NULL, 
   DROP PRIMARY KEY, ENGINE = MyISAM, ADD action VARCHAR(8) DEFAULT 'insert' FIRST, 
   ADD revision INT(6) NOT NULL AUTO_INCREMENT AFTER action,
   ADD dt_datetime DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP AFTER revision,
   ADD PRIMARY KEY (primary_key_column, revision);

And then you create the triggers:

DROP TRIGGER IF EXISTS MyDB.data__ai;
DROP TRIGGER IF EXISTS MyDB.data__au;
DROP TRIGGER IF EXISTS MyDB.data__bd;

CREATE TRIGGER MyDB.data__ai AFTER INSERT ON MyDB.data FOR EACH ROW
    INSERT INTO MyDB.data_history SELECT 'insert', NULL, NOW(), d.* 
    FROM MyDB.data AS d WHERE d.primary_key_column = NEW.primary_key_column;

CREATE TRIGGER MyDB.data__au AFTER UPDATE ON MyDB.data FOR EACH ROW
    INSERT INTO MyDB.data_history SELECT 'update', NULL, NOW(), d.*
    FROM MyDB.data AS d WHERE d.primary_key_column = NEW.primary_key_column;

CREATE TRIGGER MyDB.data__bd BEFORE DELETE ON MyDB.data FOR EACH ROW
    INSERT INTO MyDB.data_history SELECT 'delete', NULL, NOW(), d.* 
    FROM MyDB.data AS d WHERE d.primary_key_column = OLD.primary_key_column;

And you're done. Now, all the inserts, updates and deletes in 'MyDb.data' will be recorded in 'MyDb.data_history', giving you a history table like this (minus the contrived 'data_columns' column)

ID    revision   action    data columns..
1     1         'insert'   ....          initial entry for row where ID = 1
1     2         'update'   ....          changes made to row where ID = 1
2     1         'insert'   ....          initial entry, ID = 2
3     1         'insert'   ....          initial entry, ID = 3 
1     3         'update'   ....          more changes made to row where ID = 1
3     2         'update'   ....          changes made to row where ID = 3
2     2         'delete'   ....          deletion of row where ID = 2 

To display the changes for a given column or columns from update to update, you'll need to join the history table to itself on the primary key and sequence columns. You could create a view for this purpose, for example:

CREATE VIEW data_history_changes AS 
   SELECT t2.dt_datetime, t2.action, t1.primary_key_column as 'row id', 
   IF(t1.a_column = t2.a_column, t1.a_column, CONCAT(t1.a_column, " to ", t2.a_column)) as a_column
   FROM MyDB.data_history as t1 INNER join MyDB.data_history as t2 on t1.primary_key_column = t2.primary_key_column 
   WHERE (t1.revision = 1 AND t2.revision = 1) OR t2.revision = t1.revision+1
   ORDER BY t1.primary_key_column ASC, t2.revision ASC

Upvotes: 249

hostingutilities.com
hostingutilities.com

Reputation: 9509

In MariaDB 10.5+ this is as easy to setup as

CREATE TABLE t (x INT) WITH SYSTEM VERSIONING 
  PARTITION BY SYSTEM_TIME;

Past history can then be queried by doing

SELECT * FROM t FOR SYSTEM_TIME AS OF TIMESTAMP '2016-10-09 08:07:06';

There is currently no counterpart for this in MySQL.

See the documentation for more info. If you're on an older version of MariaDB, the documentation has an alternate syntax that has been available since MariaDB 10.3.4.

Upvotes: 3

Ouroboros
Ouroboros

Reputation: 1524

Why not simply use bin log files? If the replication is set on the Mysql server, and binlog file format is set to ROW, then all the changes could be captured.

A good python library called noplay can be used. More info here.

Upvotes: 5

Keethanjan
Keethanjan

Reputation: 363

You could create triggers to solve this. Here is a tutorial to do so (archived link).

Setting constraints and rules in the database is better than writing special code to handle the same task since it will prevent another developer from writing a different query that bypasses all of the special code and could leave your database with poor data integrity.

For a long time I was copying info to another table using a script since MySQL didn’t support triggers at the time. I have now found this trigger to be more effective at keeping track of everything.

This trigger will copy an old value to a history table if it is changed when someone edits a row. Editor ID and last mod are stored in the original table every time someone edits that row; the time corresponds to when it was changed to its current form.

DROP TRIGGER IF EXISTS history_trigger $$

CREATE TRIGGER history_trigger
BEFORE UPDATE ON clients
    FOR EACH ROW
    BEGIN
        IF OLD.first_name != NEW.first_name
        THEN
                INSERT INTO history_clients
                    (
                        client_id    ,
                        col          ,
                        value        ,
                        user_id      ,
                        edit_time
                    )
                    VALUES
                    (
                        NEW.client_id,
                        'first_name',
                        NEW.first_name,
                        NEW.editor_id,
                        NEW.last_mod
                    );
        END IF;

        IF OLD.last_name != NEW.last_name
        THEN
                INSERT INTO history_clients
                    (
                        client_id    ,
                        col          ,
                        value        ,
                        user_id      ,
                        edit_time
                    )
                    VALUES
                    (
                        NEW.client_id,
                        'last_name',
                        NEW.last_name,
                        NEW.editor_id,
                        NEW.last_mod
                    );
        END IF;

    END;
$$

Another solution would be to keep an Revision field and update this field on save. You could decide that the max is the newest revision, or that 0 is the most recent row. That's up to you.

Upvotes: 17

Worthy7
Worthy7

Reputation: 1561

Just my 2 cents. I would create a solution which records exactly what changed, very similar to transient's solution.

My ChangesTable would simple be:

DateTime | WhoChanged | TableName | Action | ID |FieldName | OldValue

1) When an entire row is changed in the main table, lots of entries will go into this table, BUT that is very unlikely, so not a big problem (people are usually only changing one thing) 2) OldVaue (and NewValue if you want) have to be some sort of epic "anytype" since it could be any data, there might be a way to do this with RAW types or just using JSON strings to convert in and out.

Minimum data usage, stores everything you need and can be used for all tables at once. I'm researching this myself right now, but this might end up being the way I go.

For Create and Delete, just the row ID, no fields needed. On delete a flag on the main table (active?) would be good.

Upvotes: 4

goforu
goforu

Reputation: 51

The direct way of doing this is to create triggers on tables. Set some conditions or mapping methods. When update or delete occurs, it will insert into 'change' table automatically.

But the biggest part is what if we got lots columns and lots of table. We have to type every column's name of every table. Obviously, It's waste of time.

To handle this more gorgeously, we can create some procedures or functions to retrieve name of columns.

We can also use 3rd-part tool simply to do this. Here, I write a java program Mysql Tracker

Upvotes: 0

Zenex
Zenex

Reputation: 141

Here is how we solved it

a Users table looked like this

Users
-------------------------------------------------
id | name | address | phone | email | created_on | updated_on

And the business requirement changed and we were in a need to check all previous addresses and phone numbers a user ever had. new schema looks like this

Users (the data that won't change over time)
-------------
id | name

UserData (the data that can change over time and needs to be tracked)
-------------------------------------------------
id | id_user | revision | city | address | phone | email | created_on
 1 |   1     |    0     | NY   | lake st | 9809  | @long | 2015-10-24 10:24:20
 2 |   1     |    2     | Tokyo| lake st | 9809  | @long | 2015-10-24 10:24:20
 3 |   1     |    3     | Sdny | lake st | 9809  | @long | 2015-10-24 10:24:20
 4 |   2     |    0     | Ankr | lake st | 9809  | @long | 2015-10-24 10:24:20
 5 |   2     |    1     | Lond | lake st | 9809  | @long | 2015-10-24 10:24:20

To find the current address of any user, we search for UserData with revision DESC and LIMIT 1

To get the address of a user between a certain period of time we can use created_on bewteen (date1 , date 2)

Upvotes: 12

Related Questions