Ben
Ben

Reputation: 52893

Maintaining logical consistency with a soft delete, whilst retaining the original information

I have a very simple table students, structure as below, where the primary key is id. This table is a stand-in for about 20 multi-million row tables that get joined together a lot.

+----+----------+------------+
| id |   name   |    dob     |
+----+----------+------------+
|  1 | Alice    | 01/12/1989 |
|  2 | Bob      | 04/06/1990 |
|  3 | Cuthbert | 23/01/1988 |
+----+----------+------------+

If Bob wants to change his date of birth, then I have a few options:

  1. Update students with the new date of birth.

    Positives: 1 DML operation; the table can always be accessed by a single primary key lookup.

    Negatives: I lose the fact that Bob ever thought he was born on 04/06/1990

  2. Add a column, created date default sysdate, to the table and change the primary key to id, created. Every update becomes:

    insert into students(id, name, dob) values (:id, :name, :new_dob)
    

    Then, whenever I want the most recent information do the following (Oracle but the question stands for every RDBMS):

    select id, name, dob
      from ( select a.*, rank() over ( partition by id 
                                           order by created desc ) as "rank"
               from students a )
     where "rank" = 1
    

    Positives: I never lose any information.

    Negatives: All queries over the entire database take that little bit longer. If the table was the size indicated this doesn't matter but once you're on your 5th left outer join using range scans rather than unique scans begins to have an effect.

  3. Add a different column, deleted date default to_date('2100/01/01','yyyy/mm/dd'), or whatever overly early, or futuristic, date takes my fancy. Change the primary key to id, deleted then every update becomes:

    update students x
       set deleted = sysdate 
     where id = :id
       and deleted = ( select max(deleted) from students where id = x.id );
    insert into students(id, name, dob) values ( :id, :name, :new_dob );
    

    and the query to get out the current information becomes:

    select id, name, dob
      from ( select a.*, rank() over ( partition by id 
                                           order by deleted desc ) as "rank"
               from students a )
     where "rank" = 1
    

    Positives: I never lose any information.

    Negatives: Two DML operations; I still have to use ranked queries with the additional cost or a range scan rather than a unique index scan in every query.

  4. Create a second table, say student_archive and change every update into:

    insert into student_archive select * from students where id = :id;
    update students set dob = :newdob where id = :id;
    

    Positives: Never lose any information.

    Negatives: 2 DML operations; if you ever want to get all the information ever you have to use union or an extra left outer join.

  5. For completeness, have a horribly de-normalised data-structure: id, name1, dob, name2, dob2... etc.

If number 1 is not an option if I never want to lose any information and always do a soft delete. Number 5 can be safely discarded as causing more trouble than it's worth.

I'm left with options 2, 3 and 4 with their attendant negative aspects. I usually end up using option 2 and the horrific 150 line (nicely-spaced) multiple sub-select joins that go along with it.


tl;dr I realise I'm skating close to the line on a "not constructive" vote here but:

What is the optimal (singular!) method of maintaining logical consistency while never deleting any data?

Is there a more efficient way than those I have documented? In this context I'll define efficient as "less DML operations" and / or "being able to remove the sub-queries". If you can think of a better definition when (if) answering please feel free.

Upvotes: 3

Views: 318

Answers (1)

a1ex07
a1ex07

Reputation: 37382

I'd stick to #4 with some modifications.No need to delete data from original table ; it's enough to copy old values to archive table before updating(or before deleting) original record. That's can be easily done with row level trigger. Retrieving all information in my opinion is not a frequent operation, and I don't see anything wrong with extra join /union. Also, you can define a view , so all queries will be straightforward from end user perspective.

Upvotes: 1

Related Questions