Maintaining logical consistency with a soft delete, whilst retaining the original information

Question

I have a very simple table students, structure as below, where the primary key is id. This table is a stand-in for about 20 multi-million row tables that get joined together a lot.

+----+----------+------------+
| id |   name   |    dob     |
+----+----------+------------+
|  1 | Alice    | 01/12/1989 |
|  2 | Bob      | 04/06/1990 |
|  3 | Cuthbert | 23/01/1988 |
+----+----------+------------+

If Bob wants to change his date of birth, then I have a few options:

Update students with the new date of birth.

Positives: 1 DML operation; the table can always be accessed by a single primary key lookup.

Negatives: I lose the fact that Bob ever thought he was born on 04/06/1990
Add a column, created date default sysdate, to the table and change the primary key to id, created. Every update becomes:
```
insert into students(id, name, dob) values (:id, :name, :new_dob)
```
Then, whenever I want the most recent information do the following (Oracle but the question stands for every RDBMS):
```
select id, name, dob
  from ( select a.*, rank() over ( partition by id 
                                       order by created desc ) as "rank"
           from students a )
 where "rank" = 1
```
Positives: I never lose any information.

Negatives: All queries over the entire database take that little bit longer. If the table was the size indicated this doesn't matter but once you're on your 5th left outer join using range scans rather than unique scans begins to have an effect.
Add a different column, deleted date default to_date('2100/01/01','yyyy/mm/dd'), or whatever overly early, or futuristic, date takes my fancy. Change the primary key to id, deleted then every update becomes:
```
update students x
   set deleted = sysdate 
 where id = :id
   and deleted = ( select max(deleted) from students where id = x.id );
insert into students(id, name, dob) values ( :id, :name, :new_dob );
```
and the query to get out the current information becomes:
```
select id, name, dob
  from ( select a.*, rank() over ( partition by id 
                                       order by deleted desc ) as "rank"
           from students a )
 where "rank" = 1
```
Positives: I never lose any information.

Negatives: Two DML operations; I still have to use ranked queries with the additional cost or a range scan rather than a unique index scan in every query.
Create a second table, say student_archive and change every update into:
```
insert into student_archive select * from students where id = :id;
update students set dob = :newdob where id = :id;
```
Positives: Never lose any information.

Negatives: 2 DML operations; if you ever want to get all the information ever you have to use union or an extra left outer join.
For completeness, have a horribly de-normalised data-structure: id, name1, dob, name2, dob2... etc.

If number 1 is not an option if I never want to lose any information and always do a soft delete. Number 5 can be safely discarded as causing more trouble than it's worth.

I'm left with options 2, 3 and 4 with their attendant negative aspects. I usually end up using option 2 and the horrific 150 line (nicely-spaced) multiple sub-select joins that go along with it.

tl;dr I realise I'm skating close to the line on a "not constructive" vote here but:

What is the optimal (singular!) method of maintaining logical consistency while never deleting any data?

Is there a more efficient way than those I have documented? In this context I'll define efficient as "less DML operations" and / or "being able to remove the sub-queries". If you can think of a better definition when (if) answering please feel free.

Maintaining logical consistency with a soft delete, whilst retaining the original information

Answers (1)

Related Questions