Reputation: 22213
I'm trying to setup a SQL server database for an ASP.NET MVC site that will both store the latest information as well as a history of all the changes to the data.
The information for my site comes from XML files which are uploaded by the user. The site parses through the XML and writes the contained information into the sites database. Elements in successive uploads may actually represent the same thing, but some data may have changed. Yet like I said before I want to keep track of every version.
The table bellow shows one way I was thinking of approaching this. I would create duplicate records for each item in each upload. New items that match those in previous uploads will be assigned the same ID, but each item would be assigned to a unique upload ID.
Upload 1: Erik and Sara are added
Upload 2: Erik renamed to Eric, Bill added
Upload 3: Sarah grew 2" taller, Eric was removed.
[PERSONS TABLE]
PersonID Name Height UploadID
1 Erik 71 1
1 Eric 71 2
2 Sarah 70 1
2 Sarah 70 2
2 Sarah 72 3
3 Bill 76 2
3 Bill 76 3
[UPLOADS TABLE]
UploadID UploadTime
1 3/09/2011
2 4/01/2011
3 4/11/2011
However, this doesn't seem like the optimal solution to me because of how much information ends up being duplicated in the database. Is there a better way to approach this where only the changes are saved with each upload?
Upvotes: 0
Views: 187
Reputation: 6179
I think the problem is is that your PERSONS table no longer contains just information about PERSONS. It also contains information on the updloads. What I'm going to recommend probably won't decrease the size of your database; but it will make it a little easier to understand and work with.
PERSONS
PersonID, Name, Height
1 Eric 71
2 Sarah 72
3 Bill 76
UPLOAD
UploadID, UploadTime
1 3/09/2011
2 4/01/2011
3 4/11/2011
PERSONS_EDIT
PersonID, UploadID, ChangeSQL, ChangeDescription
1 1 "insert into PERSONS(Name, Height) VALUES('Erik', 71)" "Erik added"
1 2 "update PERSONS set name='Eric' where Name='Erik'" "Changed Erik's name"
.... ... ...... ....
I don't think you can do much beyond this to make your tables simpler or your database smaller. As you can see, your PERSONS_EDIT table is going to be your largest table. The database you're using might provide mechanisms to do this automatically (some sort of transaction recording or something) but I've never used anything like that so I'll leave it to other people on Stackoverflow to make any suggestions like that if they exist. If the PERSONS_EDIT table gets too large, you can look at deleting entries that are over a week/month/year old. The decision on when to do that would be up to you.
Some other reasons for making this change, in your first table, you had to use PersonId and UploadID as a primary key to your persons table. So, to actually get the most recent version of a PERSON within your application, you would have had to do something where you select person by id, and then order by their UploadId and select the one with the largest upload Id EVERY TIME YOU DO A TRANSACTION ON ONE PERSON.
Another benefit is that you don't have to do a bunch of fancy sql to get your edit history. Just do a select * from the PERSONS_EDIT table.
Upvotes: 1