Delete all then insert all, or update, delete and insert as needed?

Question

I have a web form that is used to create and update personal information. On save, I collect all the info in a large multidimensional JSON array. When updating the database, the information will potentially consists of three parts. New rows to be created, rows that need to be updated and rows that need to be deleted. These rows will also be across about 5 tables.

My question is this, how should I approach the MySQL queries? My initial thought was to DELETE all the information from all the tables, and do a clean INSERT of all the new information in one go. I guess the other approach would be to do 3 queries: UPDATE all those with an existing ID; DELETE all those marked for deletion and INSERT all the newly created data (data without existing ID's).

Which of these approaches would be best, or is there a better way of doing this? Thanks for any advice. I appreciate it.

inquisitive · Accepted Answer

delete all and insert all should NEVER be practiced.

reasons:

Too much costly. mostly user performs edit. so for what was just a few update, you did one delete and a hundred inserts.
plays havoc with on-delete-cascade foreign keys.
upsets auto-increment fields even when they were apparently not touched.

you need to implement unit-of-work. I dont know which language you are working with, but some of the languages have an inbuilt support for that. in dot-net we have DataSets.

Basics:

Keep track of each record you fetched from database. secretly maintain a flag for each record to note which were loaded-from-db (ie. untouched), which has modifications (needs update query) and which are added new. for the deleted records, maintain a separate list (maybe of their IDs). How to achieve this feat is matter of separate discussion.
When user clicks Save, start a database transaction. this is not strictly part of current discussion, but is almost always done in similar conditions.
In the transaction, first loop through the deleted items array. fire a delete query for each of them.
Then loop through the modified items array. for each modified item you may simply update all of its columns to the latest values. if the numer of columns is too large (>30) then things change a bit.
then comes the newly created items. fire one insert for each of them.
Finally commit the transaction.

if the language you are programming in supports try/catch blocks then perform all of the above steps (after begining transaction) in try/catch. in catch block rollback the transcation.

this approach looks more complicated and seems to fire more queries than the simple delete/insert/all approach but trust me we have been there, done that and then spent sleeples nights undoing all that was done. never go the delete/insert way unless you can really justify it.

on how to do the change-tracking thing, it depends a lot on language and type of application you are using. even for dot-net the approach differs for desktop applications and web applications. tracking deletions is easy. so as tracking new insertions. the update marks are applied by trapping the edit event on any of the columns of that field.

EDIT

The data spans about five tables. hence the three loops (delete/update/insert) has to be done five times, one for each table. first draw the relationships among the tables. process the top table first. then process the tables which are directly connected to the top level tables and so on. if you have a cyclic relationship among the tables then you have to be specially careful.

The code against the Save operation is about to grow quite long. 5x3=15 operations, each with its own sql. none of these operations are expected to be reusable hence putting them in separate methods is futile. everything is about to go in a large procedural block. hence religiously comment the code. mark the table boundaries and the operations.

Delete all then insert all, or update, delete and insert as needed?

Answers (2)

EDIT

Related Questions