user2742409

Reputation: 315

Data provisioning process: inserting from multiple tables into a single table

I'm building a process in a database where data from multiple tables is written into a single table (each source in its own process). The process should be fault tolerant: in case of an error, all the inserted data should be rolled back. In my opinion there are two ways of implementing this kind of logic across database systems.

  1. At the beginning of a process I lock the target table, and all the other processes have to wait until this process is finished (no real parallel processing possible)
  2. Each process writes its data to a temporary table, and at the end bulk loads all the data into the target (keeping track of which data has already been exported, etc.)

Are there maybe more / better options for this kind of problem?

Cheers

Upvotes: 0

Views: 96

Answers (1)

Ashutosh A

Reputation: 1025

Given the number of rows (1m) and the fact that you'll be running the process every 5 mins, I don't think either of the approaches you mentioned would suit -

  • The first approach would lock the target table too often and for too long
  • The second approach would insert a huge amount of data and then either commit or roll back at the end. Depending on the DB you use, such a huge volume of uncommitted data could cause performance issues. A rollback at the end would also put a lot of load on the DB.

What you could try is -

  1. Insert and commit data in the target table in batches (of say 5K records - you'll need to tune this after testing for various batch sizes)

  2. The challenge is - no-one should be able to use the committed data until the whole process finishes successfully. To address this (see the SQL sketch after this list),

    • Create a table called process_stats (process_id, status)
    • Every time you start a process, insert an entry into this table with status 'STARTED'. When the process ends, change the status to either 'SUCCESSFUL' or 'FAILED'
    • Refer to this table from the target table (so add a column to the target table called process_id)
    • Lastly, any query that reads the target table should join the target table with the process_stats table and select records only if the process status is 'SUCCESSFUL'
    • If a process fails, you could optionally delete corresponding records from the target table
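
A minimal sketch of the above in generic SQL - table and column names other than process_stats, process_id and status are placeholders, the process id 42 is made up, and exact DDL/commit syntax varies by database:

    -- Placeholder source and target tables for the sketch
    CREATE TABLE source_table (id INT PRIMARY KEY, payload VARCHAR(100));
    CREATE TABLE target_table (id INT PRIMARY KEY, payload VARCHAR(100));

    -- Tracking table: one row per load process
    CREATE TABLE process_stats (
        process_id INT PRIMARY KEY,
        status     VARCHAR(20) NOT NULL  -- 'STARTED', 'SUCCESSFUL' or 'FAILED'
    );

    -- Reference the tracking table from the target table
    ALTER TABLE target_table ADD COLUMN process_id INT;

    -- 1. Register the process before loading anything
    INSERT INTO process_stats (process_id, status) VALUES (42, 'STARTED');
    COMMIT;

    -- 2. Insert and commit one batch (~5K rows); the loop that walks the
    --    source in batches would live in your loader code
    INSERT INTO target_table (id, payload, process_id)
    SELECT id, payload, 42
    FROM source_table
    WHERE id BETWEEN 1 AND 5000;  -- placeholder predicate for one batch
    COMMIT;

    -- 3a. On success, publish all batches at once by flipping the status
    UPDATE process_stats SET status = 'SUCCESSFUL' WHERE process_id = 42;
    COMMIT;

    -- 3b. On failure, mark the process and optionally clean up its rows
    UPDATE process_stats SET status = 'FAILED' WHERE process_id = 42;
    DELETE FROM target_table WHERE process_id = 42;
    COMMIT;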

This would make sure that -

  1. You commit data in smaller batches
  2. No big commits or rollbacks
  3. No significant locks/pauses during loading

Cons -

  1. All queries that read the target table need an additional join
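
For example, with the same placeholder names as the sketch above, a reader that previously selected straight from the target table would now run something like:

    SELECT t.id, t.payload
    FROM target_table t
    JOIN process_stats p ON p.process_id = t.process_id
    WHERE p.status = 'SUCCESSFUL';

If repeating that everywhere gets tedious, the join could be hidden behind a view that readers query instead of the raw table.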

Upvotes: 1
