pankaj kushwaha

Reputation: 101

What will happen if the power is shut down while we are inserting into the database?

I was recently asked this question in an interview; can anyone help me figure it out?

Suppose we have 100 files, and a process reads each file, parses it, and writes the data into a database.

Now let's say the process was at file number 60 when the power went off. How would you design the system so that when the power comes back up, the process resumes writing data into the database where it left off before the shutdown?

Upvotes: 3

Views: 594

Answers (2)

Scratte

Reputation: 3166

This would be one way (sketched in code after the list):

  • Loop over the files:
  1. Pick up a file.
    • Check that it hasn't already been processed, with a query to the database.
  2. Process the file.
    • Update the database.
    • Update the database with a log entry for the processed file.
  3. Commit.
  4. Move the file out of the non-processed queue.
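
A minimal sketch of that loop, assuming SQLite, a hypothetical parse_line function, and illustrative table names (records, processed_files) and directories that are not from the question:

    import os
    import shutil
    import sqlite3

    def parse_line(line):
        # Hypothetical parser: turn one line of the file into a row to insert.
        return (line.strip(),)

    def process_queue(queue_dir, done_dir, db_path):
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS records (value TEXT)")
        conn.execute("CREATE TABLE IF NOT EXISTS processed_files (name TEXT PRIMARY KEY)")

        for name in sorted(os.listdir(queue_dir)):
            # 1. Pick up a file and check it hasn't already been processed.
            if conn.execute("SELECT 1 FROM processed_files WHERE name = ?",
                            (name,)).fetchone():
                continue

            # 2. Process the file and log it, all in the same transaction.
            path = os.path.join(queue_dir, name)
            with open(path) as f:
                for line in f:
                    conn.execute("INSERT INTO records (value) VALUES (?)",
                                 parse_line(line))
            conn.execute("INSERT INTO processed_files (name) VALUES (?)", (name,))

            # 3. Commit: either all of this file's rows plus the log entry
            #    are persisted, or none of them are.
            conn.commit()

            # 4. Move the file out of the non-processed queue.
            shutil.move(path, os.path.join(done_dir, name))

        conn.close()

If the power goes out mid-file, the uncommitted rows are rolled back by the database and the file is still in the queue, so the next run simply redoes that file.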

You can also log the file entry to some other persistent resource.


Q: What if there are many files? Doesn't writing to the log slow down the process?
A: Probably not much; it's just one extra entry in the database per file. That's the cost of resilience.

Q: What if the files are so small it's almost only updating one row per file?
A: Make your update query idempotent. Don't log, but ensure that files are removed from the queue once the transaction is complete.
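
A sketch of such an idempotent update, assuming a hypothetical file_data table keyed on the file name, so re-running the same file after a crash overwrites the same row instead of adding a duplicate:

    import sqlite3

    def upsert_file_row(conn: sqlite3.Connection, file_name: str, value: str) -> None:
        # Keyed on file_name: re-processing the same small file after a crash
        # simply rewrites the same row, so no separate log entry is needed.
        # (ON CONFLICT ... DO UPDATE requires SQLite 3.24+.)
        conn.execute(
            """
            INSERT INTO file_data (file_name, value)
            VALUES (?, ?)
            ON CONFLICT(file_name) DO UPDATE SET value = excluded.value
            """,
            (file_name, value),
        )
        conn.commit()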

Q: What if there are many lines in a file? Do you really want to restart from the first line of the file?
A: Depends on the cost/benefit. You could split the file into smaller ones prior to processing and handle each sub-file as above. If power outages happen all the time, that's a good compromise. If they happen very rarely, the extra work by the system may not be worth it.

Q: What if there is a mix of small and large files?
A: Put the files into separate queues that handle them accordingly.


The UPS idea by @TimBiegeleisen is very good, though:

Well actually it is about that, because unplugging a database in the middle of a lengthy transaction might result in corrupted data. – Tim Biegeleisen Feb 22 '20 at 10:24

I've experienced the failure of one of those, though, so you'll need two.

Upvotes: 2

Konrad

Reputation: 21

I think you must:

  1. Store a reference to the file somewhere (an ID or the index of the processed file, depending on the case).
  2. Define the boundaries of a single transaction. Let it be the full processing of one file: read the file, parse it, store the data in the database, and update the reference to the file you processed. If all of that succeeds, you can commit the transaction to the database.
  3. Your main task, which processes all the files, should look into the reference table and, based on its state, fetch the next file.

In this case you create a transaction around the processing of a single file. If anything goes wrong there, you can always rerun the processing job and it will start where it left off.
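
A minimal sketch of that approach, assuming SQLite and hypothetical table names (a single-row file_progress reference table and a records data table) that are not from the answer:

    import sqlite3

    def init_progress(conn: sqlite3.Connection) -> None:
        # Single-row reference table storing the index of the last committed file.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS file_progress "
            "(id INTEGER PRIMARY KEY CHECK (id = 1), last_index INTEGER)")
        conn.execute(
            "INSERT OR IGNORE INTO file_progress (id, last_index) VALUES (1, -1)")
        conn.commit()

    def next_file_index(conn: sqlite3.Connection) -> int:
        # After a crash, resume with the file after the last committed one.
        (last_index,) = conn.execute(
            "SELECT last_index FROM file_progress WHERE id = 1").fetchone()
        return last_index + 1

    def process_one_file(conn: sqlite3.Connection, index: int, rows) -> None:
        # One transaction per file: the data rows and the reference update
        # either both commit or both roll back.
        with conn:  # commits on success, rolls back on any exception
            conn.executemany("INSERT INTO records (value) VALUES (?)", rows)
            conn.execute(
                "UPDATE file_progress SET last_index = ? WHERE id = 1", (index,))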

Please be aware that this is a very simple example; in most scenarios you want to keep transactions as thin as possible.

Upvotes: 1
