Calvin Lee

Reputation: 43

Effective ways to avoid skipping records

Current scenario: There are approximately 4.3 million records in the database, and I have to migrate them to an external file record by record. During the migration the data will be manipulated in a certain way (can't specify). The reason for going record by record is the functionality of the program: it keeps running even after it reaches the end of the file and waits for new records to be added.

Is there a way to prevent duplicates in the middle of the transfer? I am also adding a failsafe in case the program crashes halfway, so it can resume from where it left off. The last location is only saved once a record has been successfully written.

As example logic:


| Record 1 | <---- Success

| Record 2 | <---- Success

| Record 3 | <---- Success

| Record 4 | <---- Pending

| Record 5 | <---- Pending

| Record 6 | <---- Success (Newest)

The pending state can happen because a record might be larger and therefore take more time to process.

Assuming the program crashes after record 6 and saves the last known location as record 6, how can I check whether I missed record 4, 5, or any other record prior to the crash? After resuming, the program would continue from record 6 onwards, completely missing those skipped beforehand.

I want to ask the great minds of Stack Overflow: is there a theoretical solution to this issue? Just keep in mind that if the program crashes, everything in memory should be assumed wiped out. I will, however, save the last successful read in another file, so that will survive a crash.

P.S. This is not SQL, so I'm trying a more manual approach.

Upvotes: 0

Views: 50

Answers (1)

eMi

Reputation: 5618

If you worry about potential crashes, a good approach I used recently for a similar use case is to maintain a log file (e.g. records_processed) where you note down every record's unique ID once it's successfully migrated. If your program starts up after a crash, check the entries in that log to see where you were before the crash. That way you can verify the previous records and be sure none were missed due to being "pending" at the time of the crash.

To avoid duplicates at the destination, just check each record's ID against your log file. If possible, and for better efficiency, keep the recent record IDs in memory so you don't have to read the file every time. Periodically you might want to save this in-memory list to a separate file and merge it back into the main log to keep things clean.

With that approach no record is 1. left behind or 2. duplicated, even in the event of a crash. I hope this helps and that I understood the question correctly.
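A minimal sketch of the log-file idea in Python, assuming each record exposes a unique `id` field and that your actual manipulation/write step is some `migrate_one` callable (both are placeholders, since the question can't specify the real logic). The key points are: the ID is appended to the log only *after* the record is safely saved, the file is fsynced so the entry survives a crash, and on restart the log is replayed into an in-memory set so already-migrated records are skipped.

```python
import os

LOG_PATH = "records_processed.log"  # hypothetical log file name


def load_processed_ids(path=LOG_PATH):
    """Rebuild the set of already-migrated record IDs after a (re)start."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def migrate(records, migrate_one, path=LOG_PATH):
    """Migrate each record exactly once, logging its ID only after success."""
    done = load_processed_ids(path)      # in-memory copy for fast duplicate checks
    with open(path, "a") as log:
        for rec in records:
            if rec["id"] in done:        # skip records finished before the crash
                continue
            migrate_one(rec)             # the (unspecified) manipulation + write
            log.write(rec["id"] + "\n")  # append AFTER the record is safely saved
            log.flush()
            os.fsync(log.fileno())       # make the entry durable against a crash
            done.add(rec["id"])
```

If the program dies while a record is "pending", its ID was never logged, so the next run picks it up again rather than skipping past it; the set lookup is what prevents duplicates on resume.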

Upvotes: 1
