Reputation: 265
Often my applications need to save files that they will load again later. Having recently been unlucky with a crash, I want to write the save operation so that I am guaranteed to end up with either the new data or the original data, but never a corrupted mess.
My first idea was to do something along the lines of (to save a file called example.dat):
Then, at load time, the application can apply the following rules:
However, after a little research I found that, besides OS caching (which I may be able to override with the file flush methods), some disk drives cache writes internally and may even tell the OS they are done when they are not. So step 4 could complete without the data actually being written, and if the system goes down I have lost my data.
I am not sure the disk problem is solvable by an application at all, but are the general rules above the right approach? Should I keep an old recovery copy of the file for longer to be safe? And what are the guidelines for such things (e.g. acceptable disk usage, whether the user should choose, where to put such files, etc.)?
Also, how should I avoid potential conflicts with the user and other programs over "example.dat.tmp"? I recall sometimes seeing "~example.dat" from other software; is that a better convention?
Upvotes: 6
Reputation: 154047
If the disk drive reports to the OS that the data is physically on the disk when it isn't, there's not much you can do about it. Many disks do cache a certain number of writes and report them as done, but such disks should have a battery backup and finish the physical writes no matter what (and they won't lose data in case of a system crash, since they won't even see it).
For the rest, you say you've done some research, so you no doubt know that you can't use std::ofstream (nor FILE*) for this: you have to do the actual writes at the system level, and open the files with special attributes to ensure full synchronization. Otherwise, the operations can linger in the OS buffers for a while. And, as far as I know, there's no way of ensuring such synchronization for a rename.
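As a minimal sketch of "doing the writes at the system level with special attributes", assuming a POSIX system (on Windows the rough counterparts would be FILE_FLAG_WRITE_THROUGH on CreateFile plus FlushFileBuffers), the hypothetical helper `write_file_synced` below bypasses stdio buffering by calling write(2) directly, opens the file with O_SYNC, and calls fsync(2) before closing. None of this helps if the drive itself lies about completion.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write len bytes from buf to path without stdio buffering and return only
   after the OS reports the data has been handed to the device.
   Returns 0 on success, -1 on error with errno set. */
int write_file_synced(const char *path, const void *buf, size_t len)
{
    /* O_SYNC: each write() completes only once the data (and the metadata
       needed to read it back) has been transferred to the device. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;                         /* handle short writes */
        len -= (size_t)n;
    }

    /* Redundant with O_SYNC on most systems, but cheap insurance. */
    if (fsync(fd) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```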
(But I'm not sure that it's necessary if you always keep two versions.) My usual convention in such cases is to write to a file "example.dat.new", then, when I'm done writing, delete any file named "example.dat.bak", rename "example.dat" to "example.dat.bak", and then rename "example.dat.new" to "example.dat". Given this, you should be able to figure out what did or did not happen and find the correct file (interactively, if need be, or by writing a timestamp as the first line of each file).
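A minimal POSIX sketch of that .new/.bak rotation follows. The names `commit_new_file` and `choose_file_to_load` are hypothetical, and the sketch assumes example.dat.new has already been completely written and synced before `commit_new_file` is called. After the renames it also calls fsync(2) on the containing directory: the Linux fsync(2) man page notes that syncing a file does not necessarily persist its directory entry and that an explicit fsync on the directory is needed, so this is a best-effort way to make the rename durable, not a guarantee on every file system. The load-time rule is one conservative reading of the states a crash can leave: use example.dat if it exists, otherwise .new (complete before the first rename), otherwise .bak.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Build "<path><suffix>" into out; returns -1 if it does not fit. */
static int with_suffix(char *out, size_t size, const char *path, const char *suffix)
{
    int n = snprintf(out, size, "%s%s", path, suffix);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Best effort: ask the OS to persist directory entries (renames, unlinks) in dir. */
static int sync_dir(const char *dir)
{
    int fd = open(dir, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    close(fd);
    return rc;
}

/* Assumes "<path>.new" is already fully written and synced.
   Deletes .bak, renames path -> .bak, then .new -> path. */
int commit_new_file(const char *path, const char *dir)
{
    char tmp[4096], bak[4096];
    if (with_suffix(tmp, sizeof tmp, path, ".new") != 0 ||
        with_suffix(bak, sizeof bak, path, ".bak") != 0)
        return -1;

    /* On POSIX, rename() would replace .bak anyway; deleting it first
       mirrors the protocol described above. */
    if (unlink(bak) != 0 && errno != ENOENT)
        return -1;
    /* On the very first save there is no old file to back up. */
    if (rename(path, bak) != 0 && errno != ENOENT)
        return -1;
    if (rename(tmp, path) != 0)
        return -1;
    return sync_dir(dir);
}

/* Pick the file to load after a possible crash: path, then .new, then .bak.
   Writes the chosen name into out; returns -1 if none exists. */
int choose_file_to_load(const char *path, char *out, size_t size)
{
    const char *suffixes[] = { "", ".new", ".bak" };
    for (size_t i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        if (with_suffix(out, size, path, suffixes[i]) != 0)
            return -1;
        if (access(out, F_OK) == 0)
            return 0;
    }
    return -1;
}
```

One ambiguity remains: if the very first save crashes while example.dat.new is being written, only a partial .new exists and this rule would pick it; a header with a timestamp or checksum would let the loader reject it.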
Upvotes: 3
Reputation: 5722
You should lock the actual data file while you write its substitute, if there's a chance that a different process could be going through the same protocol that you are describing.
You can use flock for the file lock.
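A minimal sketch of that locking with flock(2), with one swapped-in detail named plainly: because the rotation protocol renames example.dat itself, a process waiting on the old file could end up holding a lock on the renamed inode, so a common workaround is to lock a separate file that is never renamed. The lock file name ("example.dat.lock") and the helper names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Take an exclusive advisory lock on a separate lock file (hypothetical name
   such as "example.dat.lock") that the save protocol never renames or deletes.
   Blocks until any other process holding the lock releases it.
   Returns the descriptor holding the lock, or -1 on error. */
int acquire_save_lock(const char *lock_path)
{
    int fd = open(lock_path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    /* LOCK_EX | LOCK_NB would fail immediately instead of waiting. */
    if (flock(fd, LOCK_EX) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Release the lock; closing the last descriptor would also release it. */
int release_save_lock(int fd)
{
    flock(fd, LOCK_UN);
    return close(fd);
}
```

A saving process would acquire the lock before writing the new file and release it after the final rename. flock locks are advisory (they only exclude other processes that also call flock on the same file) and are not part of POSIX, though Linux and the BSDs provide them.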
As for your temp file name, you could make your process ID part of it, for instance "example.dat.3124". No other simultaneously running process would generate the same name.
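A minimal sketch of building such a name with getpid(2); the helper name `make_temp_name` is hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Build a temporary name such as "example.dat.3124" from the target name
   and this process's ID. Returns 0 on success, -1 if out is too small. */
int make_temp_name(char *out, size_t size, const char *path)
{
    int n = snprintf(out, size, "%s.%ld", path, (long)getpid());
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

PIDs are reused, so a file with that name may be left over from an earlier crashed process; opening it with O_CREAT | O_EXCL, or using mkstemp(3), which generates a unique name and creates the file atomically, avoids silently reusing it. Either way, keep the temporary file in the same directory as example.dat, since rename(2) cannot move a file across file systems.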
Upvotes: 0