Reputation: 41
I am trying to make backup copies of dump.rdb at some interval X and of appendonly.aof at some interval Y, so that if the files get corrupted for whatever reason (or even just the appendonly.aof file), I can restore my data from the dump.rdb.backup snapshot and then replay whatever has changed since from the most recent appendonly.aof.backup copy I have.
I back up dump.rdb every 5 minutes and appendonly.aof every second.
1) Since dump.rdb is written in the background to a temporary file by a child process, what happens to the key changes that occur while the child process is creating the new image? I know the AOF file keeps being appended to regardless of the background write, but does the new dump.rdb file contain those key changes too?
2) If dump.rdb does NOT contain the key changes, is there some way to figure out the exact point where the child process is being forked? That way I can keep track of the point after which the AOF file would have the most up to date information.
Thanks!
Upvotes: 4
Views: 3570
Reputation: 73246
Usually, people use either RDB or AOF as a persistence strategy. Having both of them is quite expensive. Running a dump every 5 minutes and copying the AOF file every second sounds awfully frequent. Unless the Redis instances only store a tiny amount of data, you will likely kill the I/O subsystem of your box.
Now, regarding your questions:
1) Semantics of the RDB mechanism
The dump mechanism exploits the copy-on-write mechanism implemented by modern OS kernels when they clone/fork processes. When the fork is done, the system creates the background process by copying only the page table; the pages themselves are shared between the two processes. If the Redis process writes to a page, the OS transparently duplicates that page, so that the Redis instance gets its own version and the background process keeps the previous one. The background process is therefore guaranteed that the memory structures stay constant (and therefore consistent).
The consequence is that any write operation started after the fork will not be included in the dump. The dump is a consistent snapshot taken at fork time.
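To make the snapshot semantics concrete, here is a small sketch in Python (not Redis code, Unix only). It forks a child, lets the parent modify a dictionary after the fork, and only then asks the child to serialize what it sees. The function name `snapshot_after_fork` and the sample keys are made up for the illustration.

```python
import json
import os


def snapshot_after_fork(data):
    """Fork, mutate `data` in the parent, and return the child's view of it."""
    go_r, go_w = os.pipe()    # parent -> child: "I have finished writing"
    out_r, out_w = os.pipe()  # child -> parent: serialized snapshot

    pid = os.fork()
    if pid == 0:
        # Child: plays the role of the background saving process.
        os.close(go_w)
        os.close(out_r)
        os.read(go_r, 1)  # wait until the parent has changed its copy
        os.write(out_w, json.dumps(data).encode())
        os._exit(0)

    # Parent: plays the role of the Redis process serving writes.
    os.close(go_r)
    os.close(out_w)
    data["counter"] = data.get("counter", 0) + 1  # write issued after the fork
    data["added_after_fork"] = True
    os.write(go_w, b"x")  # now let the child dump its view
    os.close(go_w)

    chunks = []
    while True:
        chunk = os.read(out_r, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(out_r)
    os.waitpid(pid, 0)
    return json.loads(b"".join(chunks))


if __name__ == "__main__":
    live = {"counter": 1}
    print("child snapshot:", snapshot_after_fork(live))
    print("parent state:  ", live)
```

The child still reports the dictionary as it was at fork time, while the parent holds the updated version. That is the same behaviour you get from the RDB child with respect to writes arriving during the dump.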
2) Keeping track of the fork point
You can estimate the fork timestamp by running the INFO persistence command and calculating rdb_last_save_time - rdb_last_bgsave_time_sec (completion time of the last save minus the duration of the last background save), but it is not very accurate (one-second resolution only).
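As a rough sketch of that calculation, the following Python parses the text returned by INFO persistence and subtracts the two fields. It works on a plain string so it does not depend on any client library; the helper names and the sample values are invented for the example, and it assumes the last successful save was a background save.

```python
def parse_info(text):
    """Turn 'key:value' lines from INFO output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def estimate_fork_time(info_text):
    """Estimated Unix time (seconds) of the last BGSAVE fork, or None."""
    fields = parse_info(info_text)
    last_save = int(fields["rdb_last_save_time"])
    duration = int(fields["rdb_last_bgsave_time_sec"])
    if duration < 0:
        return None  # no background save has completed yet
    return last_save - duration


if __name__ == "__main__":
    # Made-up sample output, for illustration only.
    sample = (
        "# Persistence\r\n"
        "rdb_last_save_time:1365670763\r\n"
        "rdb_last_bgsave_time_sec:2\r\n"
    )
    print(estimate_fork_time(sample))
```

Both fields are whole seconds, so the result can be off by about a second in either direction.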
To be a bit more accurate (millisecond), you can parse the Redis log file to extract the following lines:
[3813] 11 Apr 10:59:23.132 * Background saving started by pid 3816
You need at least the "notice" log level to see these lines.
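A possible way to extract that timestamp is sketched below in Python. The regular expression is written against the exact line format shown above; other Redis versions may use a different prefix, so treat the pattern as an assumption to adapt. The log line carries no year, so the caller has to supply one (2013 in the example is arbitrary), and the month abbreviation is parsed with the current locale.

```python
import re
from datetime import datetime

# Matches lines such as:
# [3813] 11 Apr 10:59:23.132 * Background saving started by pid 3816
BGSAVE_START = re.compile(
    r"^\[(?P<pid>\d+)\] "
    r"(?P<stamp>\d{1,2} \w{3} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\* Background saving started by pid (?P<child>\d+)$"
)


def parse_bgsave_start(line, year):
    """Return (fork_time, child_pid) for a matching log line, else None."""
    match = BGSAVE_START.match(line.strip())
    if match is None:
        return None
    fork_time = datetime.strptime(
        f"{year} {match.group('stamp')}", "%Y %d %b %H:%M:%S.%f"
    )
    return fork_time, int(match.group("child"))


if __name__ == "__main__":
    line = "[3813] 11 Apr 10:59:23.132 * Background saving started by pid 3816"
    print(parse_bgsave_start(line, 2013))
```

The timestamp is the moment Redis logged the message, which is close to the fork but not guaranteed to coincide with it exactly.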
As far as I know, there is no way to correlate a specific entry in the AOF to the fork operation of the RDB (i.e. it is not possible to be 100% accurate).
Upvotes: 2