Reputation: 1168
I want to back up the database at high frequency, but the cost of a full backup is not acceptable. It seems SQLite does not have an API for incremental backup directly. However, I found the Data Change Notification Callbacks, which seem helpful: https://www.sqlite.org/c3ref/update_hook.html The parameters of the callback are the type of operation, the name of the database, the name of the table, and the rowID. What I don't know is whether it is possible to generate backup information (an SQL statement, for example) for that row from these parameters alone, without knowing the details of the table. In other words, is there a common method to generate a backup for such an operation that works for tables with different structures? I know there are cases where the callback won't be invoked, and I think that is acceptable as long as I also do a full backup periodically.
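To make the idea concrete, here is a rough sketch of what I have in mind (TypeScript, illustration only; the real callback comes from sqlite3_update_hook in the C API linked above, and the function below just shows how the hook's parameters might be turned into per-row backup SQL):

// Rough sketch of the idea: the hook parameters alone are enough to locate the
// row, without knowing the table's columns, because the row can be re-read
// generically via its rowid.
type Operation = 'INSERT' | 'UPDATE' | 'DELETE';

function changeToBackupStatement(op: Operation, dbName: string, table: string, rowid: number): string {
  if (op === 'DELETE') {
    // A delete can be replayed from the rowid alone.
    return `DELETE FROM "${table}" WHERE rowid = ${rowid};`;
  }
  // For INSERT/UPDATE we would run this query against the live database and
  // turn the resulting row into an INSERT OR REPLACE statement; the column
  // names come back with the result, so no per-table schema knowledge is needed.
  return `SELECT * FROM "${table}" WHERE rowid = ${rowid};`;
}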
Upvotes: 12
Views: 7711
Reputation: 83
I have created a Node.js package, sqlite3-incremental-backup, based on the approach by @user3893988. You can use it as a module as follows:
import {backup, restore} from './sqlite3-incremental-backup';
const srcFile = 'source.db';
const targetFile = 'target.db';
const snapshotName = 'snapshot1.txt'; //Can be any arbitrary name. MUST BE UNIQUE FOR EACH SNAPSHOT OTHERWISE THE PREVIOUS WILL BE LOST
backup(srcFile, snapshotName); // For Backup
restore(snapshotName, targetFile); // For Restoration
The Python module is coming soon.
Upvotes: 2
Reputation: 410
I’ve been thinking about this idea; it’s not officially endorsed by SQLite, but in theory it sounds very plausible.
The SQLite backup API essentially lets you get a working snapshot of the live database file. Similarly, VACUUM INTO lets you write a fresh, complete copy of the database into a backup file: https://www.sqlite.org/lang_vacuum.html#vacuuminto
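For reference, a one-off full backup with VACUUM INTO is a single statement (sketch assuming the better-sqlite3 npm package; file names are placeholders, and the target file must not already exist):

import Database from 'better-sqlite3';

// Open the live database and write a complete, compacted copy to a new file.
// VACUUM INTO fails if the target already exists, so use a fresh name per backup.
const db = new Database('live.db');
db.exec(`VACUUM INTO 'backups/2020-02-22-19-12-00.db'`);
db.close();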
But this is just good ol’ full backup; we want an incremental backup (kinda like git).
Suppose we want to back up the database every hour, and it’s a 1 GB database with relatively few writes; storing 24 GB per day sounds like overkill.
We can take advantage of the SQLite file format, which is essentially a fixed 100-byte header + (page_size * num_pages). SQLite always writes along page boundaries. The page_size and num_pages are stored in the 100-byte header. See the storage spec: https://www.sqlite.org/fileformat.html
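For example, reading those two values from the header is straightforward (sketch in Node.js/TypeScript; offsets are from the spec above, and note that files written by very old SQLite versions may store 0 for the page count, in which case file_size / page_size is the fallback):

import { openSync, readSync, closeSync } from 'node:fs';

function readGeometry(dbPath: string): { pageSize: number; pageCount: number } {
  // The first 100 bytes of the file are the database header.
  const fd = openSync(dbPath, 'r');
  const header = Buffer.alloc(100);
  readSync(fd, header, 0, 100, 0);
  closeSync(fd);

  // Offset 16: page size as a big-endian uint16; the value 1 means 65536.
  const raw = header.readUInt16BE(16);
  const pageSize = raw === 1 ? 65536 : raw;
  // Offset 28: database size in pages, big-endian uint32.
  const pageCount = header.readUInt32BE(28);
  return { pageSize, pageCount };
}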
So what we can do is create a reference file, which is simply a file with a list of hashes. Say we use SHA-256 (which is what the newer git hash function uses), so it will be a file (e.g. backups/2020-02-22-19-12-00.txt) containing:
sha256(header)
sha256(page1)
sha256(page2)
sha256(page3)
And we store the corresponding pages as individual files, like git does, in an objects dir, e.g.:
objects/ab/cdef12343..
The first two letters of the hash are used as the directory name so we don’t end up with too many files in one directory.
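Here’s a minimal sketch of the snapshot step (Node.js/TypeScript; the objects/ layout is just the one described above, and I simply hash whole pages, with the 100-byte header living inside page 1):

import { createHash } from 'node:crypto';
import { mkdirSync, readFileSync, writeFileSync, existsSync } from 'node:fs';
import { join } from 'node:path';

function snapshot(dbPath: string, snapshotPath: string, objectsDir: string, pageSize: number): void {
  const data = readFileSync(dbPath);   // fine for modest databases; stream page by page for huge ones
  const hashes: string[] = [];

  for (let offset = 0; offset < data.length; offset += pageSize) {
    const page = data.subarray(offset, offset + pageSize);
    const hash = createHash('sha256').update(page).digest('hex');
    hashes.push(hash);

    // Store the page content-addressed, git style: objects/ab/cdef12343..
    const dir = join(objectsDir, hash.slice(0, 2));
    const file = join(dir, hash.slice(2));
    if (!existsSync(file)) {           // unchanged pages are deduplicated for free
      mkdirSync(dir, { recursive: true });
      writeFileSync(file, page);
    }
  }

  // The snapshot itself is just the ordered list of page hashes.
  writeFileSync(snapshotPath, hashes.join('\n'));
}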
Alternatively, you can just upload the page files to any of the cloud storage providers, e.g. GCS, S3, Azure Blobs, or DO Spaces. This could give you a multi-region backup.
Since we’re not storing duplicate copies of pages, the total size of all backups is pretty small compared to database_size * num_backups.
You can even use the hashes file as a way to sync/restore the SQLite file. This is how Dropbox/rsync sync files: the hashes file tells us which pages changed, so we only download the changed objects and update those ranges in the file.
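And a matching restore sketch (same hypothetical layout as the snapshot code above):

import { readFileSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

function restore(snapshotPath: string, objectsDir: string, targetDbPath: string): void {
  // The snapshot is the ordered list of page hashes, one per line.
  const hashes = readFileSync(snapshotPath, 'utf8').split('\n').filter(Boolean);

  // Concatenate the pages in snapshot order. A smarter sync would first hash the
  // target's current pages and rewrite only the ranges that differ.
  const pages = hashes.map(hash =>
    readFileSync(join(objectsDir, hash.slice(0, 2), hash.slice(2)))
  );
  writeFileSync(targetDbPath, Buffer.concat(pages));
}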
Upvotes: 12