Reputation: 9997
I would like to put a large amount of data under version control, i.e. a directory structure (with depth <= 5) with hundreds of files of about 500 MB in size.
What I need is a system that helps me:
- detect whether a file has been changed
- detect whether files were added or removed
- clone the entire repository to another location
- store a "checkpoint" and restore it later
I don't need SHA-1 for change detection; something faster is acceptable.
Is Git worth using for this? Is there a better alternative?
Upvotes: 10
Views: 4521
Reputation: 23062
If you're on a Unix system (you probably are, since you're using Git):
That way, you get the benefits of Git, you keep whatever tree structure you want, and the large files are backed up elsewhere, despite still appearing to be inside the normal folder hierarchy.
Upvotes: 1
Reputation: 1323753
As I mentioned in "What are the Git limits", Git is not made to manage big files (or big binary files for that matter).
Git would be needed if you needed to:
Note: still using Git, you can try this approach
Unfortunately, rsync isn't really perfect for our purposes either.
- First of all, it isn't really a version control system. If you want to store multiple revisions of the file, you have to make multiple copies, which is wasteful, or xdelta them, which is tedious (and potentially slow to reassemble, and makes it hard to prune intermediate versions), or check them into git, which will still melt down because your files are too big.
- Plus rsync really can't handle file renames properly - at all.
Okay, what about another idea: let's split the file into chunks, and check each of those blocks into git separately.
Then git's delta compression won't have too much to chew on at a time, and we only have to send modified blocks...
Based on gzip --rsyncable, with a POC available in this Git repo.
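To make the chunking idea above more concrete, here is a minimal sketch in Python. It is not the code from the linked POC and not gzip's actual implementation; it only illustrates the general technique of cutting a file wherever a rolling sum over the last few bytes hits a fixed pattern, so that boundaries depend on nearby content rather than on absolute offsets. The names `split_chunks` and `chunk_ids` and the `WINDOW` and `MASK` values are assumptions made up for this example.

```python
import hashlib

WINDOW = 64            # bytes covered by the rolling sum (illustrative value)
MASK = (1 << 12) - 1   # cut when the low 12 bits of the sum are zero (illustrative)

def split_chunks(data: bytes, window: int = WINDOW, mask: int = MASK):
    """Yield consecutive chunks of `data` with content-defined boundaries."""
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]   # drop the byte that left the window
        # Cut only once the chunk is at least `window` bytes long.
        if i + 1 - start >= window and (rolling & mask) == 0:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]                # trailing remainder

def chunk_ids(data: bytes):
    """Return one SHA-1 hex digest per chunk, similar to how git names blobs."""
    return [hashlib.sha1(chunk).hexdigest() for chunk in split_chunks(data)]
```

Because the cut points follow the content, inserting a few bytes near the start of a file should change only the first chunk or so; the later chunks keep their ids and would not need to be stored or sent again. A per-byte Python loop is far too slow for 500 MB files, so treat this as an illustration only.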
Upvotes: 10
Reputation: 10967
Unison File Synchroniser is an excellent tool for maintaining multiple copies of large binary files. It will do everything you ask for apart from storing a checkpoint - but that you could do with an rsync hardlink copy.
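For what it's worth, the hardlink-copy idea can be sketched in a few lines of Python. This is only an illustration of the technique (rsync itself offers it through `--link-dest`), and the function names `unchanged` and `snapshot` are made up for this example. It assumes the snapshots live on the same filesystem, since hard links cannot cross filesystems, and that files in an old snapshot are never edited in place.

```python
import os
import shutil
from typing import Optional

def unchanged(a: str, b: str) -> bool:
    """Quick check: same size and same modification time (no hashing)."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and sa.st_mtime_ns == sb.st_mtime_ns

def snapshot(source: str, target: str, previous: Optional[str] = None) -> None:
    """Copy `source` into `target`, hard-linking files unchanged since `previous`."""
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.join(target, rel), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target, rel, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and unchanged(src, old):
                os.link(old, dst)       # reuse the old copy, no extra disk space
            else:
                shutil.copy2(src, dst)  # copy2 keeps the mtime for later checks
```

Each call produces a complete-looking directory tree, so restoring a checkpoint is just copying that tree back, while unchanged 500 MB files are stored only once.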
Upvotes: 1
Reputation: 9711
Maybe something like rsync is better for your needs (if you just want some backups, with no concurrency, merging, branching, etc.).
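Since SHA-1 is not required, change detection can be done the way rsync's default quick check does it: by comparing file size and modification time only. Below is a rough Python sketch of that idea, not rsync's actual code; the function names `scan` and `diff` are made up for this example.

```python
import os

def scan(root: str) -> dict:
    """Map each file path (relative to `root`) to (size, mtime in ns)."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            manifest[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return manifest

def diff(old: dict, new: dict):
    """Return sorted lists of (added, removed, changed) relative paths."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, changed
```

Saving the result of `scan` (for instance as JSON) and diffing it against a later scan answers the "changed / added / removed" questions without reading any file contents. The trade-off is that an edit which keeps both size and mtime identical goes unnoticed.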
Upvotes: 0