Reputation: 204
I am abusing Git by using it, locally, as an incremental backup solution. Partly to teach myself Git, but partly to combat JPG and MP3 file corruption, which happens once in a blue moon.
The repo gets huge, obviously. I need to purge non-existent files from history. (I have a lot of security videos that go into the system automatically, but also get deleted later, and I don't need a fully checked in video feed of my front yard in my .git folder.)
This is a matter of abusing the tool in the "right" way -- I don't mind wasting a lot of space for the files I have; I don't mind a file having 100 versions if it's a file that exists. But if it doesn't exist, I want it out of the repo, with no way to ever bring it back; erased from history completely.
Upvotes: 1
Views: 4135
Reputation: 488183
This is indeed fairly severe abuse of the tool. It would probably be better to figure out what is corrupting the original files. All Git will be really giving you here is content checksumming, which you can do outside Git ... or inside Git, with less-severe abuse, by using a data structure other than the usual chain of commits.
In other words, if you want to do this to learn how to use Git the wrong way :-) I think there is a "better wrong way". Here is my suggestion:
Make each commit on a new, orphan branch. You can do this with git checkout --orphan <new-branch> or by using the "plumbing" tools git write-tree and git commit-tree.
Each branch is to contain one and only one commit. (If you are using the plumbing tools, you can use tags instead of branches.)
Then, to delete a backup (the whole thing), simply delete the branch (or tag) name.
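A minimal sketch of the plumbing variant described above, assuming it is run from the top of the working tree of an existing repository with a configured committer identity. The helper names backup_snapshot and drop_snapshot are hypothetical, chosen only for illustration; the Git commands themselves (git add, git write-tree, git commit-tree, git tag) are standard.

```shell
#!/bin/sh
# Sketch only: backup_snapshot and drop_snapshot are hypothetical helper names.

backup_snapshot() {
    # Optional first argument overrides the timestamped default name.
    name="${1:-backup-$(date +%Y%m%dT%H%M%S)}"
    # Stage the current state of the working tree, including deletions.
    git add --all || return 1
    # Write the index out as a tree object.
    tree=$(git write-tree) || return 1
    # Create a commit with no parent, so each backup stands alone.
    commit=$(git commit-tree -m "Backup $name" "$tree") || return 1
    # Give the parentless commit a name so it stays reachable.
    git tag "$name" "$commit" || return 1
    echo "$name"
}

drop_snapshot() {
    # Deleting the tag leaves the commit unreachable; a later gc can remove it.
    git tag -d "$1"
}
```

Because no branch is moved or checked out, HEAD and the working tree are left alone; only the index and the tag list change.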
Diagrammatically, instead of:
o--o--o--...--o--o   <-- master
^                ^
|                 \
|                  the most recent
|
an hour ago, or yesterday, or whatever
your commits will be:
o <-- backup-20160508T101112.13
o <-- backup-20160508T131415.16
...
These names are more or less ISO-date-format, YYYYMMDDTHHMMSS.ss; but you may use any names that make the most sense to you.
Note that if two backups commit the same files, they re-use all the underlying Git "blob" objects, so two backups take basically the same space as one backup. Removing one of these two backups (by deleting the branch or tag name) has no effect since all those files are referred-to by the other backup.
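One way to check this sharing for yourself, as a rough sketch: compare the blob ID that each backup records for the same path. The helper name same_blob is hypothetical; git rev-parse with the <ref>:<path> syntax is standard.

```shell
#!/bin/sh
# Sketch only: same_blob is a hypothetical helper name.
# usage: same_blob <backup-a> <backup-b> <path>
same_blob() {
    # <ref>:<path> resolves to the ID of the blob stored at that path.
    id_a=$(git rev-parse --verify --quiet "$1:$3") || return 2
    id_b=$(git rev-parse --verify --quiet "$2:$3") || return 2
    # Equal IDs mean both backups point at one stored object.
    [ "$id_a" = "$id_b" ]
}
```

The function returns 0 when the two backups share the blob, 1 when the contents differ, and 2 when the path is missing from either backup.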
If one file (xyz.txt) is slightly different, Git will delta-compress it against another file (in any other commit) in Git's usual way: the commits need not be joined by parent/child relationships. Note that image and movie files rarely compress well in Git anyway (because they are already compressed: information theory says that if the first compression was any good, the second attempt will not help).
Now let's say you decide you no longer need to back up file foo.jpg. Just remove it: it will expire and be garbage-collected once every backup that still contains it has been deleted, that is, once the oldest remaining backup was taken after the removal. It's true that removed files will remain in older backups, but only for as long as you keep those backups.
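A hedged sketch of the expiry step, assuming backups are tags named backup-<timestamp> as in the diagram, so that sorting the names also sorts them by age. The helper name prune_backups and its keep-count argument are hypothetical. Note that git gc --prune=now discards unreachable objects immediately, with no grace period.

```shell
#!/bin/sh
# Sketch only: prune_backups is a hypothetical helper name.
# usage: prune_backups <number-of-backups-to-keep>
prune_backups() {
    keep="$1"
    total=$(git tag -l 'backup-*' | wc -l)
    drop=$((total - keep))
    if [ "$drop" -gt 0 ]; then
        # Timestamped names sort oldest first, so delete from the top.
        git tag -l 'backup-*' | sort | head -n "$drop" |
        while read -r tag; do
            git tag -d "$tag"
        done
    fi
    # Drop reflog entries that could still reference old commits,
    # then remove every object that is no longer reachable.
    git reflog expire --expire=now --all &&
    git gc --quiet --prune=now
}
```

For example, prune_backups 30 would keep the 30 newest backups. Files that exist only in the deleted backups are then gone from the repository for good, so this is worth trying on a copy first.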
Upvotes: 1
Reputation: 142094
Use git ls-tree HEAD to list the files in your current commit (add -r to recurse into subdirectories), and then remove the files that are no longer there with BFG Repo-Cleaner:
https://github.com/rtyley/bfg-repo-cleaner
It is the perfect tool for this kind of task.
BFG Repo-Cleaner
an alternative to git-filter-branch.
The BFG is a simpler, faster alternative to git-filter-branch for cleansing bad data out of your Git repository history:
- Removing Crazy Big Files
- Removing Passwords, Credentials & other Private data
In all these examples bfg is an alias for java -jar bfg.jar.
# Delete all files named 'id_rsa' or 'id_dsa' :
bfg --delete-files id_{dsa,rsa} my-repo.git
After you have cleaned your repository use this tool to store your large files.
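To find candidates for deletion, one possible sketch is to compare every path ever added anywhere in history against the paths present in HEAD. The helper name list_deleted_paths is hypothetical; the Git commands and options used are standard. It assumes paths do not contain newlines.

```shell
#!/bin/sh
# Sketch only: list_deleted_paths is a hypothetical helper name.
# Prints paths that were added at some point in history but are not in HEAD.
list_deleted_paths() {
    tmp_all=$(mktemp) && tmp_head=$(mktemp) || return 1
    # Every path that was ever added, on any ref. --no-renames makes a
    # rename show up as a delete plus an add, so no name is missed.
    git -c core.quotePath=false log --all --no-renames --diff-filter=A \
        --name-only --pretty=format: |
        sed '/^$/d' | LC_ALL=C sort -u > "$tmp_all"
    # Every path in the current commit.
    git -c core.quotePath=false ls-tree -r --name-only HEAD |
        LC_ALL=C sort -u > "$tmp_head"
    # Lines that appear only in the first list.
    LC_ALL=C comm -23 "$tmp_all" "$tmp_head"
    rm -f "$tmp_all" "$tmp_head"
}
```

Keep in mind that bfg --delete-files matches on file names rather than full paths, so the output needs reviewing before any name is passed to BFG: a deleted file may share its name with one you still want to keep in another directory.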
Upvotes: 1
Reputation: 164809
There are two good tools for this problem. BFG Repo Cleaner can delete large files from history. Git Large File Storage, aka git-lfs, lets you put large files in Git without bloating your repository size.
Put them together and you can use BFG to change old commits of large files to use git-lfs with the new --convert-to-git-lfs option. Then use git-lfs for future commits of large files.
Upvotes: 1