Reputation: 97008
To remove a large unwanted file from all of git history you can use filter-branch
to rewrite the index (the list of files in the repo) of each commit, so that it looks as if the file was never added:
git filter-branch --index-filter "git rm --cached --ignore-unmatch path/to/offending_file.wav" --tag-name-filter cat -- --all
But what if I want to keep the file but make it a lot smaller (e.g. an icon that was accidentally stored as a huge image)? I tried this approach:
First add a replacement file to git's database
HASH=$(git hash-object -w /tmp/replacement.png)
Also note the file we want to replace
FILE="path/to/icon.png"
Now filter the index as follows: first check the file exists at this commit:
git cat-file -e :"$FILE"
If so remove it from the index:
git rm --cached "$FILE"
And finally add a reference to our replacement with the same filename.
git update-index --add --cacheinfo "100644,$HASH,$FILE"
Putting it all together:
git filter-branch --index-filter "if git cat-file -e ':$FILE' ; then git rm --cached '$FILE' ; git update-index --add --cacheinfo '100644,$HASH,$FILE' ; fi" --tag-name-filter cat -- --all
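Before rewriting the whole history, the check/remove/re-add steps can be tried on the index of a throwaway repository. This is only a sketch: the temporary directories, the zero-filled stand-in for the huge icon, and the "small icon" replacement are all made up for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway repository; every path and file name below is hypothetical.
REPO=$(mktemp -d)
cd "$REPO"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"

FILE="path/to/icon.png"
mkdir -p "$(dirname "$FILE")"
# Stand-in for the oversized icon (about 1 MB of zero bytes).
head -c 1000000 /dev/zero > "$FILE"
git add "$FILE"
git commit -q -m "Add huge icon"

# Stand-in for the small replacement, stored outside the work tree.
REPLACEMENT=$(mktemp)
printf 'small icon' > "$REPLACEMENT"
HASH=$(git hash-object -w "$REPLACEMENT")

# Same steps as the index filter: check, remove, re-add under the same name.
if git cat-file -e :"$FILE"; then
    git rm -q --cached "$FILE"
    git update-index --add --cacheinfo "100644,$HASH,$FILE"
fi

# The index entry for the path should now name the replacement blob.
INDEX_HASH=$(git rev-parse :"$FILE")
echo "index: $INDEX_HASH"
echo "want:  $HASH"
```

If the two hashes match, the index filter is doing what it should for a single commit, and any leftover blob is a reachability problem rather than a filter problem.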
This seems to work and doesn't print any errors that are too scary. However, no matter how many git gc and prune commands I try, the original blob still exists in the repository. Even if I clone the repo to a new place it still exists.
I suspect it is because the remote refs, and the refs/original/ refs which filter-branch creates, still point to the old tree, so the original file is still referenced.
I did try removing them all with a hack like this:
for REF in `git show-ref | cut -c 42- | grep original` ; do git update-ref -d $REF ; done
I did the same for the remotes, but the blob is still there.
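One way to test the suspicion about leftover refs is to ask git what can still reach the blob. The sketch below fakes the situation in a throwaway repository (the file contents, the backup ref name, and the reaches_blob helper are hypothetical) and then checks reachability from the branches, from all refs, and from the reflogs.

```shell
#!/bin/sh
set -eu

# Throwaway repository; all names below are hypothetical.
REPO=$(mktemp -d)
cd "$REPO"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"

echo "pretend this is huge" > icon.png
git add icon.png
git commit -q -m "Add icon"
BLOB=$(git rev-parse HEAD:icon.png)

# Simulate the backup ref that filter-branch leaves behind ...
git update-ref refs/original/refs/heads/backup HEAD
# ... and a rewritten branch that no longer contains the blob.
echo "small" > icon.png
git add icon.png
git commit -q --amend -m "Add icon"

reaches_blob() {
    # $1 is a rev-list selector: --branches, --all or --reflog.
    if git rev-list --objects "$1" | grep -q "^$BLOB"; then
        echo yes
    else
        echo no
    fi
}

FROM_BRANCHES=$(reaches_blob --branches)
FROM_ALL=$(reaches_blob --all)
FROM_REFLOG=$(reaches_blob --reflog)
echo "branches: $FROM_BRANCHES, all refs: $FROM_ALL, reflog: $FROM_REFLOG"

# Name each ref that still keeps the blob alive.
git for-each-ref --format='%(refname)' | while read -r REF; do
    if git rev-list --objects "$REF" | grep -q "^$BLOB"; then
        echo "still referenced by $REF"
    fi
done
```

In a real repository, BLOB would be the hash of the oversized file noted before the rewrite. Anything the loop prints, plus the reflog, is what stops git gc from dropping the blob.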
So my questions:
How do I remove the original refs (and maybe the remotes), including all branches and tags?
Upvotes: 1
Views: 109
Reputation: 97008
Aha I've done it! I think.
Here are the extra steps. First it's a good idea to note the hash of the blob you want at the start so you can check if it exists with
git cat-file -t 949abcd....
Ok so first I cleared the reflog, since it still has references to the original commits:
git reflog expire --expire=now --all
Next I removed the origin remote, since its remote-tracking refs still point to the original tree. I guess if you push the new hashes (you will probably need to force push) then this step is unnecessary and the file should eventually be GCed anyway.
git remote rm origin
Next I removed the refs/original/ refs (that filter-branch creates). I didn't find a less hacky way:
for REF in `git show-ref | cut -c 42- | grep original` ; do git update-ref -d $REF ; done
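A possibly less hacky variant is to let git for-each-ref list only the refs under refs/original/, which avoids the fixed-width cut and does not catch branches that merely have "original" in their name. The git filter-branch manual describes a similar pipeline. The sketch below tries it in a throwaway repository; the ref and branch names are invented for the demonstration.

```shell
#!/bin/sh
set -eu

# Throwaway repository; all names below are hypothetical.
REPO=$(mktemp -d)
cd "$REPO"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"

echo "content" > file.txt
git add file.txt
git commit -q -m "Initial commit"

# Fake the backup refs that filter-branch would have written.
git update-ref refs/original/refs/heads/demo HEAD
git update-ref refs/original/refs/tags/v1 HEAD
# A branch whose name merely contains "original"; it should survive.
git branch original-idea

# Delete every ref under refs/original/ and nothing else.
git for-each-ref --format='%(refname)' refs/original/ |
while read -r REF; do
    git update-ref -d "$REF"
done

REMAINING=$(git for-each-ref --format='%(refname)' refs/original/)
echo "remaining backup refs: '$REMAINING'"
```

The same pattern with refs/remotes/ instead of refs/original/ would cover the remote-tracking refs, although removing the remote already takes care of those.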
Finally, garbage collect. I'm not sure whether --aggressive is required, but --prune=now definitely is, because otherwise git gc only prunes unreachable objects older than two weeks (the default grace period), for safety.
git gc --aggressive --prune=now
After all these steps git cat-file reports that the blob is gone! I haven't experimented with pushing the result back to origin (after you re-add it), and I'm not 100% sure which of the above steps are necessary, but this seemed to work so far.
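For anyone who wants to check which steps matter, here is a sketch of the whole sequence in a throwaway repository. It assumes there is no remote (so the git remote rm step is skipped), uses a zero-filled stand-in for the huge icon, and sets FILTER_BRANCH_SQUELCH_WARNING, which newer git versions read to skip the filter-branch deprecation notice. All file names are hypothetical.

```shell
#!/bin/sh
set -eu
# Newer git prints a deprecation notice and pauses unless this is set.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository; all names below are hypothetical.
REPO=$(mktemp -d)
cd "$REPO"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"

FILE="icon.png"
head -c 1000000 /dev/zero > "$FILE"
git add "$FILE"
git commit -q -m "Add huge icon"
# Note the hash of the blob we want gone, as suggested above.
OLD_BLOB=$(git rev-parse HEAD:"$FILE")

REPLACEMENT=$(mktemp)
printf 'small icon' > "$REPLACEMENT"
HASH=$(git hash-object -w "$REPLACEMENT")

# Rewrite history so every commit carries the replacement blob.
git filter-branch --index-filter "if git cat-file -e ':$FILE' 2>/dev/null; then git rm -q --cached '$FILE'; git update-index --add --cacheinfo '100644,$HASH,$FILE'; fi" --tag-name-filter cat -- --all >/dev/null 2>&1

# Drop everything that still points at the old history.
git reflog expire --expire=now --all
git for-each-ref --format='%(refname)' refs/original/ |
while read -r REF; do
    git update-ref -d "$REF"
done
git gc -q --prune=now

# Check whether the old blob survived and what the path holds now.
if git cat-file -e "$OLD_BLOB" 2>/dev/null; then
    OLD_STILL_PRESENT=yes
else
    OLD_STILL_PRESENT=no
fi
NEW_BLOB=$(git rev-parse HEAD:"$FILE")
echo "old blob still present: $OLD_STILL_PRESENT"
```

Commenting out one cleanup step at a time and re-running is a cheap way to see which of them is actually required for a given repository.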
Upvotes: 1