Reputation: 3643
Most of the git filter-branch examples I've seen remove files based on filename. I don't necessarily want to do that. Instead, I've identified a number of blob (not commit) SHA1s of the files I want to remove, regardless of where they are in the repository. (Due to our repo history, files tend to move around a lot without changing.)
What's the best way to tell git filter-branch to remove files based on their blob SHA1?
Upvotes: 4
Views: 3485
Reputation: 1588
git filter-branch --index-filter iteratively puts each commit into the index, so it is possible to recover the filename from the hash with git ls-files -s.
I do this to remove blobs with hashes 2d341f0223ff, 6a4558fa76d1 and 4d0a90cba061:
git filter-branch --force --index-filter "git ls-files -cdmo -s | grep ' 2d341f0223ff\| 6a4558fa76d1\| 4d0a90cba061' | awk '{print \$4}' | xargs git rm --cached --ignore-unmatch 656565randomstring546464" --prune-empty --tag-name-filter cat -- --all
The random string keeps git rm from raising an error when grep returns no match.
Upvotes: 1
Reputation: 45819
As noted by @RobertTyley in his answer, you're probably better off using BFG. However, to answer the question as asked (how to do this with filter-branch):
Unfortunately, there isn't a great way. You can write a script to get all filenames associated with the SHA value in the index. As a starting point, if you're removing a file with hash DEADC0DE:
git rev-list -n 1 --objects HEAD |grep ^DEADC0DE |cut -c 42-
You'd then feed each line (perhaps with xargs?) as the <filename> in
git rm --cached <filename>
And you'd use that script as your index-filter value (because using it as a tree filter will just make an already slow approach even slower).
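One way such a script could start, as a rough sketch rather than a tested recipe: it reads the index listing that git ls-files -s produces instead of rev-list output, and prints the paths whose blob id begins with a given prefix. The name paths_for_blob is made up for illustration.

```shell
# Hypothetical helper: reads `git ls-files -s` output on stdin and prints
# the paths whose blob id starts with the prefix given as $1.
paths_for_blob() {
    # Each input line looks like "<mode> <blob-id> <stage><TAB><path>".
    awk -F '\t' -v id="$1" '{
        split($1, meta, " ")                  # meta[2] is the blob id
        if (index(meta[2], id) == 1) print $2 # prefix match, then emit the path
    }'
}
```

Inside an index filter it might be fed with git ls-files -s | paths_for_blob DEADC0DE, with the resulting lines handed to git rm --cached. Paths containing whitespace would need extra care when passed through xargs.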
Upvotes: 0
Reputation: 177825
The filter-branch version could look something like this inside of index-filter:
git ls-files -s |
sed -r '/ 02c97746d64fbfe13007a1ab4e9b9e4bbd99f42f /s/^100(644|755)/0/' |
git update-index --index-info
That is, read the index-info format, find the interesting blob and set the mode to 0 (marking it for removal), then write that back to the index.
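Since the question mentions several blobs, the same idea might be extended to a list of ids. The following is only a sketch: mark_blobs_for_removal and the ids file are hypothetical, and it assumes the file holds one full 40-character blob id per line. Only the matching entries are emitted, each with its mode rewritten to 0, which is enough for git update-index --index-info to drop them.

```shell
# Hypothetical helper: reads `git ls-files -s` output on stdin and prints only
# the entries whose blob id is listed in the file named by $1, with mode set to 0.
mark_blobs_for_removal() {
    awk -v idfile="$1" '
        # Load the blob ids to drop, one full id per line.
        BEGIN { while ((getline line < idfile) > 0) drop[line] = 1 }
        # $2 is the blob id; replace the leading mode with 0 and keep the rest.
        ($2 in drop) { sub(/^[0-7]+/, "0"); print }
    '
}
```

Within index-filter this could be used as git ls-files -s | mark_blobs_for_removal ids.txt | git update-index --index-info, where ids.txt is a placeholder path.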
Upvotes: 0
Reputation: 25314
Your task is to remove blobs from Git history by a hash identifier. You may find it faster and easier to use the BFG rather than git-filter-branch, specifically using the --strip-blobs-with-ids flag:
-bi, --strip-blobs-with-ids <blob-ids-file> ...strip blobs with the specified Git object ids
Carefully follow the usage instructions; the core part is just this:
$ java -jar bfg.jar --strip-blobs-with-ids <blob-ids-file> my-repo.git
Note that the <blob-ids-file> file should contain Git object ids, rather than plain SHA-1 hashes of the blob's contents.
For a given file, you can calculate the Git object id with git hash-object:
$ git hash-object README.md
a63b49c2e93788cd71c81015818307c7b70963bf
You can see that this value is different to a simple SHA-1 hash:
$ sha1sum README.md
7b833f7b37550e2df719b57e8c4994c93a865aa9 README.md
...that's because the Git object id hashes a Git header, along with the contents of the file, even though it does use the same SHA-1 algorithm.
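To make the header concrete, here is a small sketch that recomputes a blob id by hand: SHA-1 over the text "blob <size>" plus a NUL byte, followed by the file contents. The function name blob_id is invented for illustration, and sha1sum is assumed to be available (GNU coreutils).

```shell
# Hypothetical helper: print the Git blob id of the file named by $1
# without calling git, by hashing the blob header plus the contents.
blob_id() {
    size=$(wc -c < "$1" | tr -d ' ')   # content length in bytes
    # Header is "blob <size>" followed by a NUL byte, then the raw contents.
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d ' ' -f 1
}
```

For any file, blob_id should print the same value as git hash-object on that file.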
The BFG is typically at least 10-50x faster than running git-filter-branch, and generally easier to use.
Full disclosure: I'm the author of the BFG Repo-Cleaner.
Upvotes: 8