0xsegfault

Reputation: 3161

Understanding why git-filter-branch is not cleaning my history

I used gitleaks to check for leaked secrets in my repo's history. When I ran the following command and force-pushed the result

git filter-branch --force --index-filter \
  'git rm -r --cached --ignore-unmatch terra/fixtures.go' \
  --prune-empty --tag-name-filter cat -- --all

it seemed to work, except I noticed the following:

WARNING: Ref 'refs/heads/automate_tests' is unchanged
WARNING: Ref 'refs/heads/ethRawTransaction' is unchanged
WARNING: Ref 'refs/heads/feature/177/leave-bastion' is unchanged
WARNING: Ref 'refs/heads/feature/FAQ' is unchanged
WARNING: Ref 'refs/heads/master' is unchanged
WARNING: Ref 'refs/heads/mjolnir' is unchanged
WARNING: Ref 'refs/heads/tmp' is unchanged
WARNING: Ref 'refs/remotes/origin/master' is unchanged
WARNING: Ref 'refs/remotes/origin/automate_tests' is unchanged
WARNING: Ref 'refs/remotes/origin/bug/0.0.11-beta-fix' is unchanged
WARNING: Ref 'refs/remotes/origin/bug/bastion-ssh' is unchanged
WARNING: Ref 'refs/remotes/origin/bug/fix-examples-merge' is unchanged
WARNING: Ref 'refs/remotes/origin/develop' is unchanged
WARNING: Ref 'refs/remotes/origin/ethRawTransaction' is unchanged
WARNING: Ref 'refs/remotes/origin/feature/168/auto-ssh-to-bastion' is unchanged
WARNING: Ref 'refs/remotes/origin/feature/169/ethstats_for_pantheon' is unchanged
WARNING: Ref 'refs/remotes/origin/feature/175/ssh-to-certain-nodes' is unchanged
WARNING: Ref 'refs/remotes/origin/feature/176/tagging-nodes-to-ips' is unchanged
WARNING: Ref 'refs/remotes/origin/feature/177/leave-bastion' is unchanged
WARNING: Ref 'refs/remotes/origin/feature/FAQ' is unchanged
WARNING: Ref 'refs/remotes/origin/feature/README' is unchanged
WARNING: Ref 'refs/remotes/origin/master' is unchanged
WARNING: Ref 'refs/remotes/origin/mjolnir' is unchanged
WARNING: Ref 'refs/remotes/origin/tmp' is unchanged
WARNING: Ref 'refs/tags/0.0.4' is unchanged
WARNING: Ref 'refs/tags/20190820141131-866368a' is unchanged
WARNING: Ref 'refs/tags/20190820142202-bd96767' is unchanged
WARNING: Ref 'refs/tags/20190820143451-fc7f46a' is unchanged
WARNING: Ref 'refs/tags/20190820143903-832818a' is unchanged
WARNING: Ref 'refs/tags/20190820150546-05e3105' is unchanged
WARNING: Ref 'refs/tags/20190820154631-da0cdab' is unchanged
WARNING: Ref 'refs/tags/20190820160956-047caa6' is unchanged
WARNING: Ref 'refs/tags/20190820162243-a300fa5' is unchanged
WARNING: Ref 'refs/tags/20190820170410-47f8878' is unchanged
WARNING: Ref 'refs/tags/untagged-f148f02c4d71ed0bea99' is unchanged
WARNING: Ref 'refs/tags/v.0.0.1' is unchanged
WARNING: Ref 'refs/tags/v0.0.1' is unchanged
WARNING: Ref 'refs/tags/v0.0.1-alpha' is unchanged
WARNING: Ref 'refs/tags/v0.0.10' is unchanged
WARNING: Ref 'refs/tags/v0.0.11-beta' is unchanged
WARNING: Ref 'refs/tags/v0.0.14' is unchanged
WARNING: Ref 'refs/tags/v0.0.3-alpha' is unchanged
WARNING: Ref 'refs/tags/v0.0.4-chaos-poc' is unchanged

As a result, the number of leaks does not seem to be going down.

I am confused as to why this is happening and would appreciate any pointers.

Upvotes: 1

Views: 1004

Answers (3)

VonC

Reputation: 1324093

Try the newer git filter-repo instead, which is meant to replace the old git filter-branch and BFG:

git filter-repo --path terra/fixtures.go --invert-paths

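By default, this command rewrites all branches. Afterwards, run git push --all --force to overwrite the history of the remote repository. Note that git push --all does not include tags, so push rewritten tags separately with git push --tags --force.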

Upvotes: 1

torek

Reputation: 488143

The refs that git filter-branch reports as unchanged did not have a file named terra/fixtures.go anywhere in their histories. Filter-branch informs you that although you asked it to update these branch names to point to any copied commits, no commits were actually copied in the process.
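A quick way to check this for yourself is to count the commits, reachable from any ref, that ever added, changed, or deleted the path. This is only a sketch: count_commits_touching is a made-up helper name, and it assumes you run it inside the repository. If it prints 0, no reachable commit ever touched that path, so filter-branch has nothing to rewrite for it.

```shell
#!/bin/sh
# Sketch only: count_commits_touching is a hypothetical helper name.
# Prints the number of commits, reachable from any ref, that added,
# changed, or deleted the given path.
count_commits_touching() {
    git rev-list --all --count -- "$1"
}

# Example call, run inside the repository:
#   count_commits_touching terra/fixtures.go
```

A count of 0 for terra/fixtures.go would suggest that the leaks gitleaks reports live under a different path.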

It might be interesting to find a list of reachable commit hash IDs that do have such a file, and then run git branch --contains on such hash IDs. See below.

Which commits contain file F?

Note that this is a different answer to a different question. It's also not looking for commits in which some path name was modified, but rather for commits in which some path name exists at all.

We start by using git rev-list to list all commits:

git rev-list --all |

The output from git rev-list is simply a list of every commit hash ID that is reachable from the named revision(s). In this case, --all names all branches and tags, along with other refs such as refs/stash, but not any reflog entries.

Then, for each commit listed, we want to test whether this commit contains the named file(s). At this point you generally want a lot of programmability. For instance, suppose the file name is a/b/c.txt. Do you want to also find A/B/C.TXT? If you're on Windows or MacOS, you might. If you're on Linux, probably not. Or, maybe you want to find any file whose name starts or ends with some pattern.

What we'll do here is use git ls-tree -r, which lists out all the file names, and then run them through a search-and-status command such as grep. Note that grep searches for regular expressions, not glob patterns, so a*b means zero or more a characters followed by a b character and will match the strings "abc.txt", "b", "flobby", and so on: these all have zero or more as followed by a b. We'll let the actual matched names show through, so that a human can apply further filtering if needed:

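git rev-list --all |
    while read -r hash; do
        # --name-only prints just the path names, one per line
        git ls-tree -r --name-only "$hash" > /tmp/files
        # -q: test for a match without printing it
        if grep -q 'terra/fixtures\.go' /tmp/files; then
            echo "commit ${hash} :"
            grep 'terra/fixtures\.go' /tmp/files
        fi
    done
rm -f /tmp/files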

The output of this set of commands—which you probably should put in a file, and which I have not tested and might contain errors—is a list of commit hash IDs suitable for extraction but also followed by the matched names: you should probably discard matches for, e.g., sputerra/fixtures.gobble.

(It's possible to write fancier grep patterns that match more exactly. In this case, anchoring the regular expression with ^ and $ would suffice. In more complicated cases, more complicated regular expressions are required. I leave this to whoever is using the code.)
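To illustrate the difference anchoring makes, here is a small sketch that needs no repository. The three path names are invented for the example, and it assumes the input is bare path names, as git ls-tree -r --name-only prints them. Plain git ls-tree -r output starts each line with the mode, type, and hash, so a ^ anchor would not line up with the path there.

```shell
#!/bin/sh
# Hypothetical sample of path names, one per line, in the form that
# `git ls-tree -r --name-only` prints them.
names='terra/fixtures.go
sputerra/fixtures.gobble
docs/terra/fixtures.go.bak'

# Unanchored: matches every line that contains the pattern as a substring.
unanchored=$(printf '%s\n' "$names" | grep 'terra/fixtures\.go')

# Anchored with ^ and $: matches only a line that is exactly this path.
anchored=$(printf '%s\n' "$names" | grep '^terra/fixtures\.go$')

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
```

The unanchored pattern keeps all three sample names, while the anchored one keeps only terra/fixtures.go.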

Having obtained hash IDs—run the above and redirect to a file, clean up the file, and then extract the more interesting hash IDs—you can then do:

git branch --contains <hash>

on any given commit hash to see which branches contain that particular commit. Note that there may be zero or more branches containing any given commit. For (much) more about that, read and understand Think Like (a) Git.
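If you only care about one exact path, the two steps can be combined. The following is a rough sketch rather than a tested tool: list_branches_with_path is a made-up name, it uses git cat-file -e to test for an exact path instead of grep, and it assumes a Git recent enough to support git branch --format.

```shell
#!/bin/sh
# Sketch only: list_branches_with_path is a hypothetical helper name.
# Prints every local or remote-tracking branch that contains at least
# one commit in which the given path exists.
list_branches_with_path() {
    path=$1
    git rev-list --all |
    while read -r hash; do
        # cat-file -e exits 0 only if <commit>:<path> names an object
        if git cat-file -e "$hash:$path" 2>/dev/null; then
            git branch --all --format='%(refname:short)' --contains "$hash"
        fi
    done | sort -u
}

# Example call, run inside the repository:
#   list_branches_with_path terra/fixtures.go
```

This runs one git branch call per matching commit, so it can be slow on a large history. It also says nothing about tags, which git tag --contains would cover.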

Upvotes: 1

tmaj

Reputation: 34987

Try with double quotes

git filter-branch --force --index-filter \
  "git rm -r --cached --ignore-unmatch 'terra/fixtures.go'" \
  --prune-empty --tag-name-filter cat -- --all

Upvotes: 1
