user1984

Reputation: 6788

How to use git filter-branch on pattern of folders

I've committed a bunch of sensitive data to my local repo that has not been published yet.

The sensitive data is scattered across the project in different folders and I want to remove all these completely from git history.

All of the folders in question have the same name and sit at the same depth, each inside a different parent folder. The following is a sample of my folder structure:

root
    folder1
           ./sensitiveData
    folder2
           ./sensitiveData
    folder3
           ./sensitiveData

Using the following command, I can delete the folders containing sensitive data one at a time:

git filter-branch -f --index-filter 'git rm -r --cached --ignore-unmatch javascript/folder1/.sensitiveData' --prune-empty HEAD

But I want to delete all the folders containing sensitive data in one go, because there are too many of them, and I would like to learn how this works.

With the following command, however, nothing is rewritten and I am warned that 'refs/heads/master' is unchanged:

git filter-branch -f --index-filter 'git rm -r --cached --ignore-unmatch javascript/*/.sensitiveData' --prune-empty HEAD

As I see it, there are two strategies:

  1. Either my pattern is somehow wrong and I need to change it.
  2. Or I should do some looping with bash.

Option one seems more sensible if possible.

Upvotes: 3

Views: 1151

Answers (2)

user1984

Reputation: 6788

In the end, what solved my problem was a small bash script using a for ... in loop.

# One rewrite per folder; the glob is expanded here against the current working tree.
for name in javascript/*/.sensitiveData; do
    git filter-branch -f --index-filter "git rm -r --cached --ignore-unmatch '$name'" --prune-empty HEAD
done

Upvotes: -1

torek

Reputation: 487735

Your command, when you run it, is first evaluated by your shell. So with:

'git rm -r --cached --ignore-unmatch javascript/*/.sensitiveData'

the single quotes protect the entire thing from the shell, and pass it to git filter-branch as the --index-filter to be used later. The single quotes are gone at this point.

Here's the problem: filters given to git filter-branch get evaluated at filtering-time by another shell (technically, the shell that's running git filter-branch itself). This other shell evals the command:

eval $filter

So now this second shell re-interprets:

git rm -r --cached --ignore-unmatch javascript/*/.sensitiveData

It breaks up the arguments at spaces, expands the asterisk based on the current working directory, and invokes git rm -r --cached --ignore-unmatch on the result of the expansion.

If the expansion succeeds, one thing happens; if not, something else happens. Precisely what happens depends on the shell (bash can be configured to behave in several different ways; POSIX sh is more predictable).
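To see the two outcomes side by side, here is a minimal sketch that evaluates a stand-in filter string in a scratch directory, first with nothing to match and then with a matching folder present. The variable name `filter` and the use of `echo` in place of `git rm` are assumptions made purely for illustration; the bash `nullglob` step only runs if bash happens to be installed.

```shell
# Hypothetical stand-in for the --index-filter string. The single quotes
# keep the first shell from touching the asterisk.
filter='echo javascript/*/.sensitiveData'

scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Empty directory: the glob matches nothing, and a shell with default
# settings leaves the pattern in place as literal text.
unmatched=$(eval "$filter")
echo "no match:   $unmatched"

# bash with nullglob set drops an unmatched pattern instead of keeping it.
if command -v bash >/dev/null 2>&1; then
    dropped=$(bash -c 'shopt -s nullglob; echo javascript/*/.sensitiveData')
    echo "nullglob:   [$dropped]"
fi

# With a matching folder present, the shell performs the expansion itself.
mkdir -p javascript/folder1/.sensitiveData
matched=$(eval "$filter")
echo "with match: $matched"
```

In the first case the command receives the pattern itself, in the last case it receives the expanded path, which is the difference the paragraph above describes.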

The actual current working directory for an --index-filter is generally empty so the expansion will probably fail. This should, in most cases, pass the asterisk on unchanged to Git. Since the argument to git rm is (mostly / essentially) a pathspec, Git will now do its own expansion. This should have worked, so either the path itself is wrong, or the directory is not empty, or there's something odd about your shell so that the failed expansion didn't pass the literal text javascript/*/.sensitiveData to git rm.
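One way to find out what Git's own expansion does with a given pattern is to ask `git ls-files` in a throwaway repository. The sketch below assumes a made-up layout with a `key.txt` file in each folder; it prints what the pattern from the question matches in the index, and, for comparison only, a variant with a trailing `/*` that is not part of the original command. Whether the bare directory pattern picks up the files beneath it is exactly what the output tells you.

```shell
# Throwaway repository with a hypothetical layout; nothing is committed.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
mkdir -p javascript/folder1/.sensitiveData javascript/folder2/.sensitiveData
echo secret > javascript/folder1/.sensitiveData/key.txt
echo secret > javascript/folder2/.sensitiveData/key.txt
echo code > javascript/folder1/app.js
git add .

# Pattern as written in the question. The quotes stop the shell from
# expanding it, so Git receives the asterisk and matches it itself.
dir_matches=$(git ls-files -- 'javascript/*/.sensitiveData')
echo "directory pattern matches:"
echo "$dir_matches"

# Comparison variant (an assumption, not from the question): the trailing
# /* makes the pattern cover the full path of each file in the folder.
file_matches=$(git ls-files -- 'javascript/*/.sensitiveData/*')
echo "file pattern matches:"
echo "$file_matches"
```

Whatever pattern lists the right files here can then be put into the `git rm` call inside the filter.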

You can take some variables out of this equation by using:

'git rm -r --cached --ignore-unmatch javascript/\*/.sensitiveData'

so that the second shell sees:

git rm -r --cached --ignore-unmatch javascript/\*/.sensitiveData

which will force the second shell to pass:

javascript/*/.sensitiveData

directly to git rm. Given that this probably should have worked anyway, though, it's of interest to check whether javascript/*/.sensitiveData would match the right files in the specific commit(s), which you can do kind of clumsily / manually using git ls-tree -r on those commits.
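The manual check can be scripted along these lines. This is only a sketch: it builds a two-commit scratch repository (the file names and the `demo` identity are invented), then walks every commit reachable from HEAD and greps the `git ls-tree -r` listing for paths inside a `.sensitiveData` folder.

```shell
# Scratch repository: the first commit is clean, the second adds a secret.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
mkdir -p javascript/folder1/.sensitiveData
echo code > javascript/folder1/app.js
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add code"
echo secret > javascript/folder1/.sensitiveData/key.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add secret"

# List every path in every commit and report the ones under .sensitiveData.
affected=0
for commit in $(git rev-list HEAD); do
    hits=$(git ls-tree -r --name-only "$commit" | grep '/\.sensitiveData/' || true)
    if [ -n "$hits" ]; then
        affected=$((affected + 1))
        echo "$commit:"
        echo "$hits"
    fi
done
echo "commits containing sensitive paths: $affected"
```

Run in the real repository (skipping the setup lines), the loop shows which commits carry the folders and the exact paths a pattern would have to match.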

Upvotes: 2
