Reputation: 383
I have found out how to remove a file from git's history, but when I tried to remove a file (root Makefile) that had been replaced by another (Makefile in subdirectory moved to root), the method I used removed the replacement as well.
Here is the exact command I used:
git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch Makefile'
Is there any way to do this without removing the replacement? If so, how?
Note: I am asking about a way to retain the history of the replacement file. I do have the option of removing both and restoring the current one from backup, which would lose history. If there is no other option, I will do that because I want the original file gone.
Upvotes: 0
Views: 188
Reputation: 489065
Note: I am asking about a way to retain the history of the replacement file.
You need to arrange to remove only the original file. How to do that is a bit tricky.
The key to understanding the problem here is simple: Git does not have file history. There is no such thing! (See also Missing deletion of lines in file history (git))
The history that Git does have is commit history. More precisely, the commits are the history: each commit has a backwards link, to its parent commit. (For merge commits, the backwards link goes to at least two parents.) Each commit contains a full snapshot of all files, so by having Git walk through commit history, and asking questions about what files are in each commit, you can have Git synthesize a pretend file history. But it's not actually there, it's computed from what is there.
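To make the distinction concrete, here is a minimal sketch in a throwaway repository. The file names, commit messages, and the demo identity are all made up for illustration. It prints the real history (commits and their parent links) and then the "file history" that Git computes on request by comparing snapshots.

```shell
# Throwaway repository; every name below is hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" && git init -q

echo 'all:' > Makefile
git add Makefile && git commit -q -m 'A: add Makefile'
echo 'notes' > README
git add README && git commit -q -m 'B: add README'
echo 'clean:' >> Makefile
git add Makefile && git commit -q -m 'C: extend Makefile'

# The stored history: each commit hash followed by its parent hash(es).
git rev-list --parents HEAD

# The synthesized "file history": Git walks the commits and keeps only
# those whose snapshot of Makefile differs from the parent's snapshot.
file_history=$(git log --format=%s -- Makefile)
echo "$file_history"
```

Commit B is a full snapshot that still contains Makefile, but it does not appear in the second listing because its copy of the file is identical to the one in A.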
All commits in a Git repository are completely read-only. This includes their backwards links: the link holds the "true name" (hash ID) of the parent commit(s), and the true name of a commit depends on its content. That makes it impossible to change history. Git's filter-branch does not even try to do that. What it does instead is simple: it copies every commit (well, every commit you tell it to copy), while applying whatever filter(s) you specified. For each existing commit, Git extracts the original commit, applies your filter(s) to the extracted contents, and builds a new commit from the result.
If the new commit is 100%, bit-for-bit identical to the original commit, you get back the original commit with its original hash ID. However, as soon as there is any change at all, you get back a different commit, with a different hash ID. The main cleverness inside filter-branch is that it defines a mapping from original hash ID (original commit) to new hash ID (copied commit), and as it copies, it always replaces the parent hash IDs with their mapped versions.
What this means is that you can take a nice, simple graph like:
A <-B <-C <--master
(where each uppercase letter stands in for the actual commit hash ID, and the arrows are the stored hash IDs in each commit or in the name master) and filter it. If you change anything about commit A, you get a new, different commit A', and the copy of B will point back to A' rather than to A. The copy of C will point back to B'. This is true even if you also change something while copying B and C. The result is:
A <-B <-C <--master
A' <-B' <-C'
The last thing filter-branch does is rip the names off the original chains of commits and make them point to the new chains:
A <-B <-C [refs/original/refs/heads/master]
A' <-B' <-C' <-- master
Running git log, or anything else that displays the commits (the history), now starts at C' and works backwards. The history shown, or synthesized, is from the new, copied commits.
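The copy-and-repoint behaviour can be observed directly. The following is a sketch in a scratch repository, with invented file names and a demo identity. FILTER_BRANCH_SQUELCH_WARNING is set only to skip the warning and pause that newer Git versions print before filter-branch runs.

```shell
# Scratch repository; all names here are hypothetical.
export FILTER_BRANCH_SQUELCH_WARNING=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name for the demo

# Three commits A, B, C, each also touching a file we will filter out.
for name in A B C; do
    echo "$name" > "file-$name"
    echo "junk $name" > unwanted.txt
    git add . && git commit -q -m "$name"
done
old_tip=$(git rev-parse master)

# Copy every commit on master, dropping unwanted.txt from each snapshot.
git filter-branch -f --index-filter \
    'git rm -q --cached --ignore-unmatch unwanted.txt' -- master >/dev/null

new_tip=$(git rev-parse master)
backup_tip=$(git rev-parse refs/original/refs/heads/master)
echo "original C: $old_tip"
echo "copied C':  $new_tip"
echo "backup ref: $backup_tip"
```

The original commits are not modified: the backup ref still names the old C, while master now names the copy.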
In your original series of commits, you have some commits that contain a file named Makefile that you don't want them to contain. Then you have a series of other commits that contain a file named Makefile that you do want them to contain. Your job, in your filter, is to distinguish between these two sets of commits. Instead of:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch Makefile'
you want, e.g.:
git filter-branch --index-filter magic-script
(plus any other options you like [1]). The hard part is deciding what goes into this magic-script, because what you want is: "if the commit being copied has the wrong Makefile, remove it; if not, don't". But how will you test that?
There are multiple answers, including using --tree-filter instead of --index-filter: a tree filter, which is much (much) slower, literally extracts each commit, so that you can inspect the files in it, and builds the new commit from the extracted files.
The index filter leaves the extracted commit in the index (this is a special filter-branch temporary index but you generally don't need to care about that). This is why you were using git rm --cached --ignore-unmatch Makefile: that removes the file named Makefile from the index, after which filter-branch builds the new commit from the index. Index operations, which take place in a special Git-only file, are much faster than general file system operations. But they don't let you inspect the file named Makefile, in order to make a decision about it.
There is another way to deal with this, though. Suppose, in our idealized three-commit repository A-B-C above, you fixed the Makefile at commit C, then added several more commits D-E-F-G or whatever. What you want to do in this case is use a test of the form:
If the commit being copied is B or any of its ancestors such as A, remove the file named Makefile.
Otherwise (the commit is C or later), leave the file named Makefile.
This, as it turns out, is possible to do. The plumbing command git merge-base --is-ancestor performs this kind of ancestry test, and is usable in a shell script if test:
if git merge-base --is-ancestor "$GIT_COMMIT" <hash>; then
    git rm --cached --ignore-unmatch Makefile
fi
(the "is ancestor" test includes equality, so the <hash> here would be the literal hash ID of commit B). Put this whole thing inside single quotes, with the appropriate hash ID in place, and you have the filter you probably want.
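As an end-to-end illustration, the sketch below builds a four-commit scratch repository that mimics the situation in the question and then applies the ancestry-based filter. The directory name sub, the file contents, and the variable names last_bad and states are assumptions made for the demo. In a real repository you would paste the hash of your own last "bad" commit.

```shell
# Scratch repository; directory, file, and variable names are hypothetical.
export FILTER_BRANCH_SQUELCH_WARNING=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master

mkdir sub
echo 'old: ; @echo old' > Makefile
echo 'new: ; @echo new' > sub/Makefile
git add . && git commit -q -m 'A: both makefiles'
echo 'x' > src.c
git add . && git commit -q -m 'B: last commit with the old root Makefile'
last_bad=$(git rev-parse HEAD)
git rm -q Makefile && git mv sub/Makefile Makefile
git commit -q -m 'C: replacement moved to root'
echo 'y' >> src.c
git add . && git commit -q -m 'D: later work'

# Remove the root Makefile only from B and its ancestors. The quoting
# splices the literal hash of B into the single-quoted filter.
git filter-branch -f --index-filter '
    if git merge-base --is-ancestor "$GIT_COMMIT" '"$last_bad"'; then
        git rm -q --cached --ignore-unmatch Makefile
    fi
' -- master >/dev/null

# Walk the rewritten commits oldest first and report on the root Makefile.
states=
for c in $(git rev-list --reverse master); do
    if git cat-file -e "$c:Makefile" 2>/dev/null; then
        state=kept
    else
        state=removed
    fi
    states="$states $state"
    echo "$(git log -1 --format=%s "$c"): root Makefile $state"
done
```

The ancestry test runs against the original hashes, which is why $GIT_COMMIT (the commit being copied) can be compared with a hash taken before the rewrite.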
(Where it can go wrong is if there are multiple cases where Makefile should or should not be removed. If you have enough time, and/or a RAM-based file system with enough space, you can use --tree-filter and examine the actual Makefile. Or, you can get very fancy and use plumbing commands to examine the Git object whose hash ID is stored in the index, and use --index-filter, but that's a little bit tricky.)
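One possible shape for the "fancy" index-based variant is sketched below. It assumes that the unwanted Makefile can be recognized by a marker string in its contents. The marker OLD-BUILD and everything else in the scratch repository are invented for the demo. The revision syntax :Makefile names the blob recorded for that path in the index, so git cat-file -p can print it without checking anything out.

```shell
# Scratch repository; the OLD-BUILD marker and all names are hypothetical.
export FILTER_BRANCH_SQUELCH_WARNING=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master

echo '# OLD-BUILD' > Makefile
echo 'x' > src.c
git add . && git commit -q -m 'A: old Makefile'
echo 'all:' >> Makefile
git add . && git commit -q -m 'B: old Makefile, edited'
echo 'all: ; @echo new' > Makefile
git add . && git commit -q -m 'C: replacement Makefile'

# Decide per commit by reading the blob that the index records for
# Makefile; no files are written to disk for the inspection.
git filter-branch -f --index-filter '
    if git cat-file -p :Makefile 2>/dev/null | grep -q "OLD-BUILD"; then
        git rm -q --cached Makefile
    fi
' -- master >/dev/null

# Report which rewritten commits still carry a Makefile.
states=
for c in $(git rev-list --reverse master); do
    if git cat-file -e "$c:Makefile" 2>/dev/null; then
        states="$states kept"
    else
        states="$states removed"
    fi
done
echo "Makefile per commit, oldest first:$states"
```

Unlike the ancestry test, a content test like this does not depend on where in the graph a commit sits, so it can cope with several transitions, provided the marker reliably separates the two files.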
[1] You might still want -f here, and also things like --tag-name-filter cat and -- --all. Note that -f exists to tell filter-branch that if a previous filter-branch left behind a refs/original/ name-space, it's OK to destroy that. It's always wise to run these operations on a copy of the repository (a clone: perhaps one made with git clone --mirror) in case you goof things up, in which case the refs/original/ stuff is unnecessary caution: you've already used all necessary caution!
Upvotes: 1
Reputation: 45719
To do this, first you need to identify the last commit(s) that contain the old file (which you want removed), and the earliest commit(s) that contain the new file you want to retain. You can then apply your filter to only the older commits.
For example if you have
o -- o -- o -- o -- A -- x -- x -- x <--(master)
where the old file is present in commits marked o, but commit A moved the newer file (which you want to keep) to the root: then you want to leave A and the x commits untouched.
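One way to locate the transition commit A is sketched below. It assumes the replacement file used to live at sub/Makefile. That path, the commit messages, and the variable names first_good and last_bad are hypothetical. The idea is that the commit which moved the file to the root is also the commit that deleted it from its old path.

```shell
# Scratch repository; sub/Makefile and all other names are hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d) && cd "$repo" && git init -q

mkdir sub
echo 'old' > Makefile
echo 'new' > sub/Makefile
git add . && git commit -q -m 'o: old root Makefile'
echo 'x' > src.c
git add . && git commit -q -m 'o: more work'
git rm -q Makefile && git mv sub/Makefile Makefile
git commit -q -m 'A: move sub/Makefile to root'
echo 'y' >> src.c
git add . && git commit -q -m 'x: later work'

# Commits in which sub/Makefile was deleted. In this linear demo there is
# exactly one; with merges there may be several, each needing attention.
first_good=$(git log --format=%H --diff-filter=D -- sub/Makefile)
# Its parent is the last commit that still has the old root Makefile.
last_bad=$(git rev-parse "$first_good^")

git log -1 --format='first good: %h %s' "$first_good"
git log -1 --format='last bad:   %h %s' "$last_bad"
```

If the log prints more than one hash, the history has more than one transition, which is the complicated case discussed further down.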
To do that with filter-branch either you need a filter that "knows" what commit it's editing, or you need to apply the filter only to the o commits. The latter is easier, but in that case you'd get a split history
o -- o -- o -- o -- A -- x -- x -- x <--(master)
o' -- o' -- o' -- o'
and you'd have to follow up by "re-parenting" A to the last o' commit. That can also be done with filter-branch (using --parent-filter), but this still only deals with one line of history - or, at least, only one "transition" commit where you switched files. If you have multiple commits that "introduce" the change between files (i.e. because the change propagated across branches via merges), then this procedure will quickly get more and more complicated.
A better solution would be to consider the BFG Repo Cleaner. It's specialized to the purpose of removing unwanted history, so (1) it's faster, and (2) it's often easier. It can be configured to "protect" some commits and only edit others. Please see the project's page and docs for more details (https://rtyley.github.io/bfg-repo-cleaner/)
Upvotes: 1