Holt

Reputation: 37686

Remove commit on now deleted files

I have cleaned my repository using git filter-branch to remove some folders, with the following command:

git filter-branch --force --index-filter 'git rm -r --cached --ignore-unmatch folder' \
    --prune-empty --tag-name-filter cat -- --all

After this, I still have some commits that are only relevant for now-deleted files. Is there a way to clean this, i.e., remove commits on files that are not in the history anymore?

Upvotes: 3

Views: 398

Answers (2)

xJom

Reputation: 31

I found the answer in step 6 on this page: https://github.com/epics-modules/motor/wiki/Creating-a-standalone-driver-module

git filter-branch --prune-empty --parent-filter 'sed "s/-p //g" | xargs -r git show-branch --independent | sed "s/\</-p /g"'

Upvotes: -1

torek

Reputation: 489918

TL;DR

If such commits really do remain, you can run a second, otherwise-no-op filter-branch with --prune-empty to discard them.

Long, with experiment

"After this, I still have some commits that are only relevant for now-deleted files."

That should be rare, because you included:

--prune-empty

which directs filter-branch to omit any simple (non-merge) commit that matches its parent.
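To check whether any such commits actually remain, one option is to compare each non-merge commit's tree with its parent's tree. This is only a sketch: list_empty_commits is a hypothetical helper name, not a Git command, and it assumes it is run inside the repository in question.

```shell
# list_empty_commits [REV]: hypothetical helper (not part of Git).
# Prints every single-parent commit reachable from REV (default HEAD)
# whose tree is identical to its parent's tree.
list_empty_commits() {
    # --no-merges drops merge commits; --min-parents=1 drops root commits.
    git rev-list --no-merges --min-parents=1 "${1:-HEAD}" |
    while read -r commit; do
        tree=$(git rev-parse "${commit}^{tree}")
        parent_tree=$(git rev-parse "${commit}~1^{tree}")
        if [ "$tree" = "$parent_tree" ]; then
            echo "$commit"
        fi
    done
}
```

Running list_empty_commits (or list_empty_commits some-branch) prints nothing when there are no such commits left.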

It's possible that you have some merges joining two branches that are otherwise identical at their input points. Such commits are not removed, precisely because they are merges: --prune-empty only considers non-merge commits.

Suppose, for instance, we have the following branch structure before the filtering:

          B--C
         /    \
...--o--A      M--o--...
         \    /
          D--E

Commit M is a merge and must be preserved (so it is), but suppose A-vs-B collapses away because after filtering, B matches A, and C-vs-B (or vs-A) also collapses away for the same reason, giving:

...--o--A'-----M'-o--...
         \    /
          D'-E'

(The tick marks, or prime symbols, after the letters indicate that these are copies of the original commits. It's possible that A' is A, if nothing earlier changed, but this covers the more general case.) Here M' really is still required to maintain the logical structure, even though a fast-forward would have been possible had the commits originally been made without the removed files.

The more interesting case is when D and E themselves collapse away, because now M still exists as an input to the filter-branch process, and still has two parents, but both parents map to commit A' itself. I'm not sure, without looking, what happens here: can filter-branch make a merge commit with A' listed twice as the two parents for M'? If it attempts this, does git commit-tree write commit M' as an ordinary, non-merge commit? Testing shows that git commit-tree ignores the duplicate parent and writes an ordinary single-parent commit:

$ mkdir mtest
$ cd mtest
$ git init
Initialized empty Git repository in ...
$ echo test commit-tree > README
$ git add README
$ git commit -m initial
[master (root-commit) 1db1f76] initial
 1 file changed, 1 insertion(+)
 create mode 100644 README
$ echo log msg | git commit-tree -p HEAD -p HEAD HEAD^{tree}
error: duplicate parent 1db1f76a4e7217d5198c0f178464b7a087e94078 ignored
44f91061b7bd08c39a4dc9e8ebb1f4f7c588ea9e
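One way to confirm how many parents the new commit actually records is to count the "parent" header lines in the raw commit object. The following is a sketch that repeats the experiment in a throwaway repository; the temporary directory and the demo identity are illustrative assumptions.

```shell
# Illustrative identity so the demo does not depend on any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
echo test commit-tree > README
git add README
git commit -q -m initial

# Ask for the same parent twice; the "duplicate parent" message goes to
# stderr (discarded here) and the new commit ID goes to stdout.
new=$(echo log msg | git commit-tree -p HEAD -p HEAD 'HEAD^{tree}' 2>/dev/null)

# Count the "parent" header lines in the raw commit object.
nparents=$(git cat-file -p "$new" | grep -c '^parent ')
echo "parents recorded: $nparents"
```

A count of 1 means the commit was written as an ordinary, non-merge commit.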

So it appears that a naive filter-branch will attempt to make M' and will make it as a single-parent commit:

...--o--A'-M'-o--...

where M' and A' have no differences. The filter-branch code has a check for this, though: while building the parent list, it skips any rewritten parent that is already in the list:

    for parent in $parents; do
            for reparent in $(map "$parent"); do
                    case "$parentstr " in
                    *" -p $reparent "*)
                            ;;
                    *)
                            parentstr="$parentstr -p $reparent"
                            ;;
                    esac
            done
    done

(If you provide your own parent filter, it runs after this check, so you will need to do your own de-duplication of the parents it produces.)
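To see the effect of that loop in isolation, here is a small standalone sketch. The map function below is a hypothetical stand-in for filter-branch's own, and the names oldC, oldE, and newA are made up for illustration: both old merge parents are assumed to have been rewritten to the same new commit.

```shell
# Hypothetical stand-in for filter-branch's map function: both old parents
# (the tips of the two collapsed branches) map to the same new commit.
map() {
    case "$1" in
        oldC|oldE) echo newA ;;
        *) echo "$1" ;;
    esac
}

parents="oldC oldE"
parentstr=""
for parent in $parents; do
    for reparent in $(map "$parent"); do
        case "$parentstr " in
        *" -p $reparent "*)
            ;;                                   # already listed: skip it
        *)
            parentstr="$parentstr -p $reparent"  # first occurrence: append
            ;;
        esac
    done
done
echo "parentstr:$parentstr"
```

Only one "-p newA" ends up in parentstr, so the rewritten commit is created with a single parent rather than a duplicated one.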

So, if you are getting this kind of ...--A'-M'-... result, a second filter-branch with just --prune-empty will discard those commits.
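A minimal sketch of such a second pass, run in a throwaway repository so it can be tried safely. The temporary directory, file name, and commit messages are illustrative assumptions, and the commit made with --allow-empty merely stands in for a commit that no longer differs from its parent. FILTER_BRANCH_SQUELCH_WARNING only silences the warning that newer Git versions print.

```shell
# Illustrative identity so the demo does not depend on any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
echo one > file
git add file
git commit -q -m "first"
git commit -q --allow-empty -m "left empty"   # same tree as its parent
echo two >> file
git commit -q -a -m "second"

before=$(git rev-list --count HEAD)

# The otherwise-no-op pass: no content filters, only --prune-empty.
git filter-branch --force --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

after=$(git rev-list --count HEAD)
echo "commits before: $before, after: $after"
```

In this demo the history goes from three commits to two. On a real repository, only the git filter-branch line is needed, and it is worth running it on a fresh clone first.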

Upvotes: 2
