Hugh

Reputation: 758

Git reverts and cherry picks

I'm puzzled by something, here's a summary.

Our repo has a master branch and a d2l_phase_4 branch. Since the d2l_phase_4 branch was created several new commits have been added to master.

Today someone incorrectly merged a pull request into master of a commit that should have been targeting the d2l_phase_4 branch.

Therefore I did this:

  1. Reverted the commit and pushed to fork/master
  2. Made a pull request to the main repo/master.
  3. Merged that pull request.

At this point we've safely undone the effect of the commit on our master branch.

Then I did this:

  1. Checked out d2l_phase_4 branch
  2. Cherry picked the original commit from master.
  3. pushed to fork/d2l_phase_4
  4. Made a pull request targeting the main repo/d2l_phase_4
  5. Merged this pull request

At this point the d2l_phase_4 branch has the desired changes from the original commit, so far so good.

Finally I wanted to bring the d2l_phase_4 branch up to date with all the other stuff on master since the d2l_phase_4 branch was created. So again on the d2l_phase_4 branch I merged in from master and that created a new merge commit.

I then pushed this updated d2l_phase_4 branch to my fork/d2l_phase_4, created a pull request targeting the main repo/d2l_phase_4, and merged that PR.

However, it then dawned on me that I'd now merged in the earlier revert from the master branch, and so must have nullified the commit that I'd cherry picked earlier!

However, when examining the state of the files on the d2l_phase_4 branch, there's no trace of the revert that was merged in from master. This is exactly what I want, but I do not understand it.

Why was my cherry picked commit on d2l_phase_4 not nullified by the revert that was merged in from master? I'd expected that I'd need to revert my revert on the d2l_phase_4 branch but when I attempted this git seemed to tell me there was nothing to revert...

If I look at the history of one of the files involved on the d2l_phase_4 branch, I see no trace of the revert or anything else; all I can see is the cherry picked commit that I did.

Upvotes: 1

Views: 461

Answers (1)

torek

Reputation: 487993

This is a bit complicated, and—as usual—to understand what is going on, we must draw (part of) the commit graph.

Here's what you apparently (based on your text) had before the incorrect merge. I may have the wrong number of commits but the overall diagram must be reasonably close:

...--A--B--C--D--E   <-- master
         \
          F--G--H    <-- d2l_phase_4

For the next part, we must guess, because only you (and others with access to your repo) have the correct information:

Today someone incorrectly merged a pull request into master of a commit that should have been targeting the d2l_phase_4 branch.

"Targeting" doesn't matter; what matters is the commit graph. A pull request simply means that someone has published a repository that has the same commits that you have, plus at least one more commit. The crucial question is where that commit is in their graph, and I don't know the answer to that. But let's just say it comes after D so that it looks like this:

                __I   <-- their-commit
               /
...--A--B--C--D--E    <-- master
         \
          F--G--H     <-- d2l_phase_4

(If it comes after E, we could have a fast-forward instead of a merge, though ultimately this makes no difference. If it comes after F, the diagrams get messy, though again it ultimately makes no difference.)

This is then, as you say, merged into master, giving a new merge commit (I'd call it M but we're going to have more merges, so let's just keep going with single letters and call it J):

                __I
               /   \
...--A--B--C--D--E--J   <-- master
         \
          F--G--H    <-- d2l_phase_4

(we don't need to label I any more, since it's simply the second parent of J—the first parent is E; the first vs second parent distinction tends to get lost in these drawings).
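The first-versus-second parent distinction can be made visible with the `^1` and `^2` suffixes. This is only a sketch in a throwaway repository: the file names, the tags, and the `mk` helper are assumptions made up for the demonstration, and the letters mirror the guessed graph above rather than your real history.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in the diagrams
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: commit X adds X.txt and gets a tag X so we can name it later.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; git tag "$1"; }

mk A; mk B; mk C; mk D
git checkout -q -b their-commit; mk I      # the misplaced commit, based on D
git checkout -q master; mk E
git merge -q --no-ff -m J their-commit; git tag J

first_parent=$(git rev-parse 'J^1')    # the commit master was on when merging
second_parent=$(git rev-parse 'J^2')   # the commit that was merged in
echo "J^1 = $(git log -1 --format=%s "$first_parent")"
echo "J^2 = $(git log -1 --format=%s "$second_parent")"
```

In this toy repository the two lines report E and I respectively, which is the ordering that `git revert -m 1` relies on later.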

You then, per your three steps, revert the merge, using a long convoluted process (pushing to a fork, pulling back from the fork) that just gives you the same effect as simply backing out commit I in a new commit K on master:

                __I
               /   \
...--A--B--C--D--E--J--K   <-- master
         \
          F--G--H          <-- d2l_phase_4

The critical thing to note about commit K is that its tree (the stored snapshot) is exactly the same as the tree from commit E. Commit J is just a merge of E and I (hence J = E + I) [1], and K has "negative I" added, hence K = E + I - I = E.


[1] This is not guaranteed! If I duplicates a change that is in E, then J is not actually E plus I, it's E plus I minus the duplicated part. But if that were the case, we would see the effect later, so I must not duplicate something in E, or perhaps I comes after E. This kind of thing is why actually having the graph, and sometimes the repository itself, is important.
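The "K = E" claim is easy to check by comparing tree IDs. The following is a sketch in a scratch repository, assuming the graph drawn above and assuming that I touches nothing that E touches; the `mk` helper, the file names, and the tags are invented for the demonstration.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in the diagrams
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: commit X adds X.txt and gets a tag X so we can name it later.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; git tag "$1"; }

mk A; mk B; mk C; mk D
git checkout -q -b their-commit; mk I
git checkout -q master; mk E
git merge -q --no-ff -m J their-commit; git tag J

# Revert the merge relative to its first parent (E); this creates K.
git revert --no-edit -m 1 J >/dev/null; git tag K

tree_E=$(git rev-parse 'E^{tree}')
tree_K=$(git rev-parse 'K^{tree}')
if [ "$tree_E" = "$tree_K" ]; then
  echo "K and E store the same snapshot"
fi
```

E and K are still different commits (different parents, different messages); only the snapshots they point to coincide.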


You then went through another long convoluted process that really just amounts to a simple git cherry-pick:

  1. Cherry picked the original commit from master [into d2l_phase_4].

It's not clear which one is the "original" (I or J, though if I was actually fast-forwarded in there is no J) but that also does not matter: we just get a copy, which I will call I':

                __I
               /   \
...--A--B--C--D--E--J--K   <-- master
         \
          F--G--H--I'      <-- d2l_phase_4

While the tree for I' is different from the tree for I, the two relative changes (what we would get if we ran git diff <id-of-D> <id-of-I> and git diff <id-of-H> <id-of-I'>) are the same. That's what git cherry-pick does: it diffs some commit against its parent, then applies the resulting diff to the current commit and makes a new commit.
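One way to see "same change, different tree" is to compare patch IDs, which hash the diff rather than the snapshot. This is a sketch only: the repository, the `mk` helper, and the tag `Iprime` (standing in for I') are assumptions for the demonstration.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in the diagrams
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: commit X adds X.txt and gets a tag X so we can name it later.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; git tag "$1"; }

mk A; mk B
git checkout -q -b d2l_phase_4; mk F; mk G; mk H
git checkout -q master; mk C; mk D
git checkout -q -b their-commit; mk I

# Copy I onto the tip of d2l_phase_4; the copy is I' in the diagrams.
git checkout -q d2l_phase_4
git cherry-pick I >/dev/null; git tag Iprime

# Patch IDs depend on the diff against the parent, not on the full snapshot.
id_I=$(git diff D I | git patch-id | cut -d' ' -f1)
id_Iprime=$(git diff H Iprime | git patch-id | cut -d' ' -f1)
echo "patch-id of I:  $id_I"
echo "patch-id of I': $id_Iprime"
echo "tree of I:      $(git rev-parse 'I^{tree}')"
echo "tree of I':     $(git rev-parse 'Iprime^{tree}')"
```

Here the two patch IDs come out equal while the two tree IDs differ, because I' sits on top of F, G, and H instead of C and D.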

Finally I wanted to bring the d2l_phase_4 branch up to date with all the other stuff on master since the d2l_phase_4 branch was created. So again on the d2l_phase_4 branch I merged in from master and that created a new merge commit.

So let's draw that, now:

                __I
               /   \
...--A--B--C--D--E--J--K     <-- master
         \              \
          F--G--H--I'----L   <-- d2l_phase_4

However - it dawned on me that I'd now merged in the earlier revert from the master branch ... [but] examining the state of files on the d2l_phase_4 there's no trace of the revert that was merged in from master - this is exactly what I want - but I do not understand.

This gets into the first key item:

Git merge does not look at any of the "interior" commits

Why was my cherry picked commit on d2l_phase_4 not nullified by the revert that was merged in from master? [snip]

As you can see from the graph, we're merging commit K while being "on" commit I'. The merge base (the commit where the graph lines rejoin) is commit B.

The merge process, merge-as-a-verb, works by, in essence, doing (and then combining) two git diffs: one from merge base to current commit, and one from merge base to the other commit. Hence:

git diff <id-of-B> <id-of-I'>
git diff <id-of-B> <id-of-K>

We already noted that, in terms of tree (what git diff looks at), commit K is the same as commit E. The diff from B to K therefore sees neither the incorrect commit nor its reversion. The diff from B to I' sees the cherry-picked commit as a change, so it includes that change. Combining the two diffs gives you one copy of the change.
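The whole sequence can be replayed in a scratch repository to watch the merge base and the two diffs. As before, this is a sketch under the guessed graph: the `mk` helper, the one-file-per-commit layout, and the tag names (`Iprime` for I') are assumptions, not your actual repository.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in the diagrams
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: commit X adds X.txt and gets a tag X so we can name it later.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; git tag "$1"; }

mk A; mk B
git checkout -q -b d2l_phase_4; mk F; mk G; mk H
git checkout -q master; mk C; mk D
git checkout -q -b their-commit; mk I
git checkout -q master; mk E
git merge -q --no-ff -m J their-commit; git tag J
git revert --no-edit -m 1 J >/dev/null; git tag K
git checkout -q d2l_phase_4
git cherry-pick I >/dev/null; git tag Iprime

base=$(git merge-base Iprime K)
ours=$(git diff --name-only "$base" Iprime)    # base -> current commit
theirs=$(git diff --name-only "$base" K)       # base -> other commit
echo "merge base: $(git log -1 --format=%s "$base")"
echo "changed on d2l_phase_4:" $ours
echo "changed on master:     " $theirs

# Merge master into d2l_phase_4; this creates L.
git merge -q --no-ff -m L master; git tag L
```

In this toy setup I.txt appears only in the d2l_phase_4 diff, so the merge result L still contains it: from the merge's point of view, master never touched that file at all.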

The git log command lies, a little bit (or sometimes a lot)

If I look at the history of one of the files involved on the d2l_phase_4 branch, I see no trace of the revert or anything else; all I can see is the cherry picked commit that I did.

Now that we have this final merge L, git log should show us the contents of commit L, the contents of L's first and second parents, the contents of their parents, and so on.

Commit L is a merge commit, so it has two parents, K and I'. I' is your cherry picked commit, so you will see it when git log gets to that point.

K is a simple commit with one parent, J; but J is a merge commit with two parents, E and I. You would expect to see commit I here, and—this part gets tricky—sometimes you do.

When you run git log -- <pathspecs>, you are giving Git an implicit argument that (I think) can't be spelled explicitly, although --dense and --sparse may be close enough for explanatory purposes. These turn on history simplification, and history simplification throws out one "side" of a merge if it can. In any case, you also direct Git to ignore all files not named in your <pathspecs> argument(s).

When Git looks at commit L, it is a merge commit with two parents, so this part of the history simplification rule applies:

Commits are included if they are not TREESAME to any parent (though this can be changed, see --sparse below). If the commit was a merge, and it was TREESAME to one parent, follow only that parent. ... Otherwise, follow all parents.

This "TREESAME" is defined as: after throwing out all but the specified paths, is the trimmed-down commit's tree the same as one of its parents' trees? So for L, we compare it to K and I' after stripping out all but the one file you are asking about. If the version of the file in L matches the one in I'—perhaps it does—then git log trims off K at this point, and will no longer see K, nor J, nor either of J's parents, nor E nor D nor C. It will, however, follow I' back to H and then to G and F and B and so on, back down through history.

If the version of the file in L matched the one in K instead, git log would trim off I'. But I' definitely touched the file and the merge kept that change, so the file in L cannot match the one in K, and Git follows only I' here.

To make Git show you the full history, instead of simplifying away side branches found in the graph, you need --full-history. Git will still limit the history to commits that touch the file, but this time it will look at both parents of L.
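To see the difference, compare the two forms of git log on one path in a scratch repository. This is a sketch under the same guessed graph; the `mk` helper, the file I.txt, and the tags are assumptions made up for the demonstration.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in the diagrams
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: commit X adds X.txt and gets a tag X so we can name it later.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; git tag "$1"; }

mk A; mk B
git checkout -q -b d2l_phase_4; mk F; mk G; mk H
git checkout -q master; mk C; mk D
git checkout -q -b their-commit; mk I
git checkout -q master; mk E
git merge -q --no-ff -m J their-commit
git revert --no-edit -m 1 HEAD >/dev/null       # K, subject: Revert "J"
git checkout -q d2l_phase_4
git cherry-pick I >/dev/null                    # I', keeps the subject "I"
git merge -q --no-ff -m L master

# Default: history simplification follows only the TREESAME parent of L.
simplified=$(git log --format=%s -- I.txt)
# --full-history: both parents of every merge are walked.
full=$(git log --format=%s --full-history -- I.txt)

echo "simplified:"; printf '%s\n' "$simplified"
echo "full history:"; printf '%s\n' "$full"
```

In this toy repository the simplified log lists only the cherry-picked commit, while the --full-history log also includes the revert made on master.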

There are examples of all this in the git log documentation, but it's definitely eye-glazing material.

The other thing to note is that git log often says nothing about the merge commits themselves, unless you instruct it to "split" them (with -m) or force a combined diff (with -c or --cc). That is, it may print the log message, but show no changes, even though git show would show something (the --cc style combined-diff).
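The effect of -m can be seen by asking for the changed file names of the merge commit with and without it. Again this is a sketch in a scratch repository with made-up names (`mk`, the `.txt` files, the tags), and it assumes Git's default settings for how log diffs merges.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in the diagrams
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: commit X adds X.txt and gets a tag X so we can name it later.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; git tag "$1"; }

mk A; mk B
git checkout -q -b d2l_phase_4; mk F; mk G; mk H
git checkout -q master; mk C; mk D
git checkout -q -b their-commit; mk I
git checkout -q master; mk E
git merge -q --no-ff -m J their-commit
git revert --no-edit -m 1 HEAD >/dev/null       # K
git checkout -q d2l_phase_4
git cherry-pick I >/dev/null                    # I'
git merge -q --no-ff -m L master; git tag L

# Without -m, git log prints no file list for the merge commit L.
plain=$(git log -1 --format= --name-only L)
# With -m, L is diffed against each parent in turn.
split=$(git log -1 -m --format= --name-only L)

echo "without -m: [$plain]"
echo "with -m:"; printf '%s\n' "$split"
```

With -m the output lists the files that L adds relative to each parent in turn, including I.txt relative to K; without it the merge appears to change nothing.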

Upvotes: 1
