xnervwang

Reputation: 155

How does Git know it can cherry-pick a reverted commit?

Say a branch has 3 commits: A <- B <- C. If I cherry-pick B directly (Test A), Git says:

The previous cherry-pick is now empty, possibly due to conflict resolution.
If you wish to commit it anyway, use:

    git commit --allow-empty

I understand that: B is already in this branch, so cherry-picking it again is a no-op.
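To make Test A concrete, here is a minimal sketch in a throwaway repository. The file contents are an assumption, since the question does not say what B and C change: B appends "b" to line 2 and C appends "c" to line 4 of a five-line file (the identity "demo" is also made up). Under that assumption, re-picking B onto C should stop with the "now empty" message and create no commit.

```shell
#!/bin/sh
# Sketch of "Test A". Hypothetical contents: B edits line 2, C edits line 4.
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name demo
git config user.email demo@example.com

printf '%s\n' 1 2 3 4 5 >file
git add file && git commit -qm A
printf '%s\n' 1 2b 3 4 5 >file && git commit -qam B
printf '%s\n' 1 2b 3 4c 5 >file && git commit -qam C

# Re-pick B onto C. B's change is already in C, so the merge result equals C
# and Git stops instead of creating an empty commit.
git cherry-pick HEAD~1
status=$?
unmerged=$(git ls-files -u)      # empty: this stop is not a conflict
git cherry-pick --abort 2>/dev/null || git reset -q --hard
echo "exit status $status, commits on branch: $(git rev-list --count HEAD)"
```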

Then I reverted B and C in a single commit with:

git revert -n B^..C
git commit -a -m "xxx"

This creates a new, big commit D that reverts B and C, so the branch looks like A <- B <- C <- D.

Then, for some reason, I needed to redo B and C. I tried:

git cherry-pick B^..C

Two new commits, B' and C', are appended to the branch: A <- B <- C <- D <- B' <- C'.

My first question: how does Git know it should create B' and C'? I thought Git would find that B and C are already in the branch history and just skip them, like when I cherry-picked B directly in Test A.

Then, with the branch already at A <- B <- C <- D <- B' <- C', I ran the same command again:

git cherry-pick B^..C

I expected Git to recognize this as a no-op, but this time Git reports a conflict. My second question: why does Git fail to recognize and skip this operation this time?

Upvotes: 2

Views: 1373

Answers (3)

matt

Reputation: 534977

Let's step back about ten feet here and get a bigger mental picture of what Git is.

A Git commit is a snapshot of all the files. It represents your whole project, basically. It is not about diffs. This is a brilliant architecture because it is extremely fast and effectively infallible. Any commit can absolutely restore that state of your project, kaboom, just by checking it out; there is no need to "think".

However, Git can make diffs between two commits, and that is how it implements what we may call "merge logic". Every merge consists of applying two diffs simultaneously. [Well, it might be more than two, but pretend it isn't.] A merge, a cherry pick, a rebase, a revert are all merges in that sense — they all use "merge logic" to form a commit expressing the result of applying two diffs. The trick is to know who the comparands are in the construction of the two diffs.

  • When you ask for a true git merge, say of two branches, Git figures out where those branches last diverged. This is called the merge base. The comparands are: the merge base and the tip of branch1, and the merge base and the tip of branch2. Both diffs are applied to the merge base and the result is used to form a commit with two parents (the branch tips). The first branch name (the one you have checked out) then slides up one, to point to that new commit.

  • When you ask for a cherry-pick, the merge base is the parent of the commit being picked. The comparands are: the merge base and the head, and the merge base and the picked commit. Both diffs are applied to the merge base and the result is used to form a commit with one parent (the head). The head branch name then slides up one, to point to that new commit. [And a rebase is just a series of cherry picks!]

  • A revert also uses merge logic. As jthill has explained, it's just a matter of forming one of the diffs backwards. The merge base is the commit that you are trying to reverse. The comparands are: the merge base and its parent (in that direction), and the merge base and the head. These diffs are applied to the merge base and used to form a commit whose parent is the head. The head branch name then slides up one, to point to that new commit. If this suggests to you that a revert is basically a backwards cherry-pick, you are absolutely right.
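The cherry-pick bullet can be checked by hand. A minimal sketch, with made-up file contents and a hypothetical branch name "side": it extracts the three versions of one file (the picked commit's parent as merge base, HEAD, and the picked commit), merges them with git merge-file, and compares that with what git cherry-pick produces. For a single file with no renames the two should match; a real cherry-pick does roughly this for every file in the tree.

```shell
#!/bin/sh
# Sketch: redo one cherry-pick's merge by hand for a single file.
# Contents and the branch name "side" are hypothetical.
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name demo
git config user.email demo@example.com

printf '%s\n' 1 2 3 4 5 >file
git add file && git commit -qm base
git checkout -q -b side
printf '%s\n' 1 2x 3 4 5 >file && git commit -qam picked   # commit to pick
git checkout -q -                                        # back to first branch
printf '%s\n' 1 2 3 4y 5 >file && git commit -qam head

# Merge base = parent of the picked commit; ours = HEAD; theirs = picked commit.
git show side^:file >base.txt
git show HEAD:file >ours.txt
git show side:file >theirs.txt
git merge-file -p ours.txt base.txt theirs.txt >by-hand.txt

git cherry-pick side >/dev/null      # now let Git do the same merge for real
cmp -s by-hand.txt file && echo "same result"
```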


The cool thing is that once you know this, you can predict what will happen when you give one of these commands, because you can extract those same diffs yourself by saying git diff. Git's merge logic essentially lies open to your gaze. It remains then only to understand the circumstances under which Git stops in the middle of the operation because it cannot proceed without further explicit instructions. That's called (unfortunately) a conflict, and there are two main ways it can arise:

  • The same line in the same file was changed in two different ways in the two diffs. Git's idea of what constitutes the same line is rather broader than you might expect; this surprises beginners.

  • The same file, qua file, was treated in two incompatible ways: for example, one diff deletes it but the other diff keeps and edits it.
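The first kind of conflict is easy to reproduce without a repository, using git merge-file on three hypothetical versions of a five-line file in which both sides rewrite line 3 in different ways. git merge-file exits with the number of conflicts it found; only the line-level case is shown here, not the file-level one.

```shell
#!/bin/sh
# Sketch: both sides change line 3, differently (contents are made up).
cd "$(mktemp -d)" || exit 1
printf '%s\n' 1 2 3 4 5 >base.txt
printf '%s\n' 1 2 three 4 5 >ours.txt
printf '%s\n' 1 2 THREE 4 5 >theirs.txt

git merge-file -p ours.txt base.txt theirs.txt >merged.txt
conflicts=$?              # git merge-file exits with the number of conflicts
echo "conflicts: $conflicts"
cat merged.txt            # both versions of line 3 appear between markers
```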


I should add one more fact that explains a lot of behavior, including part of what you're asking about. This may seem obvious, but it is worth stating explicitly: in a diff, "nothing" is not a thing. What I mean is this. Supposing one diff changes a line and the other diff does nothing to that line. Then the way to enact both diffs is: change the line. Doing nothing is not a thing: it does not "tussle" against the change.

That is worth mentioning especially because beginners often don't grasp it. The other day there was a question where a user was complaining that in a merge where the second branch deleted a file, the file did indeed end up deleted even though the first branch kept it. That user was thinking of "don't delete the file" as a thing, and indeed as a primary thing. But it isn't. The two diffs are weighed equally by default, so one branch did nothing and one branch deleted the file, and doing nothing is not a thing, so the result is to delete the file.
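A minimal sketch of that situation (file names, contents and the branch name "deleter" are made up): one branch deletes b.txt, the other branch only edits a.txt and leaves b.txt alone. The merge should complete without a conflict, and b.txt should be gone.

```shell
#!/bin/sh
# Sketch: one branch deletes a file, the other does nothing to it.
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name demo
git config user.email demo@example.com

echo keep >a.txt
echo doomed >b.txt
git add . && git commit -qm base
git checkout -q -b deleter
git rm -q b.txt && git commit -qm "delete b.txt"
git checkout -q -
echo more >>a.txt && git commit -qam "edit a.txt only"   # b.txt untouched

git merge -q --no-edit deleter
ls      # lists only a.txt: "doing nothing" did not fight the deletion
```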

Upvotes: 1

jthill

Reputation: 60255

cherry-pick is a merge of the diffs from your cherry-pick's parent to the cherry-pick with the diffs from your cherry-pick's parent to your checked-out tip. That's it. Git doesn't have to know any more than that. It doesn't care "where" any of the commits are; it cares about merging those two sets of diffs.

revert is a merge of the diffs from the commit you're reverting to its parent with the diffs from that commit to your checked-out tip. That's it. Git doesn't have to know any more.

Here: try this:

git init test && cd test
printf '%s\n' 1 2 3 4 5 >file; git add .; git commit -m1
sed -i '2s/$/x/' file; git commit -am2
sed -i '4s/$/x/' file; git commit -am3

Run git diff :/1 :/2 and git diff :/1 :/3. Those are the two diffs Git merges when you say git cherry-pick :/2 here. The first diff changes line 2, and the second diff changes lines 2 and 4; the line 4 change does not abut any change in the first diff, and the line 2 change is identical in both. There's nothing left to do: all the :/1-:/2 changes are also in :/1-:/3.

Now, before you start on what follows, let me say this: this is harder to explain in prose than it is to just see. Do the example sequence above and look at the output. It is much, much easier to see what's going on by looking at it than by reading any description of it. Everybody goes through a stretch where this is too new and a little orientation helps, and that's what the paragraphs below are for, but again: the prose alone is harder to understand than the diffs. Run the diffs and try to understand what you're looking at; if you need a little help over what I promise is a very small hump, follow along in the text below. When it snaps into focus, see if you don't at least mentally slap your forehead and think "wow, why was that so hard to see?", just like, well, just about everybody.

Git's merge rules are pretty straightforward. Identical changes to overlapping or abutting lines are accepted as-is. A change in one diff to lines that the other diff neither changes nor abuts is also accepted as-is. Different changes to overlapping or abutting lines, well, there's an awful lot of history to look at and nobody's ever found a rule that will predict what the result should be every time, so Git declares that the changes conflict, dumps both sets of results into the file, and lets you decide what the result should be.
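The first two rules can be seen with git merge-file on the contents of the example above, laid out the way git cherry-pick :/2 sees them: base = :/1, ours = the tip :/3, theirs = the picked commit :/2. A minimal sketch, assuming only a shell and Git; line 2 is an identical change on both sides and line 4 is changed on one side only, so it should merge with zero conflicts and the result should equal the tip, which is why there is nothing to commit.

```shell
#!/bin/sh
# Sketch of the two clean rules, using the contents of the :/1 :/2 :/3 example.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 1 2 3 4 5 >base.txt      # :/1, parent of the picked commit
printf '%s\n' 1 2x 3 4x 5 >ours.txt    # :/3, the tip: lines 2 and 4 changed
printf '%s\n' 1 2x 3 4 5 >theirs.txt   # :/2, the same change to line 2 only

git merge-file -p ours.txt base.txt theirs.txt >merged.txt
status=$?
echo "conflicts: $status"
cmp -s merged.txt ours.txt && echo "result equals the tip: nothing new to commit"
```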

So what happens if you now change line 3?

sed -i '3s/$/x/' file; git commit -amx

Run git diff :/1 :/2 and git diff :/1 :/x, and you'll see that, relative to the cherry-pick's parent, :/2 changed line 2 while your tip changed lines 2, 3 and 4. Lines 2 and 3 abut, and that's historically too close for automated genies to handle properly, so yay, you get to do it: git cherry-pick :/2 will now declare a conflict, showing you the change to line 2 and the two different versions of lines 3 and 4 (:/2 changed neither, your tip changed both; in context here it's clear the line 3 and 4 changes are fine as-is, but again: nobody's ever figured out an automatic rule for reliably identifying such contexts).
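The conflicting case can be reproduced the same way. The three versions below are what :/1, :/x and :/2 contain in this example (base = parent of the picked commit, ours = the tip, theirs = the picked commit). A minimal sketch; git merge-file should report at least one conflict, with both the tip's and :/2's versions of lines 3 and 4 in its output.

```shell
#!/bin/sh
# Sketch: the example after the extra commit :/x that changes line 3.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 1 2 3 4 5 >base.txt       # :/1
printf '%s\n' 1 2x 3x 4x 5 >ours.txt    # :/x, the tip
printf '%s\n' 1 2x 3 4 5 >theirs.txt    # :/2, the commit being picked

git merge-file -p ours.txt base.txt theirs.txt >merged.txt
status=$?
echo "conflicts: $status"
cat merged.txt      # both versions of lines 3 and 4 appear inside the markers
```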

You can ring changes on this setup to test how reverts work, and also stash pops, merges, and git checkout -m, which runs a quick ad-hoc merge with your index.

Your git cherry-pick B^..C is a cherry-pick of two commits, B and C. It does them one after another, exactly as described above. Since you've reverted B and C and then cherry-picked them again, running it a second time has the exact same effect as applying B and C and then cherry-picking B (with the intent of then cherry-picking C). I conclude that B and C touch overlapping or abutting lines, so git diff B^ B will show changes that overlap or abut changes in git diff B^ C'. That is a choice Git won't make for you: whatever looks right here, an identical-looking choice will be wrong in other circumstances, and nobody can write a rule that tells those cases apart. So Git says the two sets of changes conflict and you get to sort it out.
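A minimal sketch of the whole sequence under the assumption reached above: hypothetically, B edits line 2 and C edits the adjacent line 3 (the real contents are unknown). The first re-pick should append B' and C' cleanly; the second should stop with the file unmerged. Since C' has the same content as C here, re-picking B directly onto C would conflict in the same way under this assumption.

```shell
#!/bin/sh
# Sketch of the question's sequence. Hypothetical contents: B edits line 2,
# C edits the adjacent line 3.
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name demo
git config user.email demo@example.com

printf '%s\n' 1 2 3 4 5 >file
git add file && git commit -qm A
printf '%s\n' 1 2b 3 4 5 >file && git commit -qam B
printf '%s\n' 1 2b 3c 4 5 >file && git commit -qam C
B=$(git rev-parse HEAD~1)
C=$(git rev-parse HEAD)

git revert -n "$B^..$C" && git commit -qm D   # D's content equals A's again
git cherry-pick "$B^..$C" >/dev/null           # clean: appends B' and C'
first=$(git rev-list --count HEAD)             # A B C D B' C'

# Second run: picking B onto C' merges A..B with A..C', and those diffs
# change the abutting lines 2 and 3 in different ways.
git cherry-pick "$B^..$C" >/dev/null 2>&1
unmerged=$(git ls-files -u)                    # non-empty: file is conflicted
git cherry-pick --abort
echo "commits after first re-pick: $first"
```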

Upvotes: 5

j6t

Reputation: 13387

This expands on @jthill's answer.

Consider a regular merge in a history like this:

a--b--c--d--e--f--g--h
       \
        r--s--t

Git performs the merge by looking only at the contents of these commits:

c--h   <-- theirs
 \
  t    <-- ours
^
|
base

and nothing else. Note that at the conceptual level it is completely irrelevant which side is denoted "ours" and which is "theirs"; they are totally interchangeable. (The only time it makes a difference is when there are conflicts and Git has to decide how it marks the sides as "theirs" and "ours" for the user.) (I'll omit the labels "base", "theirs" and "ours" in the following charts.)
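git merge-base shows which commit Git uses as the base. A minimal sketch that rebuilds the diagram above with empty commits (the branch name "topic" for r--s--t is made up) and checks that the merge base of h and t is c:

```shell
#!/bin/sh
# Sketch: rebuild the a..h / r--s--t diagram with empty commits.
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name demo
git config user.email demo@example.com

for m in a b c; do git commit -q --allow-empty -m "$m"; done
fork=$(git rev-parse HEAD)                 # commit c
git checkout -q -b topic                   # hypothetical branch for r--s--t
for m in r s t; do git commit -q --allow-empty -m "$m"; done
git checkout -q -                          # back to the a..h branch
for m in d e f g h; do git commit -q --allow-empty -m "$m"; done

base=$(git merge-base HEAD topic)
[ "$base" = "$fork" ] && echo "merge base is c"
git log --oneline -1 "$base"
```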

In your history

A--B--C

the merge operation behind the first git cherry-pick B looked at the following commits:

A--B
 \
  C

Here, A is chosen because it is the parent of B, a.k.a. B^. Obviously, the changes from A to C already contain the changes from A to B, so the merge machinery produces a result with no changes, and that produces the "cherry-pick is now empty" message.

Then you made this history by reverting both B and C:

A--B--C--R

Then the next git cherry-pick B looked at these commits:

A--B
 \
  R

This time, the changes from A to R no longer contain the changes from A to B, because they have been reverted. Therefore, the merge no longer produces an empty result.
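This can be checked with git diff. A minimal sketch with assumed contents (B edits line 2, C edits line 4): after R reverts B and C, the diff from A to R is empty, while the diff from A to C still contains B's change, so picking B onto R has real work to do.

```shell
#!/bin/sh
# Sketch of the A..R comparison. Hypothetical contents: B edits line 2,
# C edits line 4.
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name demo
git config user.email demo@example.com

printf '%s\n' 1 2 3 4 5 >file
git add file && git commit -qm A
printf '%s\n' 1 2b 3 4 5 >file && git commit -qam B
printf '%s\n' 1 2b 3 4c 5 >file && git commit -qam C
git revert -n HEAD~2..HEAD && git commit -qm R   # revert B and C together
A=$(git rev-parse HEAD~3)
B=$(git rev-parse HEAD~2)
C=$(git rev-parse HEAD~1)
R=$(git rev-parse HEAD)

git diff "$A" "$C" | grep '^+2b'                 # A..C contains B's change
git diff --quiet "$A" "$R" && echo "A..R is empty"
git cherry-pick "$B" >/dev/null && echo "B applies again"
```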

A small detour: When you do git revert B in your history, the merge machinery looks at these commits:

B--A
 \
  C

Note that only B and the parent of B, a.k.a. A, are swapped compared with git cherry-pick B.

(I have described a single-commit revert because I am unsure how a multi-commit revert works.)
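The swapped roles can be checked by hand. A minimal sketch with assumed contents (B edits line 2, C edits line 4): it merges the three versions of the file with git merge-file, using B as the base, A as "theirs" and C as "ours", then compares the result with git revert B.

```shell
#!/bin/sh
# Sketch: "git revert B" as a merge whose base is B and whose "theirs" is B^.
# Hypothetical contents: B edits line 2, C edits line 4.
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name demo
git config user.email demo@example.com

printf '%s\n' 1 2 3 4 5 >file
git add file && git commit -qm A
printf '%s\n' 1 2b 3 4 5 >file && git commit -qam B
printf '%s\n' 1 2b 3 4c 5 >file && git commit -qam C

git show HEAD~1:file >base.txt      # B, the commit being reverted
git show HEAD~2:file >theirs.txt    # A, the parent of B
git show HEAD:file >ours.txt        # C, the current tip
git merge-file -p ours.txt base.txt theirs.txt >by-hand.txt

git revert --no-edit HEAD~1 >/dev/null
cmp -s by-hand.txt file && echo "same result"
```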

Upvotes: 2
