watashiSHUN

What does git rebase use to determine common ancestor?

I know that I can use git merge-base to determine the common ancestor when performing a git merge, but it looks like this does not hold for git rebase.

Here is my setup before rebase:
master branch: A--Y--(C)
and dev branch: A-----C--D
(C) is the result of my rebasing A--C onto A--Y: same content, but a different commit message.

git merge-base master dev returns A, and if I run git merge dev, I see both (C) and C in my history.

After git rebase master, the outcome is A--Y--(C)--(D), where (D) is D after the rebase.

Does git rebase consider (C) the common ancestor? (This feels pretty hard to do in code.) I am guessing it still uses A, but when it cherry-picks C and D to append them to the end of master, C ends up as a no-op?

Upvotes: 4

Views: 2434

Answers (1)

torek

Let's start with this: what git rebase does is to copy some commits. In your case, it appears to copy commit C but not commit D. (I think you are asking why, but are assuming that it has something to do with the merge base, and that is probably not correct.)

The set of commits that git rebase chooses to copy is largely, but not entirely, determined by the result of:

git rev-list <upstream>..HEAD

where <upstream> is the argument you pass to git rebase, or the configured upstream. For instance, with git rebase master, the <upstream> in this sequence is master. Your current branch is dev, so HEAD refers to dev. So the set of commits to copy is that listed by:

git rev-list master..dev

(although there are additional options added; see below).

If I read your graph right, the input graph is:

...--A--Y   <-- master
      \
       C--D   <-- dev

so that the output of this git rev-list is commits C and D.
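To make this concrete, here is a minimal sketch that builds that graph in a throwaway repository and runs the two-dot rev-list. The temporary directory, the file names, and the commit helper function are my own scaffolding for the demo, not anything from the question.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in this answer
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false

# commit <name>: hypothetical helper, adds <name>.txt and commits it with subject <name>
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit A
git checkout -q -b dev
commit C
commit D
git checkout -q master
commit Y

# Commits reachable from dev but not from master, newest first,
# translated from hashes to subjects for readability.
subjects=$(git rev-list master..dev | while read -r sha; do
    git log -1 --format=%s "$sha"
done)
echo "$subjects"
```

The list comes out newest first, which is the "usual backwards order" that rebase later has to reverse.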

Rebase then goes on to:

  1. git checkout the tip of the target branch (commit Y here);
  2. run—sometimes literally, sometimes figuratively— git cherry-pick on commits C and D; and, finally
  3. run git checkout -B dev HEAD (not quite literally, but with the same effect: get back on dev, after moving dev to point to the final copied commit).
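The three steps can be sketched by hand as follows. This is only an approximation of what rebase does internally (it ignores the commit-elimination logic and any conflict handling), and the scratch repository and commit helper are assumptions made for the demo.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false

# commit <name>: hypothetical helper, adds <name>.txt and commits it with subject <name>
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit A
git checkout -q -b dev
commit C
commit D
git checkout -q master
commit Y

# Collect the commits to copy, oldest first, before anything moves.
to_copy=$(git rev-list --reverse master..dev)

git checkout -q --detach master          # step 1: detached HEAD at the tip of master
for sha in $to_copy; do
    git cherry-pick "$sha" > /dev/null   # step 2: copy each commit in forward order
done
git checkout -q -B dev HEAD              # step 3: move dev to the last copy and get back on it

history=$(git log --format=%s dev | tr '\n' ' ')
echo "$history"
```

After this, dev contains copies of C and D sitting on top of Y; the original C and D are no longer reachable from any branch name.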

What I think you are asking about directly—which I think is not what you should be asking—is which commit is used as a merge base during cherry-picking. The answer here is that the merge base of a cherry-pick is the parent commit of that cherry-pick. So for the git cherry-pick C step, the merge base, if a merge is required, is commit A: C's parent. For the git cherry-pick D step, the merge base, if a merge is required, is commit C: D's parent.

Now, I mentioned "additional options" above. These refer to the commit selection process. In particular, git rebase has two ways to toss commits off of its "to copy" list. (It also generates the list in the reverse of the usual backwards order, so that it does the cherry-picking in the forwards order instead.)

The main way that it eliminates commits from the to-copy list is to use git patch-id. The git patch-id command computes an ID number, which one hopes is "unique enough", based on the patch found by comparing a commit to its immediate parent, throwing away line numbers and some other items that might affect a cherry-picked commit. The result should be the same ID if the commit was already cherry-picked, but different if not.

Hence, after listing C and D, Git goes on to compare their patch-IDs to the patch-ID of commit Y.
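Here is a small sketch of what "same patch ID" means in practice. It cherry-picks one commit onto another branch and compares patch IDs; the scratch repository, the commit helper, and the choice of which commit to copy are assumptions for the demo.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false

# commit <name>: hypothetical helper, adds <name>.txt and commits it with subject <name>
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit A
git checkout -q -b dev
commit C
git checkout -q master
commit Y
git cherry-pick dev > /dev/null   # master now ends with a copy of C

# git patch-id reads a patch on stdin and prints "<patch-id> <commit-id>".
id_orig=$(git show dev      | git patch-id | cut -d' ' -f1)   # original C
id_copy=$(git show master   | git patch-id | cut -d' ' -f1)   # copy of C on master
id_y=$(git show master~1    | git patch-id | cut -d' ' -f1)   # commit Y

echo "C:       $id_orig"
echo "C copy:  $id_copy"
echo "Y:       $id_y"
```

The original and the copy have different commit hashes (different parents), yet they should share a patch ID, while Y, being a different change, should not.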

There are two commands that do this sort of thing, one meant for users, and the other being git rev-list. The user-facing command is git cherry, but I think here, the computer-oriented git rev-list is actually much easier to explain.
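For completeness, this is roughly what the user-facing command looks like. In this sketch master already holds a cherry-picked copy of one dev commit; which commit is duplicated is an arbitrary choice for the demo, and the scratch repository and commit helper are my own scaffolding.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false

# commit <name>: hypothetical helper, adds <name>.txt and commits it with subject <name>
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit A
git checkout -q -b dev
commit C
commit D
git checkout -q master
commit Y
git cherry-pick dev~1 > /dev/null   # copy C onto master

# git cherry -v <upstream> <head> prints "<mark> <hash> <subject>" per commit:
# "-" means an equivalent change is already in upstream, "+" means it is not.
marks=$(git cherry -v master dev | awk '{print $1, $3}')
echo "$marks"
```

A "-" line is exactly one of the pre-picked cherries that rebase would skip.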

The interesting thing here is how Git knows to compare all the patch IDs of commits C and D to the patch ID of commit Y, when the initial graph is the one we showed above. Or, if the graph were:

...--A--E--F--G   <-- master
      \
       C--D   <-- dev

the two sets of patch-IDs to be compared would be those for (C, D) vs those for (E, F, G). If we had:

          H--I
         /    \
...--A--G---J--K   <-- master
      \
       \   D
        \ / \
         C   F   <-- dev
          \ /
           E

the sets to compare would be (C, D, E, F) vs (G, H, I, J, K). And this, I think, makes the whole concept much clearer: the way this works is that git rev-list can examine both branches down to the point where they meet, and collect a set of commits for each "side".

The way we get git rev-list to do this is:

git rev-list --left-right master...dev

(Note the three dots here. The order of master and dev matters only in determining which commits are "left side", meaning on master but not on dev, and which are "right side", meaning on dev but not on master.)

This three-dot notation, along with this --left-right idea, is how Git figures out which commits to put in which set, in order to compute the complete set of patch-IDs. If the patch ID of a left side commit is the same as the patch ID of a right side commit, those commits, though they are different in some way, represent the same change: they are cherries that have already been cherry-picked.
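To see the two sides, here is a sketch on the simple A--Y / C--D graph. I use git log rather than git rev-list only because its %m format placeholder prints the left/right mark next to the subject; git log accepts the same revision-selection options. The scratch repository and commit helper are assumptions for the demo.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false

# commit <name>: hypothetical helper, adds <name>.txt and commits it with subject <name>
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit A
git checkout -q -b dev
commit C
commit D
git checkout -q master
commit Y

# "<" marks commits only on the left name (master),
# ">" marks commits only on the right name (dev).
sides=$(git log --left-right --format='%m %s' master...dev)
echo "$sides"
```

Commit A appears on neither side, since it is reachable from both branch names.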

The git rebase command skips over these pre-picked cherries. It only cherry-picks the remaining commits, whichever those are. The git rebase code runs git rev-list --right-only --cherry-pick --no-merges <upstream>...HEAD (three dots, since --cherry-pick only has an "other side" to compare against when the range is a symmetric difference) to get its list of commits to copy in step 2. Then it runs those three steps. If the patch ID of either C or D matches the patch ID for commit Y, that commit is never copied at all. In this case, it would seem that commit D's patch ID may match commit Y's.
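The effect of the cherry filtering can be sketched like this. Here master gets a cherry-picked copy of one dev commit (which one is an arbitrary choice for the demo), and the filtered three-dot listing is compared with the plain two-dot one. Again I use git log with a subject-only format for readability; the scratch repository and commit helper are my own scaffolding.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
git config commit.gpgsign false

# commit <name>: hypothetical helper, adds <name>.txt and commits it with subject <name>
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit A
git checkout -q -b dev
commit C
commit D
git checkout -q master
commit Y
git cherry-pick dev~1 > /dev/null   # master now holds an equivalent of C

# Plain two-dot range: everything on dev that is not on master.
plain=$(git log --format=%s master..dev | tr '\n' ' ')

# Three-dot range with cherry filtering: right-side commits whose
# patch ID does not match any left-side commit.
filtered=$(git log --right-only --cherry-pick --no-merges \
    --format=%s master...dev | tr '\n' ' ')

echo "plain:    $plain"
echo "filtered: $filtered"
```

The commit that drops out of the filtered list is the one rebase would never copy.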

I also said that there are two ways that commits can be eliminated. This cherry detection is one of them, and I suspect it's the reason you see the result I think you see. The other is what Git calls --fork-point, which is a bit complicated. I'm not going to cover it here. It only omits "early" commits: that is, it finds a point along the C--D chain—this would make more sense if the chain were longer—that it thinks it should drop, and only copies commits after that point. Since commit C gets copied, it cannot be the fork-point code causing commit D to get omitted.

Upvotes: 7
