Reputation: 49714
I rebase often. Occasionally the rebase is particularly problematic (lots of merge conflicts), and my solution in such cases is to cherry-pick the individual commits onto the master branch. I do this because nearly every time I do, there are considerably fewer conflicts.
My question is why this would be the case.
Why are there fewer merge conflicts when I cherry-pick than when I rebase?
In my mental model a rebase and a cherry-pick are doing the same thing.
Rebase example
A-B-C (master)
\
D-E (next)
git checkout next
git rebase master
produces
A-B-C (master)
\
D`-E` (next)
and then
git checkout master
git merge next
produces
A-B-C-D`-E` (master)
Cherry pick example
A-B-C (master)
\
D-E (next)
git checkout master
git cherry-pick D E
produces
A-B-C-D`-E` (master)
From my understanding the end result is the same. (D and E are now on master with a clean (straight-line) commit history.)
Why would the latter (cherry picking) ever produce fewer merge conflicts than the former (rebasing)?
UPDATE UPDATE UPDATE
I was finally able to reproduce this problem and I realize now that I may have oversimplified the example above. Here's how I was able to reproduce...
Say I have the following (notice the extra branch)
A-B-C (master)
\
D-E (next)
\
F-G (other-next)
And then I do the following
git checkout next
git rebase master
git checkout master
git merge next
I end up with the following
A-B-C-D`-E` (master)
\ \
\ D`-E` (next)
\
D-E
\
F-G (other-next)
From here, I'll either rebase or cherry-pick
Rebasing example
git checkout other-next
git rebase master
produces
A-B-C-D`-E`-F`-G` (master)
Cherry picking example
git checkout master
git cherry-pick F G
produces the same result
A-B-C-D`-E`-F`-G` (master)
but with far fewer merge conflicts than the rebasing strategy.
Having finally reproduced a similar example I think I see why there were more merge conflicts with the rebasing than with the cherry picking, but I'll leave it for someone else (who will likely do a better (and more accurate) job than I would) to answer.
Upvotes: 13
Views: 2463
Reputation: 487883
I think what's happening here has to do with choosing the commits to copy.
Let's note, and then put aside, the fact that git rebase may use either git cherry-pick, or git format-patch and git am, to copy some commits. In most cases git cherry-pick and git am should achieve the same results. (The git rebase documentation specifically calls out upstream file renames as a case where the cherry-pick method behaves differently from the default git am-based method for non-interactive rebase. See also various parenthetical remarks in the original answer below, and the comments.)
The main thing to consider here is which commits are to be copied. In the manual method, you first manually copy commits D and E to D' and E', then you manually copy F and G to F' and G'. This is the minimal amount of work to do and is just what we want; the only drawback here is all the manual commit-identifying we have to do.
When you use the command:
git checkout <branch> && git rebase <upstream>
you make Git automate the process of finding commits to copy. This is great when Git gets it right, but not if Git gets it wrong.
So how does Git choose these commits? The simple, but somewhat wrong, answer is in this sentence (from the same documentation):
All changes made by commits in the current branch but that are not in <upstream> are saved to a temporary area. This is the same set of commits that would be shown by git log <upstream>..HEAD; or by git log 'fork_point'..HEAD, if --fork-point is active (see the description on --fork-point below); or by git log HEAD, if the --root option is specified.
The --fork-point complication is somewhat new, since Git 2.something, but it's not "active" in this case because you specified an <upstream> argument and did not specify --fork-point. The actual <upstream> is master both times.
Now, if you actually run each git log (with --oneline to make it nicer):
git checkout next && git log --oneline master..HEAD
and:
git checkout other-next && git log --oneline master..HEAD
you will see that the first one lists commits D and E (excellent!), but the second one lists D, E, F, and G. Uh oh, D and E occur twice!
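To see this for yourself, here is a minimal sketch that rebuilds the question's history in a throwaway repository and prints what each rebase would consider copying. The commit helper, the one-file-per-commit layout, the placeholder identity, and the choice to fork next from A are all assumptions made for illustration; the question does not say exactly where next forks.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # first branch is master, whatever the local default
git config user.name "Demo"               # placeholder identity, needed to commit
git config user.email "demo@example.com"

# Hypothetical helper: every commit adds its own file, so nothing conflicts.
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit A
git branch next                 # assumption: next forks from A
commit B; commit C              # master:     A-B-C
git checkout -q next
commit D; commit E              # next:       A-D-E
git checkout -q -b other-next
commit F; commit G              # other-next: A-D-E-F-G

# Candidates for the first rebase (next onto master).
git checkout -q next
first=$(git log --format=%s master..HEAD | sort | xargs)

# Rebase next and fast-forward master, as in the question.
git rebase -q master
git checkout -q master
git merge -q next

# Candidates for the second rebase, before any patch-ID filtering.
git checkout -q other-next
second=$(git log --format=%s master..HEAD | sort | xargs)

echo "first rebase candidates:  $first"
echo "second rebase candidates: $second"
```

Because every commit touches its own file, nothing conflicts here; the point is only that the second candidate list still contains the original D and E.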
The thing is, this sometimes works. Well, I said "somewhat wrong" above. Here's what makes it wrong, just two paragraphs down from the earlier quote:
Note that any commits in HEAD which introduce the same textual changes as a commit in HEAD..<upstream> are omitted (i.e., a patch already accepted upstream with a different commit message or timestamp will be skipped).
Note that HEAD..<upstream> here is the reverse of the <upstream>..HEAD in the git log commands we just ran, where we saw D-through-G.
For the first rebase, git log HEAD..master shows only master's own commits (the ones next does not have yet). None of them is a copy of D or E, so there are no commits that could possibly get skipped. That's good, because there are no commits to skip: we're copying D and E to D' and E', and that's just what we want.
For the second rebase, though, which happens after the first rebase is done, git log HEAD..master will also show you commits D' and E': the two copies we just made. That makes the originals, D and E, candidates for skipping.
So how does Git decide which commits it should really skip? The answer is in git patch-id, although it's actually implemented directly in git rev-list, which is a very fancy and complicated command. Neither command's documentation describes it terribly well, though, in part because it is hard to describe. Here's my attempt anyway. :-)
What Git does here is look at the diffs, after stripping off identifying line numbers, in case the patches go in slightly different locations (due to earlier patches moving lines up and down in files). It uses the same tricks it uses with files—turning file contents into unique hashes—to turn each commit into a "patch ID". The commit ID is a unique hash that identifies one specific commit, and always that same one specific commit. The patch ID is a different (but still unique-to-some-content) hash ID that always identifies "the same" patch, i.e., something that removes and adds the same diff-hunks, even if it removes and adds them from different locations.
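Having computed a patch ID for every commit, Git can then say: "Aha, commit D and commit D' have the same patch-ID! I should skip copying D because D' is probably a result of copying D." It can do the same for E vs E'. This often works, but it fails for D whenever the copy from D to D' required manual intervention (fixing merge conflicts), and it likewise fails for E whenever the copy from E to E' required manual intervention. In that case the copy no longer introduces the same textual change as the original, the patch IDs differ, and Git tries to apply the original again on top of its own copy, which typically means running into the same conflicts a second time.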
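A small experiment makes the patch-ID behaviour concrete. This is only a sketch under assumed names: the file names, commit messages, and the patch_id helper are hypothetical, and the "hand edit" amend merely stands in for a conflict that was resolved manually.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"               # placeholder identity
git config user.email "demo@example.com"

printf 'one\ntwo\nthree\n' > file.txt
git add file.txt
git commit -q -m A

git checkout -q -b next
printf 'one\ntwo\nthree\nfour\n' > file.txt
git commit -q -am D                       # D appends one line

git checkout -q master
echo "unrelated" > other.txt
git add other.txt
git commit -q -m B                        # master moves on without touching file.txt

git cherry-pick next > /dev/null          # D': a clean copy of D on top of B

# Hypothetical helper: git patch-id prints "<patch-id> <commit-id>"; keep the first field.
patch_id() { git show "$1" | git patch-id | cut -d' ' -f1; }

id_D=$(patch_id next)
id_clean=$(patch_id master)

# Stand-in for a manual conflict resolution: the copy ends up with different content.
echo "hand edit" >> file.txt
git commit -q -a --amend --no-edit
id_edited=$(patch_id master)

echo "D:            $id_D"
echo "D' (clean):   $id_clean"
echo "D' (edited):  $id_edited"
```

The clean copy has a different commit ID but the same patch ID as D, so a later rebase can recognise and skip D. Once the copy has been edited, the patch IDs no longer match and D looks like new work.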
What's needed here is a sort of "smart rebase" that can look at a series of branches and compute, in advance, which commits to copy, once, for all the to-be-rebased branches. Then, after all the copies are done, this "smart rebase" would adjust all the branch-names.
In this particular case (copying D through G) it's actually pretty easy, and you can do this manually with:
$ git checkout -q other-next && git rebase master
[here rebase copies D, E, F, and G, perhaps with your assistance]
followed by:
$ git checkout next
[here git checks out "next", so that HEAD is ref: refs/heads/next
and refs/heads/next points to original commit E]
$ git reset --hard other-next~2
This works because other-next names commit G', whose parent is F', whose parent in turn is E', and this is where we want next to point. Since HEAD refers to branch next, git reset adjusts refs/heads/next to point to commit E', and we're done.
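The same two steps can be tried end to end in a scratch repository. As before this is a sketch: the commit helper, the one-file-per-commit layout, and forking next from A are assumptions for illustration, and because nothing conflicts here the rebase needs no assistance.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"               # placeholder identity
git config user.email "demo@example.com"

# Hypothetical helper: every commit adds its own file.
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit A
git branch next                 # assumption: next forks from A
commit B; commit C
git checkout -q next
commit D; commit E
git checkout -q -b other-next
commit F; commit G

# Step 1: rebase the longest branch, so D, E, F and G are each copied exactly once.
git checkout -q other-next
git rebase -q master

# Step 2: move next to the copy of E, two commits below the new tip.
git checkout -q next
git reset -q --hard other-next~2

next_log=$(git log --format=%s master..next | xargs)
other_log=$(git log --format=%s master..other-next | xargs)
echo "next:       $next_log"
echo "other-next: $other_log"
```

After this, next and other-next share the same copies of D and E, so there is no second round of copying and no opportunity to hit the same conflicts twice.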
In more complex cases, the commits that need to be copied-exactly-once are not all neatly linear:
                A1-A2-A3   <-- featureA
               /
...--o--o--o--o--o--o--o   <-- master
            \
             *--*--B3-B4-B5   <-- featureB
                 \
                  C3-C4   <-- featureC
If we want to "multi-rebase" all three features, we can rebase featureA independently of the other two (none of the three A commits depend on anything "non-master" other than earlier A commits), but to copy the five B commits and the four C commits, we must copy the two * commits that are both B and C, but copy them just once, and then copy the remaining three and two commits (respectively) onto the tip of the copied commits.
(It would be possible to write such a "smart rebase", but integrating that into Git properly, so that git status truly understands it, is considerably harder.)
I'd love to see a reproducible example. In most cases your "in-head" model should work. There is one known special case though.
An interactive rebase, or adding -m or --merge to plain git rebase, actually does use git cherry-pick, while the default non-interactive rebase uses git format-patch and git am instead. The latter is not as good for rename detection. In particular, if there is a file rename in the upstream,[1] the interactive or --merge rebase can be expected to behave differently (usually, better).
(Also, note that both kinds of rebase, the patch-oriented one and the cherry-pick based version, will skip commits that are git patch-id-identical to commits already in the upstream, via git rev-list --left-only --cherry-pick HEAD...<upstream> or equivalent. See the documentation for git rev-list, particularly the section on --cherry-mark and --left-right, which I think makes this more comprehensible. This should be the same for both kinds of rebase, though; if you are manually cherry-picking, it will be up to you whether you do this.)
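The filtering can be observed directly by comparing a plain two-dot range with the symmetric-difference form. This is a sketch in a scratch repository: the commit helper, the one-file-per-commit layout, and forking next from A are assumptions, and other-next stands in for HEAD on the left side of the three-dot range.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"               # placeholder identity
git config user.email "demo@example.com"

# Hypothetical helper: every commit adds its own file.
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit A
git branch next                 # assumption: next forks from A
commit B; commit C
git checkout -q next
commit D; commit E
git checkout -q -b other-next
commit F; commit G

# First rebase plus fast-forward: master now ends in D' and E'.
git checkout -q next
git rebase -q master
git checkout -q master
git merge -q next

# Plain range: every commit reachable from other-next but not from master.
plain=$(git rev-list master..other-next | wc -l | tr -d ' ')

# Symmetric difference, left side only, minus patch-equivalent commits.
kept=$(git log --format=%s --left-only --cherry-pick other-next...master | xargs)

echo "master..other-next lists $plain commits"
echo "after --cherry-pick filtering: $kept"
```

Here the copies were clean, so D and E are filtered out and only F and G remain to be copied. Had D' or E' been edited while resolving a conflict, the corresponding original would stay in the list.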
[1] More precisely, git diff --find-renames needs to believe there is a rename there. Usually it believes this if there is one, but since it's detecting them by comparing trees, this is not perfect.
Upvotes: 12