Aviv Cohn

Reputation: 17193

Why does git rebase trigger a merge conflict?

I have cloned a remote repository, created a new branch b, and started working and making commits. I have also pushed the b branch, but I am the only one working on it.

After a while, I want to rebase my local branch over the remote master, just to synchronize with changes that may have happened to the system in general. Note that I am certain I'm the only one who's working on these particular files.

So I did

git fetch --all
git rebase origin/master

And then Git notifies me of a merge conflict.

Now, I can easily solve the conflict by hand, but I am bothered by this: why does a merge conflict occur?

If I'm not mistaken, the whole idea of git rebase is to "replay" all the commits of my current branch over the tip of the specified branch. I'm the only one who has worked on these particular files, or on this particular branch.

So why does this happen? Is something wrong with my approach?

Upvotes: 9

Views: 2145

Answers (2)

torek

Reputation: 488183

It helps, I think, to realize that git rebase is really an automated way to run git cherry-pick repeatedly. But this only helps if you also realize that git cherry-pick is a form of merge. That's where the merge conflicts come from.

It's easier to understand this when looking at a regular merge. Let's draw a bit of a commit graph, with single uppercase letters standing in for each commit, like this:

          I--J   <-- branch1 (HEAD)
         /
...--G--H
         \
          K--L   <-- branch2

If we run git merge branch2, Git finds three commits:

  • One commit—which is #2 in the end—is always the current, or HEAD, commit. Since HEAD is attached to the name branch1, the current commit is the one that branch1 identifies: commit J.

  • The #3 commit in the end is the commit you name. By using the name branch2 you tell Git to read that name and see that it points to commit L.

  • The #1 commit is one that Git finds on its own. Git does this by finding the best commit that's "on" both branches. The commits that are on branch1 are ...-G-H-I-J. The commits that are on branch2 are ...-G-H-K-L. So commit G is on both branches, but it's further back than commit H, which is also on both branches. Commits I-J are only on one branch, and K-L are only on the other. That means commit H is the best shared commit.
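The search for commit #1 can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the PARENTS table and the function names are made up for illustration, commit G is treated as a root because the drawing elides everything before it, and real Git uses a more efficient algorithm than this brute-force one.

```python
# Toy commit graph from the drawing above: each commit maps to its parent(s).
# Assumption: the history before G is elided, so G is treated as a root here.
PARENTS = {
    "G": [],
    "H": ["G"],
    "I": ["H"],
    "J": ["I"],
    "K": ["H"],
    "L": ["K"],
}

def ancestors(commit, parents):
    """Return the commit itself plus everything reachable by following parents."""
    seen = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_bases(a, b, parents):
    """Return shared commits that are not an ancestor of another shared commit."""
    shared = ancestors(a, parents) & ancestors(b, parents)
    return {
        c for c in shared
        if not any(c != other and c in ancestors(other, parents) for other in shared)
    }

print(merge_bases("J", "L", PARENTS))  # prints {'H'}
```

Both G and H are shared, but G is an ancestor of H, so only H survives as the best shared commit.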

Git can now perform a merge. To do so, Git runs two git diff commands, in effect:

  • git diff --find-renames hash-of-H hash-of-J: this tells Git what changed between the common starting point and your commit, i.e., what you did.

  • git diff --find-renames hash-of-H hash-of-L: this tells Git what changed between the same starting point and their commit, i.e., what they did.

The merge command's job is now to combine your changes and their changes:

  • For files that are in H that nobody touched, keep those files.
  • For files that are in H that you changed and they didn't, take your version.
  • For files that are in H that they changed and you didn't, take their version.
  • For files that you both touched, figure out if it's possible to combine your changes.

There are some other tricky cases, such as what happens if you renamed a file and/or they renamed a file, or you deleted a file and they modified it, and so on. But mostly, merge conflicts happen when you both made changes to some file and you both changed the same lines of that file, or made changes that otherwise "touch". If your changes and their changes don't "touch each other", Git will assume that it is OK to keep both changes. Otherwise, you get a merge conflict.

This is all a bit tricky the first few times you go through it, but after a while it feels pretty natural. If Alice changed "the red ball" to "the blue ball" and Bob changed "the red ball" to "the red brick", Git doesn't know what to do, and makes you pick the right answer.
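The four rules above can be modeled roughly as follows. This is a deliberately simplified sketch, not how Git is implemented: it compares whole files, whereas Git also combines changes within a file, region by region. The function name, file names, and file contents are invented for the example.

```python
def merge_snapshots(base, ours, theirs):
    """Combine two snapshots relative to a shared base, one whole file at a time.

    Simplification: real Git also merges inside a file; this toy treats any
    file changed differently on both sides as a conflict.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:          # nobody touched it, or both made the same change
            result = o
        elif o == b:        # only they changed it: take their version
            result = t
        elif t == b:        # only we changed it: take our version
            result = o
        else:               # both changed it, in different ways
            conflicts.append(path)
            continue
        if result is not None:   # None means the file does not exist (deleted)
            merged[path] = result
    return merged, conflicts

base  = {"toy.txt": "the red ball",  "README": "hello", "notes": "todo"}
alice = {"toy.txt": "the blue ball", "README": "hello", "notes": "done"}
bob   = {"toy.txt": "the red brick", "README": "hi",    "notes": "todo"}

merged, conflicts = merge_snapshots(base, alice, bob)
print(merged)     # prints {'README': 'hi', 'notes': 'done'}
print(conflicts)  # prints ['toy.txt']
```

Alice's change to notes and Bob's change to README combine without trouble; only toy.txt, which both of them edited, is left for a human to resolve.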

Enter cherry-pick

The git cherry-pick command has the job of copying a commit. That is, given some commit, which represents a full snapshot of all files, we want to figure out what changed in that commit.

It's easy enough, in Git, to turn two adjacent commits—two snapshots that happened one right after the other—into a set of changes. All we have to do is ask Git to run git diff on the two snapshots. Git will figure out which files are the same, and say nothing at all about those. It will figure out which files are different, and produce a recipe—a set of lines to add and/or delete—that will change the earlier commit's file into the later commit's copy. If we ask it, with --find-renames (on by default since Git 2.9), Git will figure out if a file that's gone missing on the left, compared to a file that's new on the right, represents a file rename operation, too.

Imagine, then, that we have this:

...--G--H--I--J   <-- main
         \
          K--L   <-- feature

If we ask for a diff from H to K, we'll see what changed in K, compared to H. That might, for instance, say something like "add this line after line 72 of file.py".

But what if we want to apply these changes to commit J? We could just close our eyes and hope that "add this line after line 72" makes sense, but what if what was line 72 is now line 75, or perhaps even further away? We could search for context. But perhaps we can do better even than that.

Instead of just applying this change blindly or checking context, what if we first grab a second diff, of commit H vs commit J? That will tell us what they changed. If they added three lines above line 72, well, now what was line 72 is definitely line 75. So that tells us where to put our change.
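To see how a second diff can relocate a change, here is a small Python sketch using the standard difflib module. It is an illustration under assumptions, not what Git does internally: the 100-line file, the three added lines, and the new_position helper are all hypothetical.

```python
import difflib

def new_position(old_lines, new_lines, old_index):
    """Map a 0-based line index in old_lines to its index in new_lines.

    Returns None if that line was changed or removed.
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for block in matcher.get_matching_blocks():
        if block.a <= old_index < block.a + block.size:
            return block.b + (old_index - block.a)
    return None

# Hypothetical file.py as of commit H: 100 distinct lines.
h = [f"line {n}" for n in range(1, 101)]
# Hypothetical commit J: three lines were added above the old line 72.
j = h[:10] + ["new A", "new B", "new C"] + h[10:]

# Old line 72 has 0-based index 71; convert the answer back to 1-based.
print(new_position(h, j, 71) + 1)  # prints 75
```

The diff from H to J says where the old line 72 ended up, so "add this line after line 72" becomes "add this line after line 75".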

But hang on a moment, this "take two diffs and combine them" idea is what git merge does! And in fact, that's exactly how git cherry-pick works: we pick the parent of the commit we're copying, and pretend it's the merge base. We get two diffs, one from the merge base to the commit we're copying—these are "their" changes—and one from the merge base to the commit we're working on right now, which is commit J, and these are "our" changes. We have Git combine them, using the same code it uses when we run git merge.

If all goes well, git cherry-pick makes a new commit for us. The git rebase command does all this in what Git calls detached HEAD mode, so that the picture now looks like this:

                K'  <-- HEAD
               /
...--G--H--I--J   <-- main
         \
          K--L   <-- feature

We'll call the new commit K' to indicate that it is a copy of original commit K. Now it's time to cherry-pick commit L, so now Git will diff K vs L to see what "they" (really, we) changed, and also diff K vs K' to see what "we" (really, everything up through and including the previous cherry-pick operation) changed. Then Git will try to combine these two sets of changes—"ours", from K-vs-K', and "theirs", from K-vs-L—and if all goes well, git cherry-pick will make a new commit L':

                K'-L'  <-- HEAD
               /
...--G--H--I--J   <-- main
         \
          K--L   <-- feature

If things don't go well, during either git cherry-pick step, Git will stop and make us resolve the conflict, in exactly the same way it does with git merge.

Once all the commits are copied, git rebase has one final trick: it yanks the name feature off the old position and sticks it wherever HEAD points now, then "re-attaches" HEAD to the branch name. In this case, that produces:

                K'-L'  <-- feature (HEAD)
               /
...--G--H--I--J   <-- main
         \
          K--L   [abandoned]

If you now look at the commits with git log, you won't see the original K-L commits at all, and will instead see the new K'-L' commits. The commit just before K' is now J, and the feature branch has been rebased onto the main branch.
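The whole sequence (cherry-pick K, cherry-pick L, then move the branch name) can be imitated with a toy model in Python. Everything here is an assumption made for illustration: commits are plain dictionaries of file contents, the file names are invented, the merge works on whole files only, and a conflict simply raises an error instead of stopping for manual resolution.

```python
def merge(base, ours, theirs):
    """Whole-file three-way merge; raises if both sides changed a file differently."""
    merged = {}
    for path in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t or t == b:
            result = o            # same on both sides, or only we changed it
        elif o == b:
            result = t            # only they changed it
        else:
            raise RuntimeError(f"CONFLICT in {path}")
        if result is not None:    # None means the file is absent
            merged[path] = result
    return merged

# commit name -> (parent name, snapshot); mirrors the drawing above
commits = {
    "H": (None, {"a.txt": "1", "b.txt": "1"}),
    "I": ("H", {"a.txt": "2", "b.txt": "1"}),
    "J": ("I", {"a.txt": "3", "b.txt": "1"}),
    "K": ("H", {"a.txt": "1", "b.txt": "2"}),
    "L": ("K", {"a.txt": "1", "b.txt": "2", "c.txt": "new"}),
}

def cherry_pick(name, head):
    """Copy one commit onto head, using the copied commit's parent as merge base."""
    parent, theirs = commits[name]
    base = commits[parent][1]
    ours = commits[head][1]
    new_name = name + "'"
    commits[new_name] = (head, merge(base, ours, theirs))
    return new_name

def rebase(to_copy, onto):
    head = onto                   # detached HEAD starts at the target commit
    for name in to_copy:
        head = cherry_pick(name, head)
    return head                   # the caller moves the branch name here

branches = {"main": "J", "feature": "L"}
branches["feature"] = rebase(["K", "L"], branches["main"])
print(branches["feature"])        # prints L'
```

In this example main only touched a.txt and feature only touched b.txt and c.txt, so both cherry-picks succeed. If K had also changed a.txt, the first cherry-pick would raise the CONFLICT error, which is the toy equivalent of Git stopping and asking you to resolve it.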

Any merge conflicts occur because "you" and "they" touched the same, or adjacent, lines of the same file in the diffs produced during the merge-with-odd-merge-base process. Of course "their" commits are actually your commits—the ones you are rebasing—and "your" commits are often someone else's initially, as you begin the rebase. Eventually "your" commits are a mix of yours and theirs, and it's pretty darned confusing.

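(I like to set merge.conflictStyle to diff3, for instance with git config --global merge.conflictStyle diff3, to get more information when I hit these merge conflicts: each conflicted region then also shows the merge-base version of the text.)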

Upvotes: 6

matt

Reputation: 535139

The problem is the definition of "replay". A rebase does just what a merge does: it creates a diff (in this case, from the point where your branch b separated from master to the tip of your branch b) and attempts to apply that to the end of origin/master.

So, call the branch-off commit split.

Now, we know that origin/master is not split, because if it were, you would not need to rebase onto master in the first place. Therefore, there are some commits that have been added to master since split.

Well, there can be conflicts exactly as in a merge. The diff by which one gets from split to master and the diff by which one gets from split to b can contain things that cannot be done at the same time automatically — for example, the same region of the same file was edited in two different ways, or a file was edited in one path but deleted in the other, and so on. That's a conflict.

Note that a conflict does not mean that anything bad happened! The word "conflict" is very poorly chosen. It means merely that Git doesn't want to prejudice things by trying to read your mind; it turns to you to complete the merge manually, because if it chose automatically what to do, it might do something you don't want.

Upvotes: 0
