Kolichikov

Reputation: 3020

Git clobbering new changes with stale changes without raising conflicts

So we're having an issue with our git repository that I can't imagine is very common (or else why would anyone use git?)

We have our main branch (Branch A), where a file has been changed. As part of our release process, we create a release branch (Branch B). Nobody touched the file in Branch B, while changes were made to it in Branch A. With the release done (bug fixes and whatnot), we merge Branch B back into Branch A, and discover that all of the changes made in Branch A have been lost and the Branch B version is taken (reverting the file to what it was before those changes).

Graphically (to the right is later time)

A--ch1---*---ch2---ch3--ch4----m---
          \                   /
           B------------------

So now that we are merging at m, git is silently choosing to keep the ch1 changes.

UPDATE

Additionally, we tried merging A into B at point m, and yet it's the same behaviour (git tries to make ch1 the latest change).

UPDATE 2

Based on torek's answer below, I followed the steps and discovered that the merge base is somewhere between ch3 and ch4 on the graph above.

Which to me raises more questions than it answers: why is it not just applying the ch4 changes, based on what happened after the merge base?

And why does it think that ch1 (which in our case is the initial commit) should be applied over all subsequent changes?

Are we using git wrong? Is there something we're missing? This makes no sense to me from a user's point of view.

How do we make sure that this doesn't happen and that work isn't lost? Thank you for any help.

Upvotes: 0

Views: 542

Answers (1)

torek

Reputation: 488183

I cannot tell you why you got the result that you got.

I can, however, tell you how you can find out why.

Step 1: forget about branches, just look at commits

When you run git merge, Git does not really "merge branches": that's a higher-level fiction we tell ourselves to make it easier to think about. Instead, Git's git merge has two major parts. The first is merge as a verb, which consists of finding some sets of changes and combining them. The second is merge as a noun (or adjective), which consists of making a commit that, in most ways, is just like any other commit. What's special about this merge commit (there it is: merge as an adjective, in front of the word "commit") is that it has two parent commits, instead of the usual single parent. This "having two parents" affects the way that future merge-as-a-verb operations will work.

Merge as a verb

In order to perform the merging action, Git needs to run two git diff commands.[1] These two git diff commands compare some specific commit, what Git calls the merge base commit, against two other specific commits, which Git usually calls branch tips.

The branch tips are just what you would expect, given a drawing of a graph where later commits are towards the right, and branch names act as pointers to the tip-most commits:

       A1--A2--A3   <-- branch-A
      /
...--*
      \
       B1--B2--B3   <-- branch-B

The merge base, which I marked with * here, is, loosely speaking, the right-most commit at which the two branches rejoin. When the branches forked off very simply like these do, it's obvious which commit is the merge base. Once you've made several earlier merges, it's less obvious.

In any case, Git makes you choose one commit by running git checkout or git commit. Whatever commit you have checked out right now, such as the tip of branch-A, is the current commit.[2] There are several more names for the current commit, including HEAD and @, but in general there's always some current commit. Checking out a branch by name makes the tip of that branch be the current commit. Let's call this commit L, for Left or Local or --ours, so that we have a handy one-letter name for it.

Git will make you choose the other commit. You can run git merge <hash-id> or git merge <name>. No matter which you use, Git will find a hash ID for the commit and hence be able to work directly with this other commit. Let's call it R, for Right or Remote or --theirs.

But now, Git will choose the merge base on its own. It will do that by looking at the commit graph: commits L and R, the parent commit(s) of L and R, the parents' parents, and so on, until it finds where the commit chains meet up.

Step 2: finding the merge base

This is where you need command-line Git. GUI tools are useless here. In order to see what Git is doing, you must find the merge base commit yourself. You can do this with the command line interface:

$ git merge-base --all L R    # you can use names or hash IDs here

Git will print out all the merge bases it finds, by their hash IDs. Ideally, there will be exactly one merge base. If there is more than one merge base, you have a complicated case (these result from "criss-cross merges") and we have to dive deeper; but almost always, there's just one hash ID.

Save this hash ID (copy with mouse, save it in a shell variable, write it down, whatever). You will need it twice.
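If you want to try this somewhere harmless first, here is a minimal sketch in a throwaway repository. Everything in it is made up for illustration (the temporary directory, the demo identity, the file names, and the branch names branch-A and branch-B); it only assumes git and mktemp are available. It builds the simple fork shape drawn above and saves the merge base in a shell variable:

```shell
# Throwaway repository: all names below are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git checkout -q -b branch-A          # name the (still unborn) first branch

# Demo identity so that git commit works on a machine with no configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > file.txt
git add file.txt
git commit -q -m "fork point"
fork=$(git rev-parse HEAD)           # remember where the branches split

git checkout -q -b branch-B
echo b > b.txt
git add b.txt
git commit -q -m "B1"

git checkout -q branch-A
echo a > a.txt
git add a.txt
git commit -q -m "A1"

# Save the merge base; the two diffs in the next step both need it.
base=$(git merge-base --all branch-A branch-B)
echo "merge base: $base"
echo "fork point: $fork"
```

In a history this simple, the two printed hash IDs should be the same. In a real repository with earlier merges they often are not, which is the whole point of asking Git rather than guessing.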

Step 3: finding the changes since the base

Now, look at what Git will see as "what we changed":

$ git diff --find-renames <merge-base> L

This compares the merge base to the L commit. Whatever differences turn up here, this is what Git thinks we did. You might want to save this diff output to a file, so that you can peruse it more easily.

Then, do the same thing but for the R commit:

$ git diff --find-renames <merge-base> R

This is what Git thinks they did, in their branch. Again, you may want to save this diff output to a file.

Git will now combine these two sets of changes. Whatever we changed, Git applies those changes to the base. Whatever they changed, Git applies those changes to the base, too.

If those changes are what you expected, and you combine them, you should get the combination you expected too. Probably, those changes aren't what you expected, and you've found the source of your problem.
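As a rough illustration of what "expected" looks like, here is a sketch of the scenario from the question, under the assumption that the merge base really is the fork point. The repository, identity, file names, and commit messages are all invented. One side changes file.txt, the other side never touches it, and the merge keeps the change:

```shell
# Throwaway repository: all names below are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git checkout -q -b branch-A
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo v1 > file.txt
git add file.txt
git commit -q -m "ch1"

# The release branch never touches file.txt.
git checkout -q -b branch-B
echo notes > release.txt
git add release.txt
git commit -q -m "release fix"

# Meanwhile, the main branch changes file.txt.
git checkout -q branch-A
echo v2 > file.txt
git commit -q -a -m "ch2"

base=$(git merge-base branch-A branch-B)

# "What we changed" and "what they changed", reduced to file names.
ours=$(git diff --find-renames --name-only "$base" branch-A)
theirs=$(git diff --find-renames --name-only "$base" branch-B)
echo "we changed:   $ours"
echo "they changed: $theirs"

# Combine the two sets of changes.
git merge -q --no-edit branch-B
merged=$(cat file.txt)
echo "file.txt after merge: $merged"
```

Here "they" did nothing to file.txt relative to the base, so our v2 survives. If your real repository gives a different result, the two diffs against your real merge base will not look like these two, and that difference is the thing to chase.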

In any case, after Git has—or thinks it has—combined all the changes successfully, Git will commit this combined result. The new commit will go on your current branch just like any other commit, but will have two parents. The first parent will be commit L and the second parent will be commit R.

Note that the result of combining changes is symmetric: it doesn't matter which commit is L and which is R. Whether you git checkout branch-A and then git merge branch-B, or git checkout branch-B and then git merge branch-A, the source tree you get will be the same. What's different is which branch gets the merge commit, and which of L and R is the first or second parent.
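You can check the symmetry claim yourself. The following is only a sketch in a throwaway repository with invented names (branch-A, branch-B, and the scratch branches merged-on-A and merged-on-B), using default merge settings and a conflict-free history. It merges in both directions and compares the resulting trees and first parents:

```shell
# Throwaway repository: all names below are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git checkout -q -b branch-A
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > file.txt
git add file.txt
git commit -q -m "fork point"

git checkout -q -b branch-B
echo b > b.txt
git add b.txt
git commit -q -m "B1"

git checkout -q branch-A
echo a > a.txt
git add a.txt
git commit -q -m "A1"

# Merge B into A on a scratch branch, so branch-A itself stays put.
git checkout -q -b merged-on-A branch-A
git merge -q --no-edit branch-B
tree_ab=$(git rev-parse 'HEAD^{tree}')
first_ab=$(git rev-parse 'HEAD^1')

# Merge A into B on another scratch branch.
git checkout -q -b merged-on-B branch-B
git merge -q --no-edit branch-A
tree_ba=$(git rev-parse 'HEAD^{tree}')
first_ba=$(git rev-parse 'HEAD^1')

echo "tree, B merged into A: $tree_ab"
echo "tree, A merged into B: $tree_ba"
```

The two tree hash IDs should match, while first_ab is the tip of branch-A and first_ba is the tip of branch-B: same source tree, different first parent.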

You can also now run git log --all --decorate --oneline --graph and search the output for the merge base commit hash (or the abbreviated part of it) that you found when you ran git merge-base. You can look at the graph that --graph drew and try to figure out why that's the correct merge base of the two commits you chose as L and R. It is the correct merge base, even if you don't think so, and you don't get a choice about it: you only get to choose L and R. These graphs often get very tangled, so that it's hard to figure out why it's the right merge base; but it always is.
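To see how an earlier merge moves the merge base away from the original fork point, here is one more sketch, again in a throwaway repository where every name is made up. branch-B is merged into branch-A once, both branches then continue, and git merge-base --is-ancestor is used to confirm that the reported base is reachable from both tips:

```shell
# Throwaway repository: all names below are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git checkout -q -b branch-A
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > file.txt
git add file.txt
git commit -q -m "fork point"
fork=$(git rev-parse HEAD)

git checkout -q -b branch-B
echo b1 > b.txt
git add b.txt
git commit -q -m "B1"
b1=$(git rev-parse HEAD)

git checkout -q branch-A
echo a1 > a.txt
git add a.txt
git commit -q -m "A1"
git merge -q --no-edit branch-B      # the earlier merge

# Both branches keep going after that merge.
git checkout -q branch-B
echo b2 > b.txt
git commit -q -a -m "B2"
git checkout -q branch-A
echo a2 > a.txt
git commit -q -a -m "A2"

base=$(git merge-base branch-A branch-B)

# --is-ancestor exits with status 0 when the first commit is an ancestor of the second.
if git merge-base --is-ancestor "$base" branch-A; then in_a=yes; else in_a=no; fi
if git merge-base --is-ancestor "$base" branch-B; then in_b=yes; else in_b=no; fi
echo "base $base reachable from branch-A: $in_a, from branch-B: $in_b"

git log --all --decorate --oneline --graph
```

In this history the merge base is B1, not the original fork point, because the earlier merge already made B1 part of branch-A. Anything branch-A did before that earlier merge is therefore no longer counted as a separate "what we changed" in the next merge.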


[1] Internally, Git manages to skip a lot of the actual git diff-ing, in a lot of cases, to make it all go (much) faster; but that's all optimization on top of a logical process by which we can reason about what Git will do.

[2] This is true at all times: Git always has some current commit. Well, almost all times: in a new, empty repository, there are no commits at all, so there is no current commit either. There is also a special case after git checkout --orphan <newbranch>: you are, in this case, on an "unborn branch", and there is as yet no current commit. The way Git handles this varies a bit from one command to another. Let's just pretend it doesn't happen, here. :-)

Upvotes: 5
