Why did pulling a child branch overwrite changes on parent branch?

Question

I have a branch A with a lot of changes (including a lot of refactoring), so I decide to create a separate branch just for the refactoring. I branch out of A into a new branch B. I now have the same changes on A and B (compared to master).

I delete all of the new functionality on B since I want to only commit the refactoring. I commit my changes onto B and open a pull request. I then checkout A and pull B. Now all of the commits on B are applied to A, essentially deleting the new feature, only leaving the refactoring.

Why did this happen? I expected to have some merge conflicts and preserve the changes on both branches. Instead, branch B completely overwrote A.

torek · Accepted Answer

Lasse V. Karlsen has already provided an answer in the form of a comment, but here is another way to look at it. You asked:

Why did pulling a child branch overwrite changes on parent branch?

and then described your setup (providing that description is a good thing) using phrases like "branching from a branch". The problem is that Git has no parent/child relationship between branches. By starting from a false assumption, you end up fighting with Git.

The key to understanding Git is to realize that branches are unimportant—at least, if by "branch" you mean "branch name". A branch name serves just one purpose: it finds some specific commit. What matters is not the branch name, but rather the commit itself. Unfortunately, the word branch is ambiguous: when we use the word to mean "some collection of commits", branches do matter, because it's the commits that matter. So branches don't matter, except when they do: not a good situation, but that's how things are.

Where we do find parent/child relationships is in the commits themselves. It's only there, though. The names we use to find these commits move about over time. I find this is best understood through drawings. Let's make some drawings:

I have a branch A with a lot of changes (including a lot of refactoring), so I decide to create a separate branch just for the refactoring. I branch out of A into a new branch B. I now have the same changes on A and B (compared to master).

I like to use single uppercase letters to draw commits, so to keep the branch names from clashing with commit letters, I'll call these branches branch-A and branch-B. First, though, we need to draw some of the original commits on master.

To draw a commit, we write down its big ugly hash ID, and then draw one or more arrows coming out of the commit, pointing to the commit's parents. The hash IDs are very large (40 hexadecimal characters), unwieldy, and too difficult to use, so I replace them with single uppercase letters.

Most commits have just one parent—the usual exception is a merge commit with two parents—so we'll start with this string of commits:

... <-F <-G <-H

Here, H stands in for the last commit in the chain. It points—via its saved parent hash ID—to earlier commit G, which in turn points to yet-earlier commit F, and so on.
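To see those stored parent hash IDs directly, the sketch below builds the chain F <- G <- H in a throwaway repository and prints H's raw commit object, whose "parent" line is the backward arrow. It assumes a POSIX shell with git installed; the helper commit and the files it creates (F.txt and so on) are made up for the illustration.

```shell
#!/bin/sh
# Sketch: build F <- G <- H and look at the parent hash ID stored inside H.
set -eu
cd "$(mktemp -d)"                         # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master regardless of init.defaultBranch
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: add a file named after the commit (F.txt, ...) and commit it.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit F
commit G
commit H

H=$(git rev-parse HEAD)       # H's full hash ID
G=$(git rev-parse HEAD~1)     # the hash ID that H records as its parent
git cat-file -p "$H"          # raw commit object: tree, parent, author, committer, message
```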

To find commit H by its hash ID, Git stores the hash ID in a branch name, such as master. Getting a little lazy on purpose (because drawing arrows in text characters is hard), we add the name like this:

...--F--G--H   <-- master

Since we're on branch master initially, we add HEAD in parentheses, "attached to" the branch, to indicate that:

...--F--G--H   <-- master (HEAD)

The name master now points to H, which points back to G, and so on.

Now we create a new branch name. This name has to point to some existing commit. The commit we pick will be the last commit in this new branch. We'll pick commit H, because we'd like to start with all the same commits we have so far:

...--F--G--H   <-- branch-A, master (HEAD)

This indicates that commits up through and including H are on both branches, that we're currently working with commit H, and that we're currently using the name master to find that commit. As soon as we run git checkout branch-A or git switch branch-A, we get instead:

...--F--G--H   <-- branch-A (HEAD), master

Nothing else has changed, but HEAD is now attached to branch-A. We're still using commit H and commit H is still the last commit on both branches.
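A minimal sketch of the same steps, assuming a POSIX shell with git installed (the helper commit and its files are invented for the example): both names resolve to the same hash ID, and switching branches only changes which name HEAD is attached to.

```shell
#!/bin/sh
# Sketch: a branch name is a stored hash ID; HEAD records which name we are "on".
set -eu
cd "$(mktemp -d)"                         # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: add a file named after the commit and commit it.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }
commit G
commit H

git branch branch-A                 # new name pointing at the current commit, H
git rev-parse master branch-A       # the same hash ID, printed twice
git symbolic-ref HEAD               # refs/heads/master: HEAD is attached to master

git checkout -q branch-A            # (git switch branch-A does the same)
git symbolic-ref HEAD               # refs/heads/branch-A: only HEAD's attachment changed
```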

Now we make some new commits. For simplicity I'll just draw two:

...--G--H   <-- master
         \
          I--J   <-- branch-A (HEAD)

When we do make a new commit, Git:

- packages up a source snapshot;
- adds some metadata, including our name and email address and log message, and including the appropriate parent commit, so that I points back to H, for instance;
- saves that as a new commit, which gets a new unique hash ID; and then
- writes that commit's hash ID into the current branch name, so that the name points to the new last commit.

So when we made I, the name branch-A automatically advanced to point to I. Then we made J and the name advanced again, giving us this result-so-far. Note that commits up through and including H are on both branches!
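The sketch below, again assuming a POSIX shell with git and using an invented helper commit, makes I and J on branch-A and checks that only the current branch name moved: master still names H, and J reaches H in two steps back.

```shell
#!/bin/sh
# Sketch: making a commit moves only the branch name HEAD is attached to.
set -eu
cd "$(mktemp -d)"                         # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: add a file named after the commit and commit it.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }
commit G
commit H

git checkout -q -b branch-A     # create branch-A at H and attach HEAD to it
H=$(git rev-parse HEAD)

commit I                        # branch-A advances to I ...
commit J                        # ... and then to J; master stays at H
git log --oneline --graph --all
```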

We now make another branch name, branch-B, also pointing to the current commit, and switch to that branch name:

...--G--H   <-- master
         \
          I--J   <-- branch-A, branch-B (HEAD)

I delete all of the new functionality on B since I want to only commit the refactoring.

Here, you make a new commit, let's call it K, that deletes the new code:

...--G--H   <-- master
         \
          I--J   <-- branch-A
              \
               K   <-- branch-B (HEAD)

Now, as far as Git is concerned, the only point of a branch name is to find some particular commit. (The commit itself then finds all previous commits.) So the name branch-A is pretty much irrelevant. We can redraw this drawing without that name, to get:

...--G--H   <-- master
         \
          I--J--K   <-- branch-B (HEAD)
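Here is a sketch that reproduces this history, assuming a POSIX shell with git. The file names app.txt (code that gets refactored) and feature.txt (the new functionality) and the commit messages are hypothetical stand-ins for the question's changes. It confirms that, seen from branch-B, I, J and K are all commits not on master, and that branch-A's tip J is simply an ancestor of K.

```shell
#!/bin/sh
# Sketch: branch-B is created at J, and K deletes the feature.
set -eu
cd "$(mktemp -d)"                         # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "old code" > app.txt
git add app.txt && git commit -q -m "H: base"

git checkout -q -b branch-A
echo "refactored code" > app.txt          # the refactoring ...
echo "feature part 1" > feature.txt       # ... mixed with new functionality
git add app.txt feature.txt && git commit -q -m "I: refactor, start feature"
echo "feature part 2" >> feature.txt
git commit -q -am "J: finish feature"

git checkout -q -b branch-B               # branch-B starts at J, not at H
git rm -q feature.txt
git commit -q -m "K: remove feature, keep refactoring"

git log --oneline --graph --all
git rev-list --count master..branch-B     # 3: I, J and K are on branch-B but not on master
```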

Commit K takes out the new functionality, leaving only the refactoring. Since commits I-J-K are the ones "on" branch B that aren't on master, the merge procedure to bring those into master gets you to the final state as represented in commit K. This can be a real merge (git merge --no-ff) or a fast-forward, not-actually-a-merge (git merge --ff-only).
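The question's own step, checking out A and pulling B, follows the same logic. With default settings, git pull fetches and then merges, and because J is an ancestor of K there is nothing to reconcile: Git just fast-forwards the name branch-A to K, which is why there were no conflicts. The sketch below assumes a POSIX shell with git and uses a local git merge in place of the pull; the files and messages are invented.

```shell
#!/bin/sh
# Sketch: "pulling" branch-B into branch-A fast-forwards branch-A to K.
set -eu
cd "$(mktemp -d)"                         # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "old code" > app.txt
git add app.txt && git commit -q -m "H: base"
git checkout -q -b branch-A
echo "refactored code" > app.txt
echo "feature part 1" > feature.txt
git add app.txt feature.txt && git commit -q -m "I: refactor, start feature"
echo "feature part 2" >> feature.txt
git commit -q -am "J: finish feature"
git checkout -q -b branch-B               # branch-B starts at J
git rm -q feature.txt
git commit -q -m "K: remove feature, keep refactoring"

git checkout -q branch-A
git merge branch-B                        # reports a fast-forward: no merge commit, no conflicts
cat app.txt                               # refactored code
ls feature.txt 2>/dev/null || echo "feature.txt is gone"
```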

If we use the latter and put the name branch-A back into the picture, we get:

...--G--H--I--J   <-- branch-A
               \
                K   <-- branch-B, master (HEAD)

If we use the former—a true merge—and again put the name branch-A back in the picture, we get:

...--G--H----------M   <-- master (HEAD)
         \        /
          \      K   <-- branch-B
           \    /
            I--J   <-- branch-A

(I skipped the letter L just so I could use M for "m"erge-commit here).

Note that in both cases, we end up with branch-A (commits up through J) already included in the merge result (those commits are now "on" master).
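A sketch of both merge styles on the same history, assuming a POSIX shell with git. The extra name ff-copy is hypothetical, used only so one repository can show the true merge (on master) and the fast-forward (on ff-copy) side by side; in both cases J is already reachable from the merge result.

```shell
#!/bin/sh
# Sketch: --no-ff and --ff-only both end up containing I and J.
set -eu
cd "$(mktemp -d)"                         # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "old code" > app.txt
git add app.txt && git commit -q -m "H: base"
git checkout -q -b branch-A
echo "refactored code" > app.txt
echo "feature part 1" > feature.txt
git add app.txt feature.txt && git commit -q -m "I: refactor, start feature"
echo "feature part 2" >> feature.txt
git commit -q -am "J: finish feature"
git checkout -q -b branch-B
git rm -q feature.txt
git commit -q -m "K: remove feature, keep refactoring"

git checkout -q master
git branch ff-copy master                 # hypothetical second name for the other style
git merge -q --no-ff --no-edit -m "M: merge branch-B" branch-B   # true merge on master
git checkout -q ff-copy
git merge -q --ff-only branch-B           # fast-forward on ff-copy
for tip in master ff-copy; do
  if git merge-base --is-ancestor branch-A "$tip"; then
    echo "$tip already contains J (and I)"
  fi
done
```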

What you should have done

Had you created branch-B starting from commit H, you would first have:

...--G--H   <-- master, branch-B (HEAD)
         \
          I--J   <-- branch-A

You can then make commit K to produce:

          K   <-- branch-B (HEAD)
         /
...--G--H   <-- master
         \
          I--J   <-- branch-A

If appropriate, you can create more commits:

          K--L   <-- branch-B (HEAD)
         /
...--G--H   <-- master
         \
          I--J   <-- branch-A

Commits up through H are now on all three branches, but commits I-J are only on branch-A. This situation lasts unless and until we move the branch names around. The names can be adjusted whenever, and however, we want. The commits are frozen for all time: we can redraw them to put them in more convenient places for drawing arrows pointing to them, but the connection from K going backwards to H is fixed forever.
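The same setup as a sketch, assuming a POSIX shell with git; app.txt, feature.txt, cleanup.txt and the messages are invented. The key difference is git checkout -b branch-B master, which starts branch-B at H, so the only commits branch-A and branch-B share are those up through H.

```shell
#!/bin/sh
# Sketch: start branch-B at H so that I and J stay only on branch-A.
set -eu
cd "$(mktemp -d)"                         # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "old code" > app.txt
git add app.txt && git commit -q -m "H: base"
git checkout -q -b branch-A
echo "refactored code" > app.txt
echo "feature part 1" > feature.txt
git add app.txt feature.txt && git commit -q -m "I: refactor, start feature"
echo "feature part 2" >> feature.txt
git commit -q -am "J: finish feature"

git checkout -q -b branch-B master        # start branch-B at H this time
echo "refactored code" > app.txt          # redo (or copy over) only the refactoring
git commit -q -am "K: refactoring only"
echo "more cleanup" > cleanup.txt
git add cleanup.txt && git commit -q -m "L: more cleanup"

git rev-list --count master..branch-A     # 2: I and J
git rev-list --count master..branch-B     # 2: K and L
git merge-base branch-A branch-B          # H: the newest commit both branches share
```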

If we now check out master and merge branch-B, using a forced-real-merge so that I don't have to draw the fast-forward case, we get:

          K--L   <-- branch-B
         /    \
...--G--H------M   <-- master (HEAD)
         \
          I--J   <-- branch-A

Since commit L can be found using commit M—it points backwards to both H and L—we can delete the name branch-B safely now:

          K--L
         /    \
...--G--H------M   <-- master (HEAD)
         \
          I--J   <-- branch-A

and we see that commits I-J are still not in the set of commits found by starting at M and working backwards, so they can still be merged. This merge cannot be a fast-forward, so it requires a new merge commit, which I'll call N:

          K--L
         /    \
...--G--H------M--N   <-- master (HEAD)
         \       /
          I-----J   <-- branch-A
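Continuing the previous setup in a self-contained sketch (POSIX shell with git assumed; files and messages invented): merging branch-B produces M, deleting branch-B loses nothing because M still reaches L, and merging branch-A afterwards has to create the real merge commit N.

```shell
#!/bin/sh
# Sketch: merge branch-B (M), delete its name, then merge branch-A (N).
set -eu
cd "$(mktemp -d)"                         # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "old code" > app.txt
git add app.txt && git commit -q -m "H: base"
git checkout -q -b branch-A
echo "refactored code" > app.txt
echo "feature part 1" > feature.txt
git add app.txt feature.txt && git commit -q -m "I: refactor, start feature"
echo "feature part 2" >> feature.txt
git commit -q -am "J: finish feature"
git checkout -q -b branch-B master
echo "refactored code" > app.txt
git commit -q -am "K: refactoring only"
echo "more cleanup" > cleanup.txt
git add cleanup.txt && git commit -q -m "L: more cleanup"

git checkout -q master
git merge -q --no-ff --no-edit -m "M: merge branch-B" branch-B
git branch -d branch-B                    # safe: L is still reachable from M
git merge -q --no-edit -m "N: merge branch-A" branch-A   # J is not reachable from M: real merge
git log --oneline --graph
```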

How to fix things relatively easily

Let's assume you did a true merge, but have kept all your names around, and hence now have this:

...--G--H----------M   <-- master (HEAD)
         \        /
          \      K   <-- branch-B
           \    /
            I--J   <-- branch-A

The problem here is that commits I-J are in fact merged, because M reaches back to K which reaches back to J. The code in those commits is gone because the J-to-K difference includes deleting that code. But if we make a new commit, or series of commits, that are copies of I and J as applied to M, we get something we can merge easily.

The command that copies commits one at a time is git cherry-pick. We can do the job this way. We first make a new branch name, e.g., fix, that points to commit M, and switch to it:

...--G--H----------M   <-- fix (HEAD), master
         \        /
          \      K   <-- branch-B
           \    /
            I--J   <-- branch-A

Then we either look up the hash IDs of I and J, or use relative syntax (branch-A~1 means "one step back from the commit that branch-A names", which is I), and cherry-pick each commit one at a time. Since there are only two commits, we can run:

git cherry-pick branch-A~1
git cherry-pick branch-A

as our two cherry-pick commands. These may have merge conflicts. If so, you just need to fix them as you go. The result will be new commits that refer to each other, and to commit M, as their parents, and have as their snapshots the conflict-fixed snapshots you provide:

                     I'-J'  <-- fix (HEAD)
                    /
...--G--H----------M   <-- master
         \        /
          \      K   <-- branch-B
           \    /
            I--J   <-- branch-A

Here, I' is the copy Git made of commit I, and J' is the copy of J.
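A sketch of this repair on a reproduction of the question's history, assuming a POSIX shell with git. The file layout is invented so that the cherry-picks apply without conflicts: I mixes the refactoring of app.txt with the start of feature.txt, which keeps the copy I' from being empty even though M already contains the refactoring. With real code you may have to resolve conflicts before each pick completes.

```shell
#!/bin/sh
# Sketch: after the true merge M, copy I and J onto a new branch "fix".
set -eu
cd "$(mktemp -d)"                         # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "old code" > app.txt
git add app.txt && git commit -q -m "H: base"
git checkout -q -b branch-A
echo "refactored code" > app.txt
echo "feature part 1" > feature.txt
git add app.txt feature.txt && git commit -q -m "I: refactor, start feature"
echo "feature part 2" >> feature.txt
git commit -q -am "J: finish feature"
git checkout -q -b branch-B               # the mistake: branch-B starts at J
git rm -q feature.txt
git commit -q -m "K: remove feature, keep refactoring"
git checkout -q master
git merge -q --no-ff --no-edit -m "M: merge branch-B" branch-B

git checkout -q -b fix master             # new name "fix" pointing at M
git cherry-pick branch-A~1                # copy I onto M, giving I'
git cherry-pick branch-A                  # copy J onto I', giving J'
cat feature.txt                           # the feature is back
```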

If there are many commits to copy, it's handy to be able to cherry-pick all of them in sequence. To do that, we need to give cherry-pick the right list of commits. That's a bit tricky, but the list ends at the commit identified by the name branch-A. We can use Git's two-dot syntax to construct an expression by which Git will find all the commits in this list, with:

git cherry-pick master~1..branch-A

The expression master~1 means "go back to the first parent of commit M": master means commit M, and the ~1 suffix means step back once, following the first parent. The first-parent notion only matters for merge commits like M; other commits have only one parent, so that parent is the first parent. A merge commit always records the commit that was the branch tip before the merge as its first parent, which is why master~1 works here: it is commit H.
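A sketch that checks what master~1 and master^2 resolve to and which commits the range master~1..branch-A selects, then runs the single cherry-pick. It assumes a POSIX shell with git; the history and file names are invented reproductions of the drawing.

```shell
#!/bin/sh
# Sketch: master~1 is H (first parent of M); master~1..branch-A is I and J.
set -eu
cd "$(mktemp -d)"                         # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "old code" > app.txt
git add app.txt && git commit -q -m "H: base"
git checkout -q -b branch-A
echo "refactored code" > app.txt
echo "feature part 1" > feature.txt
git add app.txt feature.txt && git commit -q -m "I: refactor, start feature"
echo "feature part 2" >> feature.txt
git commit -q -am "J: finish feature"
git checkout -q -b branch-B
git rm -q feature.txt
git commit -q -m "K: remove feature, keep refactoring"
git checkout -q master
git merge -q --no-ff --no-edit -m "M: merge branch-B" branch-B

git log -1 --format=%s master~1                     # H: base (first parent of M)
git log -1 --format=%s master^2                     # K: ... (second parent of M)
git log --reverse --format=%s master~1..branch-A    # I, then J

git checkout -q -b fix master
git cherry-pick master~1..branch-A                  # copies I and J in one command
```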

There's a different problem—and a different graph—if Git did a fast-forward merge. Then, instead of:

...--G--H----------M   <-- master (HEAD)
         \        /
          \      K   <-- branch-B
           \    /
            I--J   <-- branch-A

we have a graph best drawn like this:

...--G--H--I--J   <-- branch-A
               \
                K  <-- branch-B, master (HEAD)

Now there's no easy way to find the hash of commit H, other than to run git log and look for it. So in this situation, the all-in-one cherry-pick command you'd need (whether or not you make a fix branch; I would advise making one) would be:

git cherry-pick <hash-of-H>..branch-A

where <hash-of-H> is the hash ID of commit H that you found with git log.

The end result of this, assuming you create a new name fix and check it out, is:

...--G--H--I--J   <-- branch-A
               \
                K  <-- branch-B, master
                 \
                  I'-J'  <-- fix (HEAD)

which allows the name master to be fast-forwarded to commit J', once you're sure that you have all your fixes in J' as compared to K. If you like fast-forwards, do that; if you prefer true merges, do one of those; either way, you now have the updates you wanted, brought in via new commits that are, in effect, copies of the earlier commits.
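A sketch of the fast-forward case, assuming a POSIX shell with git and an invented history. Note that with HEAD at K, the range ..branch-A (short for HEAD..branch-A) selects nothing, because K already reaches I and J; the range has to start at H. The sketch finds H by searching commit messages with git log --grep, standing in for reading the log by eye.

```shell
#!/bin/sh
# Sketch: after a fast-forward to K, copy H..branch-A (I and J) onto "fix".
set -eu
cd "$(mktemp -d)"                         # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "old code" > app.txt
git add app.txt && git commit -q -m "H: base"
git checkout -q -b branch-A
echo "refactored code" > app.txt
echo "feature part 1" > feature.txt
git add app.txt feature.txt && git commit -q -m "I: refactor, start feature"
echo "feature part 2" >> feature.txt
git commit -q -am "J: finish feature"
git checkout -q -b branch-B
git rm -q feature.txt
git commit -q -m "K: remove feature, keep refactoring"
git checkout -q master
git merge -q --ff-only branch-B           # master now points at K

git rev-list --count ..branch-A           # 0: HEAD (K) already reaches I and J
H=$(git log -1 --format=%H --grep='^H:' master)   # look H up by its (invented) message

git checkout -q -b fix master
git cherry-pick "$H..branch-A"            # copies I and J onto K as I' and J'
```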
