Git merge process ignores files

Question

There are two branches: branch A, with some files, and branch B, which contains, for example, a folder named FolderB. When I merge branch B into branch A, FolderB does not appear among the staged files of the merge commit. This behaviour is unexpected to me. I remember that several days before this merge I did another merge that ended in a conflict, which I resolved by selecting the "keep ours" option. Maybe this option was saved somewhere and is now having a side effect. If so, where can I delete these settings?

torek · Accepted Answer

> I remember that several days before this merge I did another merge that ended in a conflict, which I resolved by selecting the "keep ours" option. Maybe this option was saved somewhere and is now having a side effect.

I cannot speak to other interfaces (which could do this sort of thing), but command-line Git does not save merge strategies and strategy-options. However, the fact that you have already merged means that you are not starting from where you think you are starting from.

First, let's make a simple note: "merge" does not mean "make same". If it did, we would not bother with branches at all. Suppose you and I both start with v1.1 of the software, and I make some changes to my copy, and you make some changes to yours. If I then got your commits from you and ran:

$ git merge yours

and this replaced what I did with what you did, I might not be very happy about that!

Hence, what git merge does is find some common starting point—Git calls this a merge base commit—and find out what both you and they, whoever "they" are, have done since then. But this means that you must understand how Git finds this common starting point.

The merge base

Let's look at the normal commit process. We start by cloning some existing repository:

$ git clone

and we begin working and making commits. What exactly is a commit? As any good Git book will tell you, a commit is a snapshot of all of the source at the time you make the snapshot, along with some metadata: who made the snapshot, when, why, and—this is crucial to Git's internal operation—which commit you were working with, just before you made the snapshot.

Each commit has its own unique hash ID, such as b7bd9486b055c3f967a870311e704e3bb0654e4f. These strings of hexadecimal numbers are big, ugly, and impossible for humans to work with. So we have Git remember them for us. Each commit not only has its own unique hash ID, each commit also remembers the hash ID of the commit we had checked out before. We—and Git—use a branch name to remember the final commit.

When we make a new commit, our new commit gets a new big ugly ID, but it also remembers which commit was the last commit on the branch. Then Git writes the new commit's big ugly ID into the branch name, so that the name remembers the new one instead of the old one.

The result is that we start with some chain of commits ending in a tip commit like this:

... <-F <-G <-H   <--master

where the name master remembers the ID of commit H. Commit H remembers the ID of commit G, which remembers the ID of commit F, and so on. These internal linkages, which we can draw as backwards-pointing arrows like this, always go backwards, from child commit to parent commit: commits know their ancestors but do not know their descendants. Once made, no commit can ever be changed, so commits cannot be altered to remember their children, but neither can a child ever forget its parent.
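The parent links can be inspected directly. The following is a minimal sketch, assuming a throwaway repository; the temporary directory, the file name file.txt, and the identity settings are made up for illustration, and the three commits merely stand in for F, G, and H.

```shell
# Throwaway repository; path, file name, and identity are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example"
git config user.email "example@example.com"

# Three commits standing in for F, G, and H.
for name in F G H; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "commit $name"
done

tip=$(git rev-parse master)       # the hash the branch name remembers (H)
parent=$(git rev-parse master^)   # the commit H remembers (G)

# The raw commit object has a "parent" line, and no line naming any child.
git cat-file -p "$tip"

# Walking backwards from the tip visits H, then G, then F.
git log --oneline master
```

Running git cat-file -p on the first commit instead shows no parent line at all, since that commit has nothing to point back to.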

So, suppose you clone the repository, and I clone the repository, and we both have:

...--F--G--H   <-- master

Now you make a new commit or two:

...--F--G--H--I   <-- master

and meanwhile someone else makes a commit or two:

...--F--G--H--I   <-- master
            \
             J--K   <-- other

(Every commit has its own unique ID, and these uppercase letters just stand in for the real IDs. We'll run out of letters after 26 commits. That's why Git uses something bigger and uglier.)

If we now—either of us—go to merge our work, we start by obtaining the other guy's commits. The result is a graph that looks just like the above. Then we git checkout a branch, such as master, and run a merge command such as git merge other.

Git now goes back through the graph—using the backwards-pointing arrows starting from each branch tip—to find the merge base. In this case, that's quite clear: it is commit H. Git then runs, in effect, two git diff commands:

git diff --find-renames     # what we did on master
git diff --find-renames     # what they did on other

Git's job is now to combine these changes, applying them to the snapshot in H, and make a new merge commit from the result. If all goes well, Git does this on its own. If not, you get merge conflicts, and you must resolve them correctly. You tell Git that you have done this, and then Git makes the new merge commit:

...--F--G--H--I---L   <-- master (HEAD)
            \    /
             J--K   <-- other

What's special about this merge commit L is that it has two parents, instead of just one. Commit L, the merge commit, remembers that this is the correct way to combine everything "from H to I" with everything "from H to K".
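To watch this happen, here is a hedged sketch that builds the same shape of graph in a throwaway repository. The file names and the commit_file helper are hypothetical; git merge-base is the command that reports the merge base Git would pick, and the two diffs are written with shell variables in place of real hashes.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: write <message> into <file> and commit it.
commit_file() {
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit_file base.txt H
H=$(git rev-parse HEAD)

git checkout -q -b other          # the other person's work: J and K
commit_file theirs.txt J
commit_file theirs.txt K

git checkout -q master            # our work: I
commit_file ours.txt I

base=$(git merge-base master other)   # the common starting point

# The two comparisons the merge is built from, in summary form.
git diff --find-renames --stat "$base" master   # what we did on master
git diff --find-renames --stat "$base" other    # what they did on other

git merge -q --no-edit other      # makes the merge commit L

# Prints the hash of L followed by the hashes of its two parents.
parents=$(git rev-list --parents -n 1 HEAD)
echo "$parents"
```

In this toy case the two sides touch different files, so the merge completes without conflicts.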

The merge base, again

Now suppose you and I both continue working. I make new commits that come after K, and you make new commits that come after L:

...--F--G--H--I---L--M--N   <-- master (HEAD)
            \    /
             J--K--O--P--Q   <-- other

If you now run git merge other again, Git has to find the merge base of your latest commit N with my latest commit Q. The merge base this time is not H! We start at both N and Q and work backwards to find the most recent shared commit. Starting at Q, there is only one path backwards, through P, O, K, J, H, and so on.

Starting at N, however, we can go both ways at commit L: N, M, L, then both I and K.¹ So the first shared commit is now K!

What git merge will do this time is diff commit K against the two branch tip commits, N and Q. Meanwhile, commit L, our earlier merge, is what we told Git is the right way to deal with combining K with our work.

Suppose, then, that these new files were added in commits J or K, but are not in L because we selected a "keep ours" option. The diff from K to N will say: delete these here files. The diff from K to Q might or might not modify those new files. We can guess that in this case, it does not modify them, because if it did, we would get a new merge conflict, with Git telling us that we deleted the files, and they changed them. If they did not change them, Git assumes our "delete" is the right answer, and keeps them deleted.

¹Remember, the internal arrows always point backwards—left-ish the way I have drawn them here. We can go from L to K, but not from K to L.
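The whole scenario can be reproduced in a scratch repository. This is only a sketch under stated assumptions: the file names and the commit_file helper are invented, and the first merge uses git merge -s ours to stand in for whatever "keep ours" did in the asker's tool, which may well have been something narrower.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: write <message> into <file> (creating its folder) and commit.
commit_file() {
    mkdir -p "$(dirname "$1")"
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit_file base.txt H
git checkout -q -b other
commit_file FolderB/new.txt J     # the new folder arrives on other
commit_file FolderB/more.txt K
K=$(git rev-parse HEAD)
git checkout -q master
commit_file ours.txt I

# First merge (L): ASSUMPTION, "keep ours" modelled as the ours strategy,
# which records the merge but keeps our snapshot unchanged.
git merge -q --no-edit -s ours other

commit_file ours.txt N            # more work on master
git checkout -q other
commit_file later.txt Q           # more work on other, not touching FolderB
git checkout -q master

base=$(git merge-base master other)      # now K, not H
git diff --name-status "$base" master    # FolderB files show up as deleted (D)
git diff --name-status "$base" other     # only later.txt was added

git merge -q --no-edit other      # second merge: succeeds quietly
ls                                # FolderB is not in the result
```

The second merge reports no conflict, which is exactly why the missing folder is easy to overlook.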

What to do about this

If this is what is going on (and, from my experience, it is), the real problem is that your earlier merge—the commit we drew as L above—is wrong: it deletes the files. But we already noted that you cannot change any earlier commit.

There are two strategies for fixing the problem. The first, and often best, is to acknowledge the error and leave it in place: leave the wrong merge where it is, but add a new commit now that stores the corrected snapshot. We can do this in any number of ways. All of them involve a lot of work. My preferred method for most cases is to re-perform the merge but to do it correctly this time: make a new branch that points to the commit just before the incorrect merge:

                ............<-- rework (HEAD)
               .
...--F--G--H--I---L--M--N   <-- master
            \    /
             J--K--O--P--Q   <-- other

and then run git merge with the hash of the commit that was merged in last time, which in this case would be the hash of commit K. You will get the same merge conflict as last time, and this time you can resolve it correctly.

After resolving it correctly and committing, you can run git diff to see what changes need to be added now to correct the problem. For instance:

$ git diff   > /tmp/patch
$ git checkout master
$ git apply /tmp/patch

Alternatively, if you're sure you want "keep ours" for everything except the new files, you can just copy the extra files in, git add them, and commit.
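The first strategy can be sketched end to end on a scratch repository. Everything here is illustrative: the file names, the commit_file helper, and the use of git merge -s ours to model the bad merge are assumptions, and so is the choice of git diff arguments (the bad merge against the reworked merge). In this toy case the redone merge has no conflict; in a real repository that is the point where you would resolve it properly.

```shell
repo=$(mktemp -d)
patch=$(mktemp)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: write <message> into <file> (creating its folder) and commit.
commit_file() {
    mkdir -p "$(dirname "$1")"
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$2"
}

# Rebuild the situation: a bad merge L that dropped FolderB, then more work.
commit_file base.txt H
git checkout -q -b other
commit_file FolderB/new.txt J
commit_file FolderB/more.txt K
git checkout -q master
commit_file ours.txt I
git merge -q --no-edit -s ours other   # ASSUMPTION: models the "keep ours" merge
commit_file ours.txt N
git checkout -q other
commit_file later.txt Q
git checkout -q master

# Locate the bad merge and its two parents (N is one commit after L here).
L=$(git rev-parse master~1)
I=$(git rev-parse "$L^1")
K=$(git rev-parse "$L^2")

# Redo the merge on a new branch that starts just before the bad merge.
git checkout -q -b rework "$I"
git merge -q --no-edit "$K"

# What the corrected merge has that the bad merge lacks.
git diff "$L" rework > "$patch"

# Bring that difference to master as a new, ordinary commit.
git checkout -q master
git apply "$patch"
git add -A
git commit -q -m "Restore files dropped by the earlier merge"

git merge -q --no-edit other      # FolderB now survives the merge
ls FolderB
```

In this toy repository the patch only adds the two FolderB files, so it amounts to the same thing as copying the files in, adding them, and committing.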

The second strategy for dealing with the problem is to make it all go away: throw out commit L (the bad merge) and every subsequent commit. This obviously makes you re-do a bunch of work. It has another terrible side effect as well, if you have given these commits (whatever their hash IDs are) to anyone else: those other people still have the commits. You will have to get them to throw out their copies of these commits too. Otherwise they will keep coming back.

To use this second (not as good in most cases) strategy, you can use the git reset command. Since this is usually the wrong way to deal with the problem, I will stop here, but see How to revert a Git repository to a previous commit.
