New files while resolving the merge conflict in Git

Question

I am fairly new to Git, so apologies if this sounds like a trivial issue. I committed some code yesterday in Git on my feature branch. Today, after it was reviewed and approved, I am trying to merge it into develop and am getting merge conflicts.

Today I was on the develop branch, which is up to date and has some new files committed by another developer. When I switched to my feature branch from develop, I saw all the new files from the develop branch as new files in my IDE. Should I commit only the files that had conflicts, or do I have to commit all the new files as well after resolving the conflict?

torek · Accepted Answer

This question seems to be about half about how to use your IDE, but note that Git itself stores all files in every snapshot. This includes merge snapshots.

Hence, if you literally remove the files from the merge before committing, the merge result will not have the new files. In effect, your merge will claim that the correct way to incorporate those files is to remove them. They should therefore stay as "new files", unless the correct way to incorporate them is to remove them.

More details

It's important, when using Git, to keep a lot of things in mind all at the same time. (This is one of the reasons Git is hard to get started with.) Here's a list of some important items to know:

- Git is not really about files, and not even really about branches. Git is all about commits. This means you need to know exactly what a commit is and does.
- Each commit holds two things: its main data, in the form of a snapshot of all your files; and some metadata, or data-about-the-data, that describe this commit. The metadata include who made it, when, and (important for humans, though irrelevant to Git itself) why you (or whoever) made the commit. They also include some commit hash IDs. These hash IDs are super-important to Git, though you may not care all that much about them yourself.
- Each commit gets a unique hash ID. The "true name", as it were, of each commit is its hash ID. These hash IDs are how Git finds the commits. If you want to get files out of Git, you will use a hash ID, even if it's obscured by using a name: a branch name like master or develop, or a tag name like v2.1, or whatever.

  These hash IDs are big and ugly and impossible for humans to deal with. They need to be big and ugly because of the constraint that every commit must have a unique hash ID. The obvious approach, of just numbering them sequentially (commit #47 would come after commit #46, etc.), might work, except for the fact that Git is distributed. There's no central Git Commit Number Assigner that everyone can go to, to get the next number.

  Since they are big and ugly, we generally don't actually look at them. We use names, which we'll say more about in just a moment.
- Every commit (well, almost every commit) has a parent commit. The parent is the commit that came just before this one. That's what this extra metadata is about: commits store the hash ID of their parents. Merge commits are special in exactly one way: they store more than one parent, i.e., there's more than one commit that comes right before this one.

  (The very first commit someone makes has no parent, because there is no earlier commit. This commit is called the root commit. Every non-empty repository has at least one, and it's unusual to have more than one, though it is possible to make new root commits, or acquire them in other ways.)
- A branch name, like master, just remembers the raw hash ID of the last commit in the branch.
- Hence, branches grow by adding new commits. Adding one new commit consists of making a new snapshot (a new copy of every file) with its parent set to the current commit. Git then makes the branch name remember the new commit's hash ID.
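
To see these pieces for yourself, here is a small sketch in a throwaway repository. The file name, the commit messages, and the "Demo" identity are made up for illustration. The git cat-file -p command prints the raw contents of a commit object: a tree line (the snapshot), zero or more parent lines, and the author and committer metadata.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory for the demo
git init -q .
git symbolic-ref HEAD refs/heads/master    # name the first branch master, whatever the local default is
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo hello > README.md
git add README.md
git commit -q -m "first commit"            # root commit: no parent
echo more >> README.md
git commit -q -a -m "second commit"        # ordinary commit: one parent

# Raw contents of the latest commit: tree, parent, author, committer, message.
git cat-file -p HEAD

tip=$(git rev-parse master)                # hash ID the branch name remembers
root=$(git rev-parse "master^")            # hash ID of its parent
# Count the "parent" header lines stored in each commit object.
root_parents=$(git cat-file -p "$root" | grep -c '^parent ' || true)
tip_parents=$(git cat-file -p "$tip" | grep -c '^parent ' || true)
echo "root commit has $root_parents parent(s), tip commit has $tip_parents"
```

The hash IDs printed will differ on every run, since the timestamps in the metadata are part of what gets hashed.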

We can therefore draw commits as a chain, with the newest commit at the right (or at the top for git log --graph), like this:

... <-F <-G <-H   <--master

Here, each of the uppercase letters stands in for some big ugly commit hash. The latest commit's hash is H, and H contains the hash ID of its parent G. Git can use the hash ID in H to find G. Commit G contains the hash ID of its parent F, so Git can find F. F contains the hash ID of its parent, and so on.

But how do we find H? That's where a branch name comes in: the branch name just holds the hash ID of the last commit.

Hence, in order to add a commit to master, Git:

- writes out the commit, including the parent hash ID H,
- which computes a new unique hash ID, which we'll call I,
- which Git then stuffs into the branch name master, giving
```
... <-F <-G <-H <-I   <--master
```
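
A minimal sketch of that sequence, assuming a scratch repository with a made-up file.txt. The commit messages H and I are only labels that match the drawing; the real hash IDs are whatever Git computes.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory for the demo
git init -q .
git symbolic-ref HEAD refs/heads/master    # name the first branch master, whatever the local default is
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo one > file.txt
git add file.txt
git commit -q -m H
old_tip=$(git rev-parse master)            # the name master holds the hash ID of H

echo two >> file.txt
git commit -q -a -m I
new_tip=$(git rev-parse master)            # the same name now holds the hash ID of I
new_parent=$(git rev-parse "master^")      # parent hash ID stored inside I

echo "master moved from $old_tip to $new_tip"
echo "parent of the new tip: $new_parent"
```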

Commits are snapshots, but you view changes

When we have a linear chain of commits like this:

...--F--G--H   <-- master

and we ask Git to show us commit H, we see changes, rather than a snapshot. That's because changes are usually what's useful to see, so Git extracts both commits G and H to a temporary area (in memory, really) and then compares them.

The two commits, G and H, are two snapshots. Both hold all your files. The copy of README.md in G and the copy in H might be different, though, and in that case, when showing you H, Git shows you the difference between the copy in G and the copy in H.

You can, of course, have different files in the two commits. Perhaps both G and H have README.md—and maybe they're even the same in both commits—but maybe H has file.py that isn't in G at all. In that case, G-vs-H shows a new file.

Note that you can also have files in G that are not in H; in this case, comparing, as Git does, tells you that the file is deleted. It's still there in commit G, as a full snapshot. It's just not there in H.
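
Here is a small sketch of that comparison, assuming a scratch repository in which legacy.txt (a made-up name) exists only in G and file.py exists only in H. The --name-status option asks git diff for one line per changed file, with A for added and D for deleted.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory for the demo
git init -q .
git symbolic-ref HEAD refs/heads/master    # name the first branch master, whatever the local default is
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo readme > README.md
echo old > legacy.txt
git add .
git commit -q -m G

git rm -q legacy.txt                       # present in G, absent from H
echo 'print("hi")' > file.py               # absent from G, present in H
git add file.py
git commit -q -m H

# Compare the two snapshots; README.md is identical in both, so it is not listed.
changes=$(git diff --name-status "HEAD^" HEAD)
echo "$changes"

# The deleted file is still stored, in full, in the older snapshot.
still_in_g=$(git show "HEAD^:legacy.txt")
echo "legacy.txt in G contains: $still_in_g"
```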

Multiple branches

When you have more than one branch name, what you have is something that you might be able to draw like this:

          I--J   <-- master
         /
...--G--H
         \
          K--L   <-- other

The two names, master and other, select two commits by hash ID. Commit J is the tip of master—the last one there—and commit L is the tip of other.

Now that we have two branch names, we need a way to remember which one we're actually using. Git uses the special name HEAD for this, attaching it to one of the branch names you have in your repository:

          I--J   <-- master (HEAD)
         /
...--G--H
         \
          K--L   <-- other

Note that the commits up through and including H are on both branches. (Git is unusual here; most version control systems don't work like this.) Commits I and J are only on master, and—at least right now—commits K-L are only on other.
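
As a rough illustration, assuming a scratch repository with one shared commit and one extra commit on each branch (the file names are invented), git symbolic-ref shows which branch name HEAD is attached to, and git branch --contains lists the branches that a given commit is on.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory for the demo
git init -q .
git symbolic-ref HEAD refs/heads/master    # name the first branch master, whatever the local default is
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt
git add .
git commit -q -m H                         # shared commit
git branch other                           # second name pointing at H

echo i > i.txt
git add .
git commit -q -m I                         # only on master

git checkout -q other
echo k > k.txt
git add .
git commit -q -m K                         # only on other
git checkout -q master

current=$(git symbolic-ref --short HEAD)   # the branch name HEAD is attached to
shared=$(git rev-parse "master~1")         # commit H
on_both=$(git branch --contains "$shared" | wc -l | tr -d ' ')
on_master_tip=$(git branch --contains master | wc -l | tr -d ' ')
echo "HEAD is attached to $current"
echo "H is on $on_both branches; the tip of master is on $on_master_tip"
```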

Merging is about combining work

When you are on your branch, and you are merging work that someone else did on some other branch, you don't want to just take their files as-is in their latest commit, nor just take your files as-is in your latest commit. You want to combine the changes you made with the changes they made.

Since Git only stores snapshots, though, how will Git find changes? We already saw how Git can compare a commit with its parent. But suppose you have:

          I--J   <-- master (HEAD)
         /
...--G--H
         \
          K--L   <-- other

How do we compare the changes you made on master with the changes they made on other? Git's answer is to find the best shared commit, one that's on both branches, which Git calls the merge base. Here, that's obviously commit H. So Git now compares all the files in H with all the files in J:

git diff --find-renames <hash-of-H> <hash-of-J>     # what we changed

Then, Git compares all the files in H with all the files in L:

git diff --find-renames <hash-of-H> <hash-of-L>     # what they changed

Git can then combine the changes: it applies whatever we did to the files in H, and also applies whatever they did.
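
You can run the same two comparisons by hand. This is only a sketch of what git merge works out internally, in a scratch repository with invented file names; git merge-base prints the hash ID of the best shared commit.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory for the demo
git init -q .
git symbolic-ref HEAD refs/heads/master    # name the first branch master, whatever the local default is
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

printf 'line1\nline2\n' > README.md
git add .
git commit -q -m H
git branch other

echo ours > ours.txt
git add .
git commit -q -m J                         # our work, on master

git checkout -q other
echo theirs > theirs.txt
git add .
git commit -q -m L                         # their work, on other
git checkout -q master

base=$(git merge-base master other)        # the best shared commit, H
ours_changes=$(git diff --find-renames --name-status "$base" master)
theirs_changes=$(git diff --find-renames --name-status "$base" other)
echo "what we changed:   $ours_changes"
echo "what they changed: $theirs_changes"
```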

Merge conflicts

Sometimes, though, in an attempt to combine these changes, Git runs into a problem. For instance, what if we changed line 42 of README.md, and they also changed line 42 of README.md, but we made different changes? In this case, Git does all the combining it can, and then stops with a merge conflict.
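
The following sketch provokes such a conflict on purpose, assuming a scratch repository where both branches rewrite the only line of README.md. It then uses git diff --diff-filter=U to list the unmerged files and backs out with git merge --abort.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory for the demo
git init -q .
git symbolic-ref HEAD refs/heads/master    # name the first branch master, whatever the local default is
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo original > README.md
git add .
git commit -q -m H
git branch other

echo ours > README.md
git commit -q -a -m J                      # we change the line

git checkout -q other
echo theirs > README.md
git commit -q -a -m L                      # they change the same line differently
git checkout -q master

# git merge exits non-zero when it stops with a conflict.
if git merge other >/dev/null 2>&1; then merge_ok=yes; else merge_ok=no; fi

unmerged=$(git diff --name-only --diff-filter=U)   # files still in conflict
markers=$(grep -c '^<<<<<<< ' README.md)           # conflict markers left in the work-tree copy
echo "merge succeeded: $merge_ok; unmerged: $unmerged; marker blocks: $markers"

git merge --abort                          # put everything back as it was before the merge
```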

Your job is now to resolve these conflicts. Git's conflict-resolving powers are limited, but it provides a bunch of tools—of varying quality—to help, and lets you add your own tools on top. A lot of IDEs add a lot of tools, of varying quality, and I can't say anything about most of them as I don't use them.

Chances are good, though, that you will run git status during and/or after your resolving process. What git status says depends on where you are in that process. I'm going to assume here that you're done with the resolving: git status says something like "All conflicts fixed but you are still merging", or at least says nothing about unmerged files. (The precise output depends on your Git vintage; old Gits, from before the 2.x series, are not nearly as good here, and anything older than about 1.8.4 is really not good.)

When you use git status at this point, Git is comparing your proposed next commit, which will be a merge commit—which we haven't drawn yet or described—to the current commit. That is, you're still in this situation:

          I--J   <-- master (HEAD)
         /
...--G--H
         \
          K--L   <-- other

but there is, on the table, a proposal to make a new commit M. The snapshot in M will be different from the snapshot that exists now in J, and git status will tell you about it, in much the same way that Git might show you the difference between J and this proposed M.

Now, suppose that in the difference from H to L, Git found that they added some new files. Those files are not in H and are in L. Those files are probably not in I and J.

So, Git took those new files from L and they are now in your proposed next commit. If you make merge commit M now, those files will exist. Comparing J vs this proposal, these files are new files.

So git status will tell you that the files that were added by them are new! You probably want to keep them. If you remove them now, they will be gone from your proposed new commit.
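
Here is a sketch of exactly this situation, assuming a scratch repository where README.md conflicts and the other branch also adds new_file.txt (both names invented). After resolving the conflict, git status --short marks their file with A, and committing the merge keeps it.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory for the demo
git init -q .
git symbolic-ref HEAD refs/heads/master    # name the first branch master, whatever the local default is
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo original > README.md
git add .
git commit -q -m H
git branch other

echo ours > README.md
git commit -q -a -m J

git checkout -q other
echo theirs > README.md
echo 'their new file' > new_file.txt       # a file only they added
git add .
git commit -q -m L
git checkout -q master

git merge other >/dev/null 2>&1 || true    # stops with a conflict in README.md

# Resolve the conflict by hand (here, by writing a combined line), then mark it resolved.
echo 'ours and theirs' > README.md
git add README.md

# Proposed merge commit vs. current commit J: their file shows up as added.
status=$(git status --short)
echo "$status"

git commit -q -m "Merge branch 'other'"    # finish the merge, keeping the new file
in_merge=$(git ls-tree --name-only HEAD)   # files in the merge snapshot
echo "$in_merge"
```

Nothing extra had to be staged for new_file.txt here: Git had already put it into the proposed commit, and the only manual step was resolving README.md.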

You're still in the "resolving" step, whether or not you think you're done. You've told Git that each of the conflicted files it was complaining about is done. Their resolved versions are ready to go into the new commit, and git status will compare the files in J to the files in the proposed new commit and say they're different (if they are). But you can, while in this state, keep making more changes: ones that didn't come from commits J or L.

It's rarely a good idea to make more changes. People call changes made at this point an evil merge. See the question "Evil merges in git?" for more about this. You can do it, and if you feel that there is a good reason to do it, perhaps you should do it after all. Remember that you get a chance to explain that you did this, and why you did this, when you make the new merge commit. But you probably want to keep the new files here, as new files.

In any case, you now finish the merge, using command-line Git, with:

git merge --continue    # or git commit, if your Git does not have --continue

This makes the final merge commit. As we noted earlier, a merge commit has two parents (technically, two or more, but you probably won't encounter these so-called octopus merges yourself). The first parent of the merge is the usual parent. The second is the commit you told Git to merge:

          I--J
         /    \
...--G--H      M   <-- master (HEAD)
         \    /
          K--L   <-- other

New commit M now has two parents, along with a snapshot like any commit, and an author and date-and-time-stamp and log message and so on. The first parent is J, and the second one is L because you said git merge other and other names commit L.
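
To check the parents yourself, the suffixes ^1 and ^2 select the first and second parent of a commit. The sketch below assumes a scratch repository with a conflict-free merge and invented file names.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory for the demo
git init -q .
git symbolic-ref HEAD refs/heads/master    # name the first branch master, whatever the local default is
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt
git add .
git commit -q -m H
git branch other

echo j > j.txt
git add .
git commit -q -m J

git checkout -q other
echo l > l.txt
git add .
git commit -q -m L
git checkout -q master

j=$(git rev-parse master)                  # tip of master before the merge
l=$(git rev-parse other)                   # tip of other

git merge -q -m "Merge branch 'other'" other   # makes merge commit M

first=$(git rev-parse "HEAD^1")            # first parent of M
second=$(git rev-parse "HEAD^2")           # second parent of M
git rev-list --parents -n 1 HEAD           # prints M followed by its parents
```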

Viewing merge commits is different

When you go to view this commit later, Git won't show you what changed, by default. That's because Git doesn't know which parent to compare against. Should Git extract J and M, and compare those two? Or should it extract L and M, and compare those? The git log -p command is lazy, and just does not do either one.

There are other Git commands, and other ways to view changes, that let you pick out which parent to use. The simplest is to add -m to git log -p. That says: When you hit a merge commit, run one diff for each parent. That is, git log will now compare, first, J-vs-M, and show that; then compare L-vs-M, and show that. But you do have to ask for this.

You should know that git show will show a merge commit using what Git calls a combined diff. But combined diffs deliberately omit a lot of details. Mostly, they try to show the areas where merge conflicts did or might have happened.
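
As a sketch of picking the parent yourself, you can give git diff an explicit parent and the merge commit. The scratch repository and file names below are assumptions for the demo; the last command is the git log -m form, here with --name-status to keep the output short.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory for the demo
git init -q .
git symbolic-ref HEAD refs/heads/master    # name the first branch master, whatever the local default is
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt
git add .
git commit -q -m H
git branch other

echo j > j.txt
git add .
git commit -q -m J                         # our side adds j.txt

git checkout -q other
echo l > l.txt
git add .
git commit -q -m L                         # their side adds l.txt
git checkout -q master
git merge -q -m "Merge branch 'other'" other   # merge commit M

# J-vs-M: what the merge brought in from the other branch.
vs_first=$(git diff --name-only "HEAD^1" HEAD)
# L-vs-M: what the merge brought in from our own branch.
vs_second=$(git diff --name-only "HEAD^2" HEAD)
echo "compared with first parent:  $vs_first"
echo "compared with second parent: $vs_second"

# One diff per parent, as requested with -m.
git log -1 -m --name-status HEAD
```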