Reputation: 4398
I have two branches, master and feature. If I do:
git diff --name-only master..feature
I get a long list of files, some of them source code, so they are not excluded by .gitignore.
But, when I try to merge feature into master:
git checkout master
git merge feature
I get only a single file changed in master during the merge process.
Why does this happen?
Another interesting thing is, if I try the reverse and merge master into feature, files that were created in the feature branch are deleted.
How do I fix this and avoid this issue in the future?
Upvotes: 3
Views: 4003
Reputation: 489223
That's not a bug.
Consider the following simple example. Suppose there is a file named example.txt. In branch X, it reads:
This is
quite
a file.
In branch Y, it reads:
This is
not
a file.
What should the result of merging branches X and Y be? Specifically, what content do you expect to appear in the file named example.txt?
What information, if any, have I failed to give you? What else do you need to know before you can even answer this question?
(Try to come up with an answer before you read on.)
Before we go on, let's note that the unit of storage you deal with, in Git, is the commit, not the file. It's true that commits contain files, but the general idea here is that it's a package deal: a commit has a full snapshot of all of the files. If we take some starting commit:
git checkout somebranch
and split a big file, bigfile.py, into two smaller files, small1.py and small2.py, remove bigfile.py entirely, and then commit, the new commit lacks bigfile.py and adds the two smaller files, as compared with the old commit. When we check out the old commit, we have just one of the three files (the big one), and when we check out the new commit, we have just the other two. It's a package deal: you can pick the commit with one file, or the commit with two, but you never get both the big file and one of the small ones, or all three files, or some other combination.
Still, commits contain files, and that will be important later when we get around to merging. But besides containing files (that's their main data: a snapshot of every file, as it appeared when you made that commit), each commit contains some metadata, or information about the commit. This includes the stuff you see in git log output: the name and email address of the person who made the commit, and a date-and-time stamp, for instance.1
In amongst all this metadata, Git stores, in each commit, the raw hash ID of some earlier commit(s). Most commits store exactly one earlier commit hash ID. These hash IDs are the "true names" of the commits, too: they're how Git actually finds each commit. The commits are stored in a big key-value database, with the hash ID of the commit being the key, and the commit's content being the value.
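To make the key-value idea concrete, here is a toy sketch in Python. It is not how Git actually serializes objects (real Git hashes a typed, length-prefixed object that also includes author and timestamp data); the store_commit helper and the objects dictionary are names invented for this illustration.

```python
import hashlib

# Toy object database: key = hash of the content, value = the content itself.
# Real Git's object format differs; this only illustrates the lookup scheme.
objects = {}

def store_commit(snapshot, parents, message):
    """Store a commit and return its hash ID (hypothetical helper)."""
    body = repr((sorted(snapshot.items()), parents, message)).encode()
    commit_id = hashlib.sha1(body).hexdigest()
    objects[commit_id] = {
        "snapshot": snapshot,   # full copy of every file
        "parents": parents,     # hash IDs of earlier commit(s)
        "message": message,
    }
    return commit_id

g = store_commit({"example.txt": "This is\na file.\n"}, [], "G")
h = store_commit({"example.txt": "This is\nquite\na file.\n"}, [g], "H")

# The only way to find a commit is by its hash ID; the parent link
# stored inside it is how we get from H back to G.
parent_of_h = objects[h]["parents"][0]
```

Here parent_of_h is the hash ID of the first commit, which is all that "H points back to G" means.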
With each commit storing the previous commit's hash ID, we end up with a nice simple linear chain of commits. If we use uppercase letters to stand in for each hash ID, we get drawings that look like this:
... <-F <-G <-H
where H is the hash ID of the last commit in the chain. Inside commit H, Git has stored the actual hash ID of earlier commit G. Inside commit G, Git stored the hash ID of still-earlier commit F, and so on.
These chains allow Git to work backwards, from the latest commits back to earlier ones. These are the history in a Git repository, so these chains are crucial to using Git. And, since each commit stores a full snapshot, we have to have Git compare two commits to see what changed. If we have Git compare the snapshot in G to the snapshot in H, for instance, that tells us what we changed when we made H from G.
So, this is what git log does: it starts at the latest commit (such as H), prints out the hash ID and the metadata, and if we used -p to get patches, extracts both G and H (to a temporary memory area) and compares the two commits' snapshots to figure out what changed, and shows us that. Then, having shown commit H, Git moves backwards one step to commit G: it prints out the hash ID and metadata, and if we used -p, compares F-vs-G. Having printed out G, git log moves back one more step to F, and so on down the line.
(In other words, Git works backwards. I won't emphasize this more here but it explains a lot about Git, once you realize this.)
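The backwards walk can be sketched in a few lines of Python. This is only a model of the idea, assuming a linear chain with one parent per commit; the commits table, file names, and the log and changed_files helpers are made up for the example.

```python
# Hypothetical commit table: each commit names its parent (None for the root)
# and carries a full snapshot of its files.
commits = {
    "F": {"parent": None, "files": {"a.txt": "one\n"}},
    "G": {"parent": "F", "files": {"a.txt": "one\ntwo\n"}},
    "H": {"parent": "G", "files": {"a.txt": "one\ntwo\n", "b.txt": "new\n"}},
}

def changed_files(old, new):
    """Names of files whose content differs between two snapshots."""
    return sorted(
        name for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    )

def log(tip):
    """Walk backwards from tip, yielding (commit, files changed vs. its parent)."""
    current = tip
    while current is not None:
        parent = commits[current]["parent"]
        old = commits[parent]["files"] if parent else {}
        yield current, changed_files(old, commits[current]["files"])
        current = parent

history = list(log("H"))
# history == [("H", ["b.txt"]), ("G", ["a.txt"]), ("F", ["a.txt"])]
```

Note that the "what changed" part is computed on the fly by comparing two snapshots; nothing in a commit stores a diff.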
1If you use git log --pretty=fuller, you'll see that each commit actually has two of these: an author and a committer. Each one is made up of a triplet: name, email, timestamp. Usually both are the same these days, except for cherry-picked commits, where the author of the original commit is retained, and the committer is the person who did the cherry-pick, with the committer time stamp being the time of the cherry-pick action.
To make the above work, we have to know, somehow, the hash ID of the last commit in the chain. We need to give that hash ID to Git, because Git can only find commits by their hash IDs, in the end. We could write down these hash IDs, jotting them on paper, or on a whiteboard, or something. But they're really big and ugly and hard to type in correctly. Plus, we have a computer. Why not have the computer remember the hash IDs for us? We could add a second database to our Git repository: it would hold names, like master or develop or feature, and with those names, remember the hash ID of the last (most recent, most useful, whatever) commit.
That's just what a branch name is: it's an entry in a names database. The actual name is extended a bit: master is really refs/heads/master and feature is really refs/heads/feature. This leaves room for other kinds of names, like tag names: v2.1 is really refs/tags/v2.1. But for branch names in particular, they all hold commit hash IDs, one each, and that hash ID is the ID of the last commit that we're going to consider to be "on the branch".
If we only have one branch, everything is easy:
...--F--G--H <-- master
Here, the branch name master is the only name, and it holds the hash ID of our most recent commit, commit H. So the name master points to the commit at the end of the chain. That lets us (and Git) access commit H. Commit H points backwards to commit G, which lets us (and Git) access it; commit G points backwards again, and so on.
If we create a new branch name now, such as feature, we can pick any of the existing commits to have this new name point to. Most often, though, we'll pick the commit we're using: H, via master. So we'll get:
...--F--G--H <-- feature, master
Now we have a problem. Which branch name are we using? To remember, we'll add a special name, HEAD, and attach it to one of these two branch names. Let's attach HEAD to feature (by running git checkout feature if necessary) and draw that:
...--F--G--H <-- feature (HEAD), master
We're still using commit H, but now we're using it because of the name feature.
Now let's create a new commit, in the usual way: modify some files, maybe even create new ones and/or remove existing ones, and use git add and/or git rm as needed to get them all updated, and git commit the result. Without worrying too much about all the details, this has Git save away a new snapshot, add some metadata, and write out the collection as a new commit. The new commit gets a new, unique hash ID (something random-looking, and unpredictable since it depends on the exact time at which we make the commit), but we'll just call it commit I. The new commit will point backwards to the existing commit H:
             I
            /
...--F--G--H
Once the new commit exists, even before we get back to being able to run more commands, Git now does its last special trick: it writes the new commit's hash ID into the current branch name, i.e., the one HEAD is attached to. Since that's feature, we get:
             I <-- feature (HEAD)
            /
...--F--G--H <-- master
Commit H comes right before commit I, but it's still the last commit on the master branch. Commit I is the last commit on feature, but commits up through H are on feature too.
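The bookkeeping behind "HEAD is attached to a branch name, and committing moves that name" fits in a few lines. This is a toy model, not Git's actual storage: the refs and parents dictionaries, the head variable, and the commit helper are all invented for the illustration.

```python
# Toy names database: each branch name holds one commit ID.
refs = {"refs/heads/master": "H", "refs/heads/feature": "H"}

# Toy commit graph: each commit ID maps to its parent's ID.
parents = {"H": "G"}

# HEAD is attached to a branch name rather than to a commit.
head = "refs/heads/feature"

def commit(new_id):
    """Record new_id with the current tip as its parent, then advance
    the branch name that HEAD is attached to (hypothetical helper)."""
    parents[new_id] = refs[head]
    refs[head] = new_id

commit("I")
# feature now names I; master was not touched and still names H.
```

Only the name HEAD is attached to moves. That is why master keeps pointing to H no matter how many commits land on feature.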
Let's go ahead and make one more commit on feature now:
             I--J <-- feature (HEAD)
            /
...--F--G--H <-- master
and then run git checkout master. This will take our HEAD away from feature and attach it to master instead. It will also update our work area so that we are using the contents of commit H, rather than the contents of commit J: all our files now match H, not J. Any updates we made and snapshotted into I and J are safely stored there, in I and J, but they're gone from our view now, as we have commit H out:
             I--J <-- feature
            /
...--F--G--H <-- master (HEAD)
We could now make another new branch name, say, feature2, and attach HEAD to that:
             I--J <-- feature
            /
...--F--G--H <-- feature2 (HEAD), master
and then make two new commits on feature2:
             I--J <-- feature
            /
...--F--G--H <-- master
            \
             K--L <-- feature2 (HEAD)
Or, we could just go ahead and make these commits directly on master:
             I--J <-- feature
            /
...--F--G--H
            \
             K--L <-- master (HEAD)
As far as the graph itself goes—the set of commits with the backwards-pointing arrows between them (drawn here as lines because the arrow graphics available in text are poor)—it doesn't matter: we can't change any existing commits (ever), but we can always add new commits, and either way, we end up with this set of commits. It's just a question of which names find these commits. But Git allows us to create, destroy, or move branch names any time we like. The commits don't change; it's just that the names we use to find them might be different.
It's time to answer the question above: what's missing?
When we merge some commits in Git, this is all about combining work. The idea is that someone, in some series of commits (I-J perhaps), did some work, and someone (probably someone else), in some other series of commits (K-L), did some work. That gives us this:
          I--J <-- br1
         /
...--G--H
         \
          K--L <-- br2
Because of the nature of commits (they never change), we can tell, from this graph, that these two lines of work started from a common starting point, namely commit H. It's really easy to see, visually, that everything in J is descended from H, and the same is true for L. They are also descended from G, but H is "better" because it's "closer" to the end-point commits.
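One way to pin down what "better" and "closer" mean is: a common ancestor that is not itself an ancestor of some other common ancestor. The Python sketch below applies that definition to the graph drawn above. It is a brute-force illustration rather than Git's actual algorithm, the history before G is left out, and the parents table and helper names are invented for the example.

```python
# The graph from the drawing above, as child -> list of parents.
# Anything earlier than G is omitted in this toy model.
parents = {
    "G": [],
    "H": ["G"],
    "I": ["H"], "J": ["I"],   # top line, br1
    "K": ["H"], "L": ["K"],   # bottom line, br2
}

def ancestors(commit):
    """The commit itself plus everything reachable via parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_bases(a, b):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(a) & ancestors(b)
    return sorted(
        c for c in common
        if not any(c in ancestors(other) for other in common if other != c)
    )

bases = merge_bases("J", "L")   # ["H"]: G is common too, but H is closer
```

The function returns a list because, in more tangled graphs, more than one commit can satisfy the definition.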
Now, we already know that Git can compare two snapshots like G and H, or I and J. Can Git just as easily compare H directly with J? Well, it can; and if we have Git do that, we'll find out what's different from H to J. That's the work someone did on the top line. So those are the changes in br1.
Similarly, if we have Git compare what's in H to what's in L, we will find out what work someone did on the bottom line. Whatever files are different, and whatever rules we use to change the contents of files in H to those in L, that's what someone did on br2.
This also tells us what's missing. In order to merge example.txt, we need not just the two end-point files (one says quite on line 2, for instance, and the other says not on line 2) but also the base copy of the file. The base copy of example.txt is the copy of the file in commit H. Commit H is the merge base of the two tip commits, and its copy of each file is how we figure out what changed.
If the base copy says:
This is
quite
a file.
then we know nothing changed in the one that still says quite, and one line changed in the one that says not.
If the base copy says:
This is
not
a file.
then we know nothing changed in the one that still says not, and one line changed in the one that says quite.
If the base copy has no line 2—if it reads, in its entirety:
This is
a file.
then we have a merge conflict, because both people made a change: both added a line-2, but they added different line-2-s.
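The three cases above boil down to one small rule, sketched here in Python for a single value (think of it as "line 2 of example.txt", with None meaning the line is absent). Real merges work on diff hunks rather than single values, so treat this as a simplification; merge3 and MergeConflict are names made up for the example.

```python
class MergeConflict(Exception):
    """Raised when both sides changed the same thing in different ways."""

def merge3(base, ours, theirs):
    """Three-way merge of one value, given the merge-base version."""
    if ours == theirs:
        return ours      # both sides agree, nothing to decide
    if ours == base:
        return theirs    # only their side changed: take their change
    if theirs == base:
        return ours      # only our side changed: keep our change
    raise MergeConflict(f"base={base!r}, ours={ours!r}, theirs={theirs!r}")

# Line 2 of example.txt in branch X and in branch Y:
x_line, y_line = "quite", "not"

result_if_base_says_quite = merge3("quite", x_line, y_line)   # "not"
result_if_base_says_not = merge3("not", x_line, y_line)       # "quite"
# merge3(None, x_line, y_line) raises MergeConflict: both sides added a line 2.
```

The same two tip values give three different outcomes depending only on the base, which is why the question cannot be answered without it.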
If the two branch tip commits (the one found by the name master, and the one found by the name feature) are different, that just tells us that they're different. The recipe that Git comes up with, that will change one commit to make it match another commit, just tells us how to change the one tip commit into the other tip commit.
If the merge base commit between these two branch-tip commits is some third commit,2 we need to know what's in that third commit, because that's how git merge will figure out what changed in master and what changed in feature. The merge command will then attempt to combine those two sets of changes, applying the combined changes to whatever is in the merge base.
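Here is a whole-file sketch of that process in Python, set up to resemble the situation in the question: a tip-to-tip comparison lists many files, yet merging feature into master changes only one. The snapshots are invented (the real history in the question is unknown), whole files are treated as the unit of change, and the changes and merge_snapshots helpers are hypothetical names.

```python
def changes(base, tip):
    """Map each path that differs from base to its content in tip
    (None means the file was deleted)."""
    return {
        path: tip.get(path)
        for path in base.keys() | tip.keys()
        if base.get(path) != tip.get(path)
    }

def merge_snapshots(base, ours, theirs):
    """Combine both sides' changes and apply them to the merge base."""
    ours_changes = changes(base, ours)
    theirs_changes = changes(base, theirs)
    result, conflicts = dict(base), []
    for path in ours_changes.keys() | theirs_changes.keys():
        both = path in ours_changes and path in theirs_changes
        if both and ours_changes[path] != theirs_changes[path]:
            conflicts.append(path)      # changed differently on each side
            continue
        new = ours_changes[path] if path in ours_changes else theirs_changes[path]
        if new is None:
            result.pop(path, None)      # deleted on one side
        else:
            result[path] = new
    return result, sorted(conflicts)

# Made-up snapshots: master changed a.py and b.py, feature changed c.py.
base = {"a.py": "v1", "b.py": "v1", "c.py": "v1"}
master = {"a.py": "v2", "b.py": "v2", "c.py": "v1"}
feature = {"a.py": "v1", "b.py": "v1", "c.py": "v3"}

tip_to_tip = sorted(changes(master, feature))   # all three files differ
merged, conflicts = merge_snapshots(base, master, feature)
seen_on_master = sorted(changes(master, merged))   # only c.py changes
```

Two of the three differences between the tips are master's own work, so the merge has nothing to bring in for them; only feature's change to c.py is new to master.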
As phd commented, you can use the triple-dot notation with the git diff command:
git diff master...feature
for instance. This has Git:
- find the merge base of master and feature (call it $B); then
- run git diff $B feature
which tells you what changed on feature, with respect to this merge base. If you then run the same command with the two names swapped around:
git diff feature...master
Git will find the merge base of the same two tip commits,3 and then diff $B vs master: this shows you what changed on master.
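Again, what git merge does for these cases is:4
- find the merge base commit of the two branch tips;
- diff the merge base against the current (HEAD) tip, to see what changed on that branch;
- diff the merge base against the other tip, to see what changed there; and
- combine the two sets of changes and apply the combined changes to the snapshot in the merge base.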
If this all goes well, git merge will make a merge commit from the result. A merge commit isn't very different from a regular non-merge commit: it still has a snapshot of all files (as built by the combining process above) and some metadata. The special thing about a merge commit is that it lists both branch-tip commits as its parents, so that Git can go back along both branches (which are now combined into one "branch" via the merge commit: this exposes a flaw in the word "branch"; see What exactly do we mean by "branch"?).
2There are some degenerate cases here. In particular, if the merge base is one of the two branch tip commits, we either have a simple "fast-forward-able" case, or else there's nothing to merge. Given what you've posted, you must not have one of these cases, though.
3If there's only one merge base commit (and this is normally the case), it doesn't matter what order the two branch tip commits are listed in. For some complex commit graphs, however, there may be two or more merge base commits. Here, the picture gets rather murky. The git diff command didn't handle this very well, until quite recently; git merge handles it better, but it's still tricky.
4This description makes a lot of assumptions about how you're doing the merge, the shape of the graph, and so on, and is otherwise greatly simplified vs what git merge really does internally. The idea is to capture the overall goal, without getting into some of the stickier mechanics. For instance, this disregards how merge handles the case of a renamed file.
Upvotes: 12