Reputation: 1717

When does git lose changes during a merge?

Let's say that:

We have a master branch where a coworker accidentally added a series of commits (let's call them A B C) that should have belonged to a new feature.
I discover that, and I tell him to move those commits to a new branch, but keeping other unrelated commits that were done later in master. I send him this question I asked and tell him to follow the response: git: how to move a branch's root two commits back
Days later, when the new feature branch is ready, I merge it into master.
After solving all the conflicts in the merge, I commit the changes...
...and I discover that those first commits (the A B C ones) have disappeared.
I ask my coworker, and he says that "he thinks" that he moved those changes using the method mentioned in the link (basically: checking out the last common commit and then using git cherry-pick to pick only the commits that we wanted later), but he can't remember exactly.
I check the repo's history, and A B C are in the feature branch, at the beginning. They look like they were successfully migrated from master.

Given the above, can anyone explain why git lost those changes? (My personal theory is that git somehow "remembered" that we had undone commits A B C, so when they came from the new feature branch, git decided not to merge them. EDIT: sorry if this explanation sounds too much like "magical thinking", but I'm at a loss. I welcome any attempt to put this explanation in more technical terms, if it's right).

Sorry for not being able to give more details, but I didn't make those changes in the repo personally, so can't give exact details of what was done.

EDIT: okay, as suggested here, I got my coworker to run git reflog on his machine, so I am pasting the results here. To get back to my previous (linked) question, we had a tree like this:

A - B - C - D - E - F  master
            \ 
             \- G - H  new feature branch

And we wanted to move B and C to the new feature branch.

So, the git reflog he sent me is here. Commit 5acb457 would correspond to "commit A" in the graph above:

4629c88 HEAD@{59}: commit: blah
f93f3d3 HEAD@{60}: commit: blah
57b0ea7 HEAD@{61}: checkout: moving from master to feature_branch
4b39fbf HEAD@{62}: commit: Added bugfix F again
4fa21f2 HEAD@{63}: commit: undid checkouts that were in the wrong branch
1c8b2f9 HEAD@{64}: reset: moving to origin/master
5acb457 HEAD@{65}: checkout: moving from 5acb4576eca4b44e0a7574eea19cca067c039dc5 to master
5acb457 HEAD@{66}: checkout: moving from master to 5acb4576eca4b44e0a7574eea19cca067c039dc5
1c8b2f9 HEAD@{67}: checkout: moving from 1c8b2f9bf54ca1d80472c08f3ce7d9028a757985 to master
1c8b2f9 HEAD@{68}: rebase: checkout master
5acb457 HEAD@{69}: checkout: moving from master to 5acb4576eca4b44e0a7574eea19cca067c039dc5
1c8b2f9 HEAD@{70}: checkout: moving from 5acb4576eca4b44e0a7574eea19cca067c039dc5 to master
5acb457 HEAD@{71}: checkout: moving from master to 5acb4576eca4b44e0a7574eea19cca067c039dc5
1c8b2f9 HEAD@{72}: merge origin/master: Fast-forward
5acb457 HEAD@{73}: checkout: moving from master to master
5acb457 HEAD@{74}: checkout: moving from 5acb4576eca4b44e0a7574eea19cca067c039dc5 to master
5acb457 HEAD@{75}: checkout: moving from undo_branch to 5acb4576eca4b44e0a7574eea19cca067c039dc5
5acb457 HEAD@{76}: checkout: moving from master to undo_branch
1c8b2f9 HEAD@{77}: checkout: moving from undo_branch to master
525dbce HEAD@{78}: cherry-pick: Bugfix F
a1a5028 HEAD@{79}: cherry-pick: Bugfix E
32f8968 HEAD@{80}: cherry-pick: Feature C
8b003cb HEAD@{81}: cherry-pick: Feature B
5acb457 HEAD@{82}: checkout: moving from 5acb4576eca4b44e0a7574eea19cca067c039dc5 to undo_branch
5acb457 HEAD@{83}: checkout: moving from master to 5acb4576eca4b44e0a7574eea19cca067c039dc5
1c8b2f9 HEAD@{84}: checkout: moving from 1c8b2f9bf54ca1d80472c08f3ce7d9028a757985 to master
1c8b2f9 HEAD@{85}: pull origin HEAD:master: Fast-forward
5acb457 HEAD@{86}: checkout: moving from master to 5acb4576eca4b44e0a7574eea19cca067c039dc5
5acb457 HEAD@{87}: reset: moving to 5acb4576eca4b44e0a7574eea19cca067c039dc5
1c8b2f9 HEAD@{88}: merge origin/master: Fast-forward
5acb457 HEAD@{89}: reset: moving to 5acb4576eca4b44e0a7574eea19cca067c039dc5
1c8b2f9 HEAD@{90}: checkout: moving from 5acb4576eca4b44e0a7574eea19cca067c039dc5 to master
5acb457 HEAD@{91}: checkout: moving from master to 5acb4576eca4b44e0a7574eea19cca067c039dc5
1c8b2f9 HEAD@{92}: merge origin/master: Merge made by the 'recursive' strategy.
7b912cd HEAD@{93}: checkout: moving from 7b912cdf33843d28dd4a7b28b37b5edbe11cf3b9 to master
7b912cd HEAD@{94}: cherry-pick: Bugfix F
df7a9cd HEAD@{95}: cherry-pick: Bugfix E
d4d0e41 HEAD@{96}: cherry-pick: Feature C
701c8cc HEAD@{97}: cherry-pick: Feature B
5acb457 HEAD@{98}: checkout: moving from master to 5acb4576eca4b44e0a7574eea19cca067c039dc5
22ecc3a HEAD@{99}: checkout: moving from 5acb4576eca4b44e0a7574eea19cca067c039dc5 to master
5acb457 HEAD@{100}: checkout: moving from master to 5acb4576eca4b44e0a7574eea19cca067c039dc5
22ecc3a HEAD@{101}: commit: bugfix E
3b568bc HEAD@{102}: checkout: moving from feature_branch to master
57b0ea7 HEAD@{103}: commit: blah
152c5b9 HEAD@{104}: checkout: moving from master to feature_branch
3b568bc HEAD@{105}: commit: bugfix D
fe3bbce HEAD@{106}: checkout: moving from feature_branch to master
152c5b9 HEAD@{107}: commit: blah
2318ebc HEAD@{108}: commit: blah
cc5ea32 HEAD@{109}: commit: blah
a5c2303 HEAD@{110}: commit: blah
544a99a HEAD@{111}: commit: blah
299f86a HEAD@{112}: commit: Feature G
fe3bbce HEAD@{113}: checkout: moving from master to feature_branch
fe3bbce HEAD@{114}: commit: Feature C
3852e71 HEAD@{115}: commit: Feature B
5acb457 HEAD@{116}: merge origin/master: Fast-forward

Can anyone make any sense of those 4 cherry-picks in a row? I suspect that he didn't really do the git cherry-pick master~3 thing, especially not the ~3 part (which admittedly threw me off when I first saw it too).

Upvotes: 6

Answers (3)

Marina Liu

Reputation: 38136

Commits A, B and C are lost because that is exactly what the procedure in the link you shared with your coworker does. Let me illustrate with the graphs below:

1. Assume the original commit history your coworker created looks like this:

...X---A---B---C---D---E  master

2. Move A, B and C to the feature branch. Your coworker created a new feature branch from master (commit E), or from some other commit, and copied the commits with the steps below:

git checkout -b feature
git cherry-pick master~5..master~2

...X---A---B---C---D---E  master
                        \
                         A'---B'---C' feature

3. Rewrite the master branch with:

git checkout X
git cherry-pick master~2..master
git branch -f master
git checkout master

The commit structure will then look like:

...X---D---E  master
     \
       A'---B'---C' feature

So the direct reason is the command git cherry-pick master~2..master. It copies commits D and E directly on top of commit X, so A, B and C are no longer on the master branch.
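The effect of step 3 can be reproduced in a throwaway repository. This is only a minimal sketch under simplifying assumptions: every commit just adds one file named after it, the tag X is a hypothetical helper for naming the first commit, and the feature branch from step 2 is left out.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from the text
git config user.name "Example"
git config user.email "example@example.com"

# Build X---A---B---C---D---E; each commit adds one file (hypothetical content).
for c in X A B C D E; do
    echo "$c" > "$c.txt"
    git add "$c.txt"
    git commit -q -m "$c"
done
git tag X master~5                         # hypothetical tag, just to name commit X

# Step 3: copy D and E onto X, then force master to point at the copies.
git checkout -q X
git cherry-pick master~2..master >/dev/null
git branch -f master
git checkout -q master

master_commits=$(git rev-list --count master)
master_files=$(git ls-files | xargs)
echo "commits reachable from master: $master_commits"
echo "files on master: $master_files"
```

After the rewrite, master contains only X and the copies of D and E, so the files added by A, B and C are gone from it.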

Update:

Based on the git reflog, the HEAD information does not seem to be enough to show what your coworker did. The feature branch seems to have been created from commit C, not D, judging by:

3b568bc HEAD@{105}: commit: bugfix D
fe3bbce HEAD@{106}: checkout: moving from feature_branch to master
152c5b9 HEAD@{107}: commit: blah
2318ebc HEAD@{108}: commit: blah
cc5ea32 HEAD@{109}: commit: blah
a5c2303 HEAD@{110}: commit: blah
544a99a HEAD@{111}: commit: blah
299f86a HEAD@{112}: commit: Feature G
fe3bbce HEAD@{113}: checkout: moving from master to feature_branch
fe3bbce HEAD@{114}: commit: Feature C

So the structure should be:

A---B---C---D---E  master
         \
          G---H feature

If you only want to change the commit structure to:

A ---D---E  master
 \
  B---C---G---H feature

you can reset your master and feature branches to their original state, and then cherry-pick commits onto master. In detail:

git checkout master
git reset --hard <original commit id for E>
git checkout feature 
git reset --hard  <original commit id for H>
git checkout master
git checkout <commit id for A>
git cherry-pick master~2..master # Copy D and E onto A, giving A---D---E (B and C are dropped)
git branch -f master
git checkout master

Upvotes: 5

AnoE

Reputation: 8355

You got very long and good answers already. Let me add this:

My personal theory is that git somehow "remembered" that we had undone commits A B C, so when they came from the new feature branch, git decided not to merge them.

Git never "somehow" "remembers" anything about the contents of your repository. Nor does it ever decide to do or not to do anything based on what you did before. It is very clean in that regard. All its commands are just tools to work on the directed acyclic graph that its commits (and on a lower level, all other objects it stores) are building. And to make it even easier, it only ever adds stuff, never changes or deletes anything.

Except for the commits (i.e., author, timestamp, parent commits etc.), trees (i.e. directories), blobs (i.e., binary data) and a few less important things, there are literally no data structures or further management information about your files and such in the repository. A merge commit does not leave any information that is specific to the "merge"; it is simply a commit with multiple parents.

There is certainly no magic, undocumented stuff going on. The repository is very open, you can literally look at everything using git commands, and everything is fully documented (google "git data structures" or "git internals" if you are interested). Even modifying the internal objects is quite easy if you so wish.
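To see this for yourself, here is a small sketch that builds a merge in a scratch repository and prints the raw commit object. The file names and commit messages are made up for illustration.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo base > base.txt; git add base.txt; git commit -q -m "base"
git checkout -q -b feature
echo f > feature.txt; git add feature.txt; git commit -q -m "feature work"
git checkout -q master
echo m > master.txt; git add master.txt; git commit -q -m "master work"
git merge -q --no-edit feature

# The raw merge commit: one tree, two "parent" lines, author, committer, message.
git cat-file -p HEAD
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
echo "parent lines: $parent_count"
```

Apart from the second parent line, the object printed by git cat-file looks like any other commit.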

There is one little bit where Git keeps historic information around: the so-called "rerere cache", which stores previous conflict resolutions and thus can indeed change the behaviour of future merges. Very handy indeed, but not enabled by default, and certainly not related to the topic at hand.

EDIT: sorry if this explanation sounds too much like "magical thinking", but I'm at a loss. I welcome any attempt to put this explanation in more technical terms, if it's right

Trust the source, Luke. It is great that you are trying to get your head around git, and a strong belief that everything is plain and non-magical should help, hopefully.

Upvotes: 0

torek

Reputation: 490108

Let's concentrate on the merge result, but start with a quick skim over this part (I've redrawn the graph a bit):

To get back to my previous (linked) question, we had a tree like this:
A--B--C--D--E--F   <-- master
          \ 
           G--H   <-- feature
And we wanted to move B and C to the new feature branch.

The result should have looked like this (with the tick-marks indicating that the commits you have now are copies, not the originals, so their hash IDs have changed, so everyone who got the originals has to scramble to make sure they use the new copies too). But I'll just assume that it did in fact look like this:

A--D'-E'-F'   <-- master
    \
     B'-C'-G'-H'   <-- feature

(note that the only commit not copied-and-switched-to is A!).

When you now run:

git checkout master
git merge feature

Git will do these things in this order:

1. Get the hash ID of the current commit (git rev-parse HEAD).
2. Get the hash ID of the tip of feature (git rev-parse feature).
3. Locate the (single, in this case) merge base of those two commits. The technical definition of the merge base is the Lowest Common Ancestor in the DAG, but loosely speaking, it's just before the two branches diverge, which is simply "commit D'".
4. Run what amounts to git diff D' F': diff the merge base with the tip of master. This is "what we changed on master since the merge base": a big list of files (and their hash ID versions), along with any computed rename information and the like.
5. Run what amounts to git diff D' H': diff the merge base with the tip of feature. This is "what they changed on feature", in the same way as in step 4. I use the word "we" for step 4, and "they" here in step 5, because we can use git checkout --ours and git checkout --theirs to extract particular files during a merge conflict: --ours refers to files in commit F', i.e., what "we" changed, and --theirs refers to files in commit H'.
6. Attempt to combine the differences to get a single changeset.
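The first five steps can be run by hand before merging. The following is a rough sketch in a scratch repository, with made-up file names; git merge does the equivalent internally rather than literally invoking these commands.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

printf 'one\ntwo\nthree\n' > file.txt
git add file.txt; git commit -q -m "base"
git checkout -q -b feature
echo "feature only" > feature.txt; git add feature.txt; git commit -q -m "feature work"
git checkout -q master
echo "master only" > master.txt; git add master.txt; git commit -q -m "master work"

ours=$(git rev-parse HEAD)                                  # step 1
theirs=$(git rev-parse feature)                             # step 2
base=$(git merge-base "$ours" "$theirs")                    # step 3
our_changes=$(git diff --name-status "$base" "$ours")       # step 4: what we changed
their_changes=$(git diff --name-status "$base" "$theirs")   # step 5: what they changed

base_subject=$(git log -1 --format=%s "$base")
echo "merge base:   $base_subject"
echo "we changed:   $our_changes"
echo "they changed: $their_changes"
```

Dropping --name-status shows the full diffs that the merge has to combine.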

If Git is able to do all this combining on its own, it declares victory, applies this single changeset to the base commit D', and makes a new commit—let's call this M for merge—in the usual way (so that master moves to point to M), except that M has two parents:
```
A--D'-E'-F'-----M   <-- master
    \          /
     B'-C'-G'-H'   <-- feature
```
If the automatic merge fails, however, Git throws up its metaphorical hands and leaves you a mess that you must clean up yourself. We'll go into this in a moment.

Three inputs, one output

Note that there are three inputs to this three-way merge:

- the tree for the merge base
- the tree for the current (--ours, HEAD) tip commit
- the tree for the other (--theirs) tip commit

The merge base works here because it is a—in fact, the best—common starting point from which the two commits have diverged. Git is able to go straight for the two branch tips because each commit is a complete snapshot:¹ it never has to look at all the intermediate commits, except in terms of the graph so as to find the merge base.

We're also deliberately glossing over a bunch of subtle technical issues, such as pair-breaking and rename-finding (see footnote 1), and things like merge strategies (-s ours means we don't even look at theirs) and strategy options (-X ours or -X theirs). But as long as you are just running git merge feature and there are few or no renames to worry about, that's not a problem.

But—this is one of the key items—in order to figure out what Git is going to do, you must draw the graph, or otherwise identify the merge base. Once you have the hash ID for the merge base commit, you can (if you want to) git diff the merge base against the two tip commits and see what Git will do. But if the merge base is not the commit you are expecting it to be, the merge will not do what you expect it to do.
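As an illustration of a surprising merge base, here is a hypothetical scenario (not necessarily what happened in the question's repository): feature is forked after "Feature B" was committed on master, and master later undoes that commit with git revert. All file names are invented for this sketch.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "A" > a.txt; git add a.txt; git commit -q -m "A"
echo "B" > b.txt; git add b.txt; git commit -q -m "Feature B"
git checkout -q -b feature                 # forked AFTER B, so B is in the merge base
echo "G" > g.txt; git add g.txt; git commit -q -m "Feature G"
git checkout -q master
git revert --no-edit HEAD >/dev/null       # master undoes Feature B
git merge -q --no-edit feature

base_subject=$(git log -1 --format=%s "$(git merge-base HEAD^1 HEAD^2)")
files=$(git ls-files | xargs)
echo "merge base: $base_subject"
echo "files after merge: $files"
```

Here the merge base is "Feature B" itself. From that base, master's diff says "delete b.txt" and feature's diff says nothing about b.txt, so the combined changeset deletes it, even though the commit that added it is an ancestor of feature.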

¹Compare with Mercurial, where each commit is stored, more or less, as a delta or changeset from its parent commit. You might think, then, that Mercurial must start at the merge base and march forward through each commit along each branch chain. But there are two things to note here: first, Mercurial may well have to start before the merge base, because that too could be a changeset from an earlier commit. Second, suppose that along the chain to either tip, some change is made, then backed out. When Mercurial goes to combine the final changesets to implement the same merge as Git, the commit and its backing-out reversion have no effect on the final result. So in that sense, none of the intermediate commits matter after all! We need them only to reconstruct the two final changesets that are to be combined.

In fact, though, Mercurial doesn't do any of this, because each file in Mercurial is occasionally stored anew, fully intact, so that Mercurial won't have to follow extremely long changeset chains to reconstruct a file. Hence what Mercurial does is effectively the same as what Git does: it just extracts the base commit, and then extracts the two tip commits, and does the two diffs.

There's one big technical difference here, which is that Mercurial does not have to guess about renames: the intermediate commits, which—again just like Git—it must traverse to find the merge base, each record any renames with respect to their parent commit, so Mercurial can be certain what the original name of each file was, and what its new name in either tip may be. Git does not record renames: it simply guesses that if path dir/file.txt appears in the merge base, but not in one or both tip commits, perhaps dir/file.txt was renamed in one or both tip commits. If tip commit #1 has other/new.txt that is not in the merge base, that's a candidate file for a rename.

In some cases, Git can't find renames this way. There are additional control knobs. There is one to break pairings if files have changed "too much", i.e., to have Git say that just because dir/file.txt is in both base and tip, that it may not actually be the same file. There is another to set the threshold at which Git declares a file to match, for rename-detection purposes. Last, there is a maximum pairing queue size, configurable as diff.renameLimit and merge.renameLimit. The default merge pairing queue size is larger than the default diff pairing queue size (1000 vs 400 at the time of writing, ever since Git version 1.7.5).
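The guessing is easy to watch with git diff. This sketch uses the example paths from above; the file contents and the renameLimit value of 2000 are arbitrary choices for illustration.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

mkdir dir other
seq 1 20 > dir/file.txt
git add dir/file.txt; git commit -q -m "add dir/file.txt"
git mv dir/file.txt other/new.txt
echo "one extra line" >> other/new.txt     # small edit, so the content is still similar
git add other/new.txt; git commit -q -m "move and tweak"

# With rename detection, Git pairs the deleted and added paths by similarity.
with_renames=$(git diff --name-status -M HEAD~1 HEAD)
# Without it, the same change is reported as one deletion plus one addition.
without_renames=$(git diff --name-status --no-renames HEAD~1 HEAD)
echo "$with_renames"
echo "$without_renames"

# Raising the pairing queue size for merges (arbitrary illustrative value).
git config merge.renameLimit 2000
```

The first listing should show a single R entry with a similarity score; the second shows a D entry and an A entry for the same commit pair.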

The mess you get if there are conflicts

When Git declares a "merge conflict" it stops in the middle of step 6. It does not make new merge commit M. Instead, it leaves you a mess, stored in two places:

The work-tree has its best guess at what it could do as an automated merge, plus all the conflicting merges written out with conflict markers. If file.txt has a conflict—a place where Git was unable to merge "what we did" with "what they did"—it might have a few lines that look like this:
```
<<<<<<< HEAD
stuff from the HEAD commit
=======
stuff from the other commit (H' in our case)
>>>>>>> feature
```
If you set merge.conflictStyle to diff3 (I recommend this setting; see also Should diff3 be default conflictstyle on git?), the above is modified to include what's in the merge base (commit D' in our case), i.e., what text was there before both "we" and "they" changed it:
```
<<<<<<< HEAD
stuff from the HEAD commit
||||||| merged common ancestors
this is what was there before the two
changes in our HEAD commit and our other commit
=======
stuff from the other commit (H' in our case)
>>>>>>> feature
```
Meanwhile, the index—the place where you build the next commit to make—has up to three entries per "slot" for each conflicted file. In this case, for file.txt, there are three versions of file.txt, which are numbered:
- :1:file.txt: this is a copy of file.txt as it appears in the merge base.
- :2:file.txt: this is a copy of file.txt as it appears in our (HEAD) commit.
- :3:file.txt: this is a copy of file.txt as it appears in their (tip of feature) commit.
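Both places can be inspected while a merge is stopped. The sketch below forces a conflict in a scratch repository (single-line file, invented contents), lists the three index slots, and then re-creates the work-tree copy with diff3-style markers.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "original line" > file.txt
git add file.txt; git commit -q -m "base"
git checkout -q -b feature
echo "their line" > file.txt; git commit -q -am "feature change"
git checkout -q master
echo "our line" > file.txt; git commit -q -am "master change"
git merge feature >/dev/null 2>&1 || true  # expected to stop with a conflict

git ls-files --stage file.txt              # one line each for stages 1, 2 and 3
stage1=$(git show :1:file.txt)             # merge base version
stage2=$(git show :2:file.txt)             # our (HEAD) version
stage3=$(git show :3:file.txt)             # their (feature) version

# Rewrite the conflicted work-tree file with the merge base section included.
git checkout --conflict=diff3 file.txt
base_marker_lines=$(grep -c '^|||||||' file.txt)
cat file.txt
```

The git show :N:path form works for any conflicted path, which helps when the markers alone do not make it obvious what each side did.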

Now, just because there is a conflict in file.txt does not mean there were not some other changes that Git was able to resolve on its own. Suppose, for instance, that the merge base version reads:

this is file.txt.
it has a bunch of lines.
we plan to change some of them on one side of the merge.
we plan to change other lines on the other side.
here is something to change without conflict:
la la la, banana fana fo fana
here is something else
to change with conflict:
this is what was there before the two
changes in our HEAD commit and our other commit
and finally,
here is something to change without conflict:
one potato two potato

In HEAD, let's make the file read this way, using however many commits we like to get to this point:

this is file.txt.
it has a bunch of lines.
we plan to change some of them on one side of the merge.
we plan to change other lines on the other side.
here is something to change without conflict:
a bit from the Name Game
here is something else
to change with conflict:
stuff from the HEAD commit
and finally,
here is something to change without conflict:
one potato two potato

(Note that we made two distinct regions of change. By default git diff will combine them into a single diff hunk as there are only two context lines between them, but git merge will treat them as separate changes.)

In the other (feature) branch let's make a different set of changes, so that file.txt reads:

this is file.txt.
it has a bunch of lines.
we plan to change some of them on one side of the merge.
we plan to change other lines on the other side.
here is something to change without conflict:
la la la, banana fana fo fana
here is something else
to change with conflict:
stuff from the other commit (H' in our case)
and finally,
here is something to change without conflict:
cut potato and deep fry to make delicious chips

Again, we have made two changes, but only one conflicts.

The work-tree version of the merged file will take each change that does not conflict, so that the file will read, in full:

this is file.txt.
it has a bunch of lines.
we plan to change some of them on one side of the merge.
we plan to change other lines on the other side.
here is something to change without conflict:
a bit from the Name Game
here is something else
to change with conflict:
<<<<<<< HEAD
stuff from the HEAD commit
=======
stuff from the other commit (H' in our case)
>>>>>>> feature
and finally,
here is something to change without conflict:
cut potato and deep fry to make delicious chips

It's your job, as the one doing the merge, to resolve the conflict.

You may choose to do this:

git checkout --ours file.txt

or:

git checkout --theirs file.txt

but either of these simply copies the "ours" or "theirs" index version (from slot 2 or 3) to the work-tree. Whichever one you choose, you will lose the changes from the other branch.

You may hand-edit the file, removing the conflict markers and keeping or modifying some or all of the remaining lines to resolve the conflict.

Or, of course, you can use any of your favorite merge tools to handle the conflict.

In all cases, though, whatever is in your work-tree will be your final product. You should then run:

git add file.txt

to wipe out the stage 1, 2, and 3 entries and copy the work-tree version of the file to the normal stage-zero file.txt. This tells Git that the merge is now resolved for file.txt.

You must repeat this for all the remaining unmerged files. In some cases (rename/rename conflicts, rename/delete, delete/modify, and so on) there is a bit more work to do, but it all boils down to making sure that the index has only the final stage-zero entries that you want, and no higher-stage entries. (You can use git ls-files --stage to see all the entries in all their stages, although git status does a pretty good job of summarizing the interesting ones. In particular, all files that have stage-zero entries that exactly match the HEAD commit are extremely boring, and git status skips right over them. If there are hundreds or thousands of such files, that's very helpful.)

Once you have resolved all the files in the index, you run git commit. This makes merge commit M. What's in the commit is whatever is in your index, i.e., whatever you git add-ed to remove higher stage index entries and insert stage-zero entries.
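Put together, the whole resolve-and-commit cycle looks roughly like this. It is a sketch in a scratch repository; the printf line stands in for whatever hand edit or merge tool you would really use.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "original line" > file.txt
git add file.txt; git commit -q -m "base"
git checkout -q -b feature
echo "their line" > file.txt; git commit -q -am "feature change"
git checkout -q master
echo "our line" > file.txt; git commit -q -am "master change"
git merge feature >/dev/null 2>&1 || true  # stops with a conflict in file.txt

unmerged_before=$(git ls-files --unmerged | wc -l | tr -d ' ')

# Stand-in for a hand edit: keep both sides and drop the conflict markers.
printf 'our line\ntheir line\n' > file.txt
git add file.txt                           # replaces stages 1-3 with a stage-zero entry

unmerged_after=$(git ls-files --unmerged | wc -l | tr -d ' ')
git commit -q -m "Merge branch 'feature'"  # concludes the merge: this is M

parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
merged_content=$(git show HEAD:file.txt)
echo "unmerged entries: $unmerged_before -> $unmerged_after, parents: $parent_count"
```

Whatever was in the index at commit time is what ends up in M, whether or not it reflects both sides.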

Using `git checkout` to check out and resolve at the same time

As noted above, git checkout --ours or git checkout --theirs just gets the copy from index slot 2 or 3 and writes it to the work-tree. This does not resolve the index entries: all the slot 1, 2, and 3 unmerged entries are still there. You must git add the work-tree file back to mark it resolved. As we also noted, this loses any changes from the other tip commit.

If that's what you want, though, there is a short-cut. You can:

git checkout HEAD file.txt

or:

git checkout MERGE_HEAD file.txt

This extracts the version of file.txt from the HEAD (F') or MERGE_HEAD (H') commit. In so doing, it writes the contents to stage zero for file.txt, which wipes out stages 1, 2, and 3. In effect, it gets the --ours or --theirs version and git adds the result, all at once.

Again, this loses any changes from the other tip commit.
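The difference between the two forms, and the loss of the other side's changes, can be seen in a scratch repository. In this sketch (invented file contents) feature changes two regions of file.txt, only one of which conflicts with master.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

printf 'conflict: original\nkeep 1\nkeep 2\nkeep 3\ntail: original\n' > file.txt
git add file.txt; git commit -q -m "base"
git checkout -q -b feature
printf 'conflict: theirs\nkeep 1\nkeep 2\nkeep 3\ntail: theirs\n' > file.txt
git commit -q -am "feature changes"
git checkout -q master
printf 'conflict: ours\nkeep 1\nkeep 2\nkeep 3\ntail: original\n' > file.txt
git commit -q -am "master change"
git merge feature >/dev/null 2>&1 || true  # conflict on the first line only

tail_after_merge=$(tail -n 1 file.txt)     # feature's non-conflicting change is here

git checkout --ours file.txt               # work-tree only; index stays unmerged
unmerged_after_ours=$(git ls-files --unmerged | wc -l | tr -d ' ')

git checkout HEAD file.txt                 # work-tree AND stage zero in one step
unmerged_after_head=$(git ls-files --unmerged | wc -l | tr -d ' ')
tail_after_head=$(tail -n 1 file.txt)      # feature's change to the tail is gone

echo "tail after merge: $tail_after_merge"
echo "tail after checkout HEAD: $tail_after_head"
```

Note that the change to the last line never conflicted, yet it is dropped along with the conflicting one, because the whole file is taken from one side.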

It's easy to get this wrong

It's very easy to get these resolving steps wrong. In particular, git checkout --ours and git checkout --theirs, and their short-cut versions using HEAD and MERGE_HEAD, drop the other side's changes to a file. The only indication that you will have of this is that the merge result is missing some changes. As far as Git is concerned, that's the correct result: you wanted those changes dropped; that's why you set the stage-zero index entry that way before you made the merge commit.

It's also easy to get a surprise merge base, particularly if you try to do a lot of git rebase or git cherry-pick work to copy commits around and move branch names to point to the new copies. It's always worth carefully studying the commit DAG. Get help from "A DOG": git log --all --decorate --oneline --graph, all decorate oneline graph; or use gitk or some other graphical viewer, to visualize the commit graph. (Instead of --all you might also consider using the two branch names in question, i.e., DOG rather than just any old A DOG: git log --decorate --oneline --graph master feature. The resulting graph is likely to be simpler and easier to read. However, if you did a lot of rebasing and cherry-picking, --all may reveal more. You can even combine this with specific reflog names such as feature@{5}, though this gets a bit long-winded and makes for quite messy graphs.)
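For reference, a short sketch of that inspection in a scratch repository with two diverged branches (names and contents are made up):

```shell
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo base > base.txt; git add base.txt; git commit -q -m "base"
git checkout -q -b feature
echo f > feature.txt; git add feature.txt; git commit -q -m "feature work"
git checkout -q master
echo m > master.txt; git add master.txt; git commit -q -m "master work"

# "A DOG": every ref, decorated, one line per commit, drawn as a graph.
graph=$(git log --all --decorate --oneline --graph)
printf '%s\n' "$graph"

# "DOG" restricted to the two branches about to be merged.
git log --decorate --oneline --graph master feature

# The commit the merge will actually start from.
base_subject=$(git log -1 --format=%s "$(git merge-base master feature)")
commit_lines=$(printf '%s\n' "$graph" | grep -c '\*')
echo "merge base: $base_subject"
```

Checking the merge base this way before running git merge is cheap, and it is the quickest way to notice that the branches do not fork where you thought they did.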

Upvotes: 4