Reputation: 1717
Let's say that:
A B C) that should have belonged to a new feature.
A B C ones) have disappeared.
git cherry-pick to pick only the commits that we wanted later), but he can't remember exactly.
A B C are in the feature branch, at the beginning. They look like they were successfully migrated from master.
Given the above, can anyone explain why git lost those changes? (My personal theory is that git somehow "remembered" that we had undone commits A B C, so when they came from the new feature branch, git decided not to merge them. EDIT: sorry if this explanation sounds too much like "magical thinking", but I'm at a loss. I welcome any attempt to put this explanation in more technical terms, if it's right).
Sorry for not being able to give more details, but I didn't make those changes in the repo personally, so can't give exact details of what was done.
EDIT: okay, as suggested here, I got my coworker to execute git reflog on his machine, so I am pasting the results here. To get back to my previous (linked) question, we had a tree like this:
A - B - C - D - E - F  master
             \
              \- G - H  new feature branch
And we wanted to move B and C to the new feature branch.
So, the git reflog he sent me is here. Commit 5acb457 would correspond to "commit A" in the graph above:
4629c88 HEAD@{59}: commit: blah
f93f3d3 HEAD@{60}: commit: blah
57b0ea7 HEAD@{61}: checkout: moving from master to feature_branch
4b39fbf HEAD@{62}: commit: Added bugfix F again
4fa21f2 HEAD@{63}: commit: undid checkouts that were in the wrong branch
1c8b2f9 HEAD@{64}: reset: moving to origin/master
5acb457 HEAD@{65}: checkout: moving from 5acb4576eca4b44e0a7574eea19cca067c039dc5 to master
5acb457 HEAD@{66}: checkout: moving from master to 5acb4576eca4b44e0a7574eea19cca067c039dc5
1c8b2f9 HEAD@{67}: checkout: moving from 1c8b2f9bf54ca1d80472c08f3ce7d9028a757985 to master
1c8b2f9 HEAD@{68}: rebase: checkout master
5acb457 HEAD@{69}: checkout: moving from master to 5acb4576eca4b44e0a7574eea19cca067c039dc5
1c8b2f9 HEAD@{70}: checkout: moving from 5acb4576eca4b44e0a7574eea19cca067c039dc5 to master
5acb457 HEAD@{71}: checkout: moving from master to 5acb4576eca4b44e0a7574eea19cca067c039dc5
1c8b2f9 HEAD@{72}: merge origin/master: Fast-forward
5acb457 HEAD@{73}: checkout: moving from master to master
5acb457 HEAD@{74}: checkout: moving from 5acb4576eca4b44e0a7574eea19cca067c039dc5 to master
5acb457 HEAD@{75}: checkout: moving from undo_branch to 5acb4576eca4b44e0a7574eea19cca067c039dc5
5acb457 HEAD@{76}: checkout: moving from master to undo_branch
1c8b2f9 HEAD@{77}: checkout: moving from undo_branch to master
525dbce HEAD@{78}: cherry-pick: Bugfix F
a1a5028 HEAD@{79}: cherry-pick: Bugfix E
32f8968 HEAD@{80}: cherry-pick: Feature C
8b003cb HEAD@{81}: cherry-pick: Feature B
5acb457 HEAD@{82}: checkout: moving from 5acb4576eca4b44e0a7574eea19cca067c039dc5 to undo_branch
5acb457 HEAD@{83}: checkout: moving from master to 5acb4576eca4b44e0a7574eea19cca067c039dc5
1c8b2f9 HEAD@{84}: checkout: moving from 1c8b2f9bf54ca1d80472c08f3ce7d9028a757985 to master
1c8b2f9 HEAD@{85}: pull origin HEAD:master: Fast-forward
5acb457 HEAD@{86}: checkout: moving from master to 5acb4576eca4b44e0a7574eea19cca067c039dc5
5acb457 HEAD@{87}: reset: moving to 5acb4576eca4b44e0a7574eea19cca067c039dc5
1c8b2f9 HEAD@{88}: merge origin/master: Fast-forward
5acb457 HEAD@{89}: reset: moving to 5acb4576eca4b44e0a7574eea19cca067c039dc5
1c8b2f9 HEAD@{90}: checkout: moving from 5acb4576eca4b44e0a7574eea19cca067c039dc5 to master
5acb457 HEAD@{91}: checkout: moving from master to 5acb4576eca4b44e0a7574eea19cca067c039dc5
1c8b2f9 HEAD@{92}: merge origin/master: Merge made by the 'recursive' strategy.
7b912cd HEAD@{93}: checkout: moving from 7b912cdf33843d28dd4a7b28b37b5edbe11cf3b9 to master
7b912cd HEAD@{94}: cherry-pick: Bugfix F
df7a9cd HEAD@{95}: cherry-pick: Bugfix E
d4d0e41 HEAD@{96}: cherry-pick: Feature C
701c8cc HEAD@{97}: cherry-pick: Feature B
5acb457 HEAD@{98}: checkout: moving from master to 5acb4576eca4b44e0a7574eea19cca067c039dc5
22ecc3a HEAD@{99}: checkout: moving from 5acb4576eca4b44e0a7574eea19cca067c039dc5 to master
5acb457 HEAD@{100}: checkout: moving from master to 5acb4576eca4b44e0a7574eea19cca067c039dc5
22ecc3a HEAD@{101}: commit: bugfix E
3b568bc HEAD@{102}: checkout: moving from feature_branch to master
57b0ea7 HEAD@{103}: commit: blah
152c5b9 HEAD@{104}: checkout: moving from master to feature_branch
3b568bc HEAD@{105}: commit: bugfix D
fe3bbce HEAD@{106}: checkout: moving from feature_branch to master
152c5b9 HEAD@{107}: commit: blah
2318ebc HEAD@{108}: commit: blah
cc5ea32 HEAD@{109}: commit: blah
a5c2303 HEAD@{110}: commit: blah
544a99a HEAD@{111}: commit: blah
299f86a HEAD@{112}: commit: Feature G
fe3bbce HEAD@{113}: checkout: moving from master to feature_branch
fe3bbce HEAD@{114}: commit: Feature C
3852e71 HEAD@{115}: commit: Feature B
5acb457 HEAD@{116}: merge origin/master: Fast-forward
Can anyone make any sense of those 4 cherry-picks in a row? I suspect that he didn't really do the git cherry-pick master~3 thing, especially not the ~3 part (which admittedly threw me off when I first saw it too).
Upvotes: 6
Views: 8990
Reputation: 38136
Commits A, B and C are lost because that is what the steps in the link you shared with your coworker do. The graphs below illustrate this:
1. Assume the original commit history looked like this:
...X---A---B---C---D---E master
2. Move A, B and C to the feature branch. Your coworker created a new feature branch from master (commit E), or from some other commit, and copied the commits with the steps below:
git checkout -b feature
git cherry-pick master~5..master~2   # the range X..C, i.e. commits A, B and C
...X---A---B---C---D---E master
    \
     A'---B'---C' feature
3. Modify the master branch with:
git checkout X
git cherry-pick master~2..master
git branch -f master
git checkout master
The commit structure will then look like:
...X---D---E master
    \
     A'---B'---C' feature
So the direct cause is the command git cherry-pick master~2..master. It copies commits D and E directly onto commit X, so you can't find A, B and C on the master branch.
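The effect of step 3 can be reproduced in a throwaway repository. This is only a sketch: the temporary repository, the file names and the commit helper are made up for the demo, and each commit simply adds one file named after itself.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: one commit that adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }
for c in X A B C D E; do commit "$c"; done

# Step 3: rebuild master on top of X using only D and E.
git checkout -q master~5                      # detached HEAD at X
git cherry-pick master~2..master >/dev/null   # copies D and E
git branch -f master                          # point master at the new tip
git checkout -q master

# List the commit subjects on master, oldest first.
master_subjects=$(git log --reverse --format=%s master | xargs)
echo "$master_subjects"
```

After this, master lists only X, D and E. Nothing was "forgotten": A, B and C were simply never copied onto the new master.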
Based on the git reflog, the HEAD information is not enough to show exactly what your coworker did. The feature branch also seems to have been created from commit C, not D, judging by:
3b568bc HEAD@{105}: commit: bugfix D
fe3bbce HEAD@{106}: checkout: moving from feature_branch to master
152c5b9 HEAD@{107}: commit: blah
2318ebc HEAD@{108}: commit: blah
cc5ea32 HEAD@{109}: commit: blah
a5c2303 HEAD@{110}: commit: blah
544a99a HEAD@{111}: commit: blah
299f86a HEAD@{112}: commit: Feature G
fe3bbce HEAD@{113}: checkout: moving from master to feature_branch
fe3bbce HEAD@{114}: commit: Feature C
So the structure should be:
A---B---C---D---E master
         \
          G---H feature
If you only want to change the commit structure to:
A---D---E master
 \
  B---C---G---H feature
you can reset your master branch and feature branch to their original state, and then cherry-pick commits onto master, as follows:
git checkout master
git reset --hard <original commit id for E>
git checkout feature
git reset --hard <original commit id for H>
git checkout master
git checkout <commit id for A>
git cherry-pick master~2..master   # copy D and E onto A, giving A---D---E (B and C are left out)
git branch -f master               # move master to the new tip (HEAD is detached here)
git checkout master
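As a sanity check, the recipe can be tried on a scratch repository first. The sketch below assumes the layout drawn above (feature branched at C); the commit helper and file names are invented for the demo. Note that the range used is master~2..master, which selects D and E.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: one commit that adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

# Original layout: A---B---C---D---E on master, G---H on feature (branched at C).
for c in A B C; do commit "$c"; done
git checkout -q -b feature
for c in G H; do commit "$c"; done
git checkout -q master
for c in D E; do commit "$c"; done

# Rebuild master as A---D'---E'; feature is left alone and keeps B and C.
git checkout -q master~4                      # detached HEAD at A
git cherry-pick master~2..master >/dev/null   # copy D and E
git branch -f master
git checkout -q master

master_subjects=$(git log --reverse --format=%s master | xargs)
feature_subjects=$(git log --reverse --format=%s feature | xargs)
base_subject=$(git log -1 --format=%s "$(git merge-base master feature)")
echo "master:     $master_subjects"
echo "feature:    $feature_subjects"
echo "merge base: $base_subject"
```

In this demo master ends up as A, D, E, feature as A, B, C, G, H, and the two branches share only A.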
Upvotes: 5
Reputation: 8355
You got very long and good answers already. Let me add this:
My personal theory is that git somehow "remembered" that we had undone commits A B C, so when they came from the new feature branch, git decided not to merge them.
Git never "somehow" "remembers" anything about the contents of your repository. Nor does it ever decide to do or not to do anything based on what you did before. It is very clean in that regard. All its commands are just tools to work on the directed acyclic graph that its commits (and on a lower level, all other objects it stores) are building. And to make it even easier, it only ever adds stuff, never changes or deletes anything.
Except for the commits (i.e., author, timestamp, parent commits etc.), trees (i.e. directories), blobs (i.e., binary data) and a few less important things, there are literally no data structures or further management information about your files and such in the repository. A merge commit does not leave any information that is specific to the "merge"; it is simply a commit with multiple parents.
There is certainly no magic, undocumented stuff going on. The repository is very open, you can literally look at everything using git commands, and everything is fully documented (google "git data structures" or "git internals" if you are interested). Even modifying the internal objects is quite easy if you so wish.
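To see this for yourself, here is a small sketch that builds a merge in a scratch repository and prints the raw merge commit. The repository, file names and commit helper are made up for the demo; git cat-file is the standard command for inspecting objects.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: one commit that adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
git checkout -q -b feature
commit B
git checkout -q master
commit C
git merge -q --no-edit feature   # true merge: the branches have diverged

git cat-file -t HEAD             # object type of the merge
git cat-file -p HEAD             # tree, parents, author, committer, message

# Count the "parent" header lines of the merge commit.
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
echo "parents: $parent_count"
```

The merge is an ordinary commit object; the only thing that sets it apart is that it has two parent lines instead of one.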
There is one little place where git keeps historic information around: the so-called "rerere cache", which stores previous conflict resolutions and thus can indeed change the behaviour of future merges. Very handy indeed, but not enabled by default, and certainly not related to the topic at hand.
EDIT: sorry if this explanation sounds too much like "magical thinking", but I'm at a loss. I welcome any attempt to put this explanation in more technical terms, if it's right
Trust the source, Luke. It is great that you are trying to get your head around git, and a strong belief that everything is plain and non-magical should hopefully help.
Upvotes: 0
Reputation: 490108
Let's concentrate on the merge result, but start with a quick skim over this part (I've redrawn the graph a bit):
To get back to my previous (linked) question, we had a tree like this:
A--B--C--D--E--F   <-- master
          \
           G--H   <-- feature
And we wanted to move B and C to the new feature branch.
The result should have looked like this (with the tick-marks indicating that the commits you have now are copies, not the originals, so their hash IDs have changed, so everyone who got the originals has to scramble to make sure they use the new copies too). But I'll just assume that it did in fact look like this:
A--D'-E'-F'   <-- master
    \
     B'-C'-G'-H'   <-- feature
(note that the only commit not copied-and-switched-to is A!).
When you now run:
git checkout master
git merge feature
Git will do these things in this order:
1. Resolve the current commit to a hash ID (git rev-parse HEAD).
2. Resolve the tip of feature to a hash ID (git rev-parse feature).
3. Find the merge base of these two commits, which here is D'.
4. git diff D' F': diff the merge base with the tip of master. This is "what we changed on master since the merge base": a big list of files (and their hash ID versions), along with any computed rename information and the like.
5. git diff D' H': diff the merge base with the tip of feature. This is "what they changed on feature", in the same way as in step 4. I use the word "we" for step 4, and "they" here in step 5, because we can use git checkout --ours and git checkout --theirs to extract particular files during a merge conflict: --ours refers to files in commit F', i.e., what "we" changed, and --theirs refers to files in commit H'.
6. Attempt to combine the differences to get a single changeset.
If Git is able to do all this combining on its own, it declares victory, applies this single changeset to the base commit D', and makes a new commit, which we'll call M for merge, in the usual way (so that master moves to point to M), except that M has two parents:
A--D'-E'-F'-----M   <-- master
    \          /
     B'-C'-G'-H'   <-- feature
If the automatic merge fails, however, Git throws up its metaphorical hands and leaves you a mess that you must clean up yourself. We'll go into this in a moment.
Note that there are three inputs to this three-way merge:
- the merge base commit
- the current (--ours, HEAD) tip commit
- the other (--theirs) tip commit
The merge base works here because it is a common starting point, in fact the best one, from which the two commits have diverged. Git is able to go straight for the two branch tips because each commit is a complete snapshot[1]: it never has to look at all the intermediate commits, except in terms of the graph so as to find the merge base.
We're also deliberately glossing over a bunch of subtle technical issues, such as pair-breaking and rename-finding (see footnote 1), and things like merge strategies (-s ours means we don't even look at theirs) and strategy options (-X ours or -X theirs). But as long as you are just running git merge feature and there are few or no renames to worry about, that's not a problem.
But, and this is one of the key items, in order to figure out what Git is going to do, you must draw the graph, or otherwise identify the merge base. Once you have the hash ID for the merge base commit, you can (if you want to) git diff the merge base against the two tip commits and see what Git will do. But if the merge base is not the commit you are expecting it to be, the merge will not do what you expect it to do.
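Steps 3 to 5 can be run by hand. The sketch below rebuilds the graph from this answer in a scratch repository (without the tick marks, and with made-up file names and a made-up commit helper), asks Git for the merge base, and lists what each side changed relative to it.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: one commit that adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

# A--D--E--F on master, B--C--G--H on feature (branched at D).
for c in A D; do commit "$c"; done
git checkout -q -b feature
for c in B C G H; do commit "$c"; done
git checkout -q master
for c in E F; do commit "$c"; done

base=$(git merge-base master feature)                    # step 3
base_subject=$(git log -1 --format=%s "$base")
ours=$(git diff --name-only "$base" master | xargs)      # step 4: what "we" changed
theirs=$(git diff --name-only "$base" feature | xargs)   # step 5: what "they" changed

echo "merge base: $base_subject"
echo "ours:       $ours"
echo "theirs:     $theirs"
```

If the merge base printed here is not the commit you expected, that is the first thing to investigate before blaming the merge itself.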
[1] Compare with Mercurial, where each commit is stored, more or less, as a delta or changeset from its parent commit. You might think, then, that Mercurial must start at the merge base and march forward through each commit along each branch chain. But there are two things to note here: first, Mercurial may well have to start before the merge base, because that too could be a changeset from an earlier commit. Second, suppose that along the chain to either tip, some change is made, then backed out. When Mercurial goes to combine the final changesets to implement the same merge as Git, the commit and its backing-out reversion have no effect on the final result. So in that sense, none of the intermediate commits matter after all! We need them only to reconstruct the two final changesets that are to be combined.
In fact, though, Mercurial doesn't do any of this, because each file in Mercurial is occasionally stored anew, fully intact, so that Mercurial won't have to follow extremely long changeset chains to reconstruct a file. Hence what Mercurial does is effectively the same as what Git does: it just extracts the base commit, and then extracts the two tip commits, and does the two diffs.
There's one big technical difference here, which is that Mercurial does not have to guess about renames: the intermediate commits, which (again just like Git) it must traverse to find the merge base, each record any renames with respect to their parent commit, so Mercurial can be certain what the original name of each file was, and what its new name in either tip may be. Git does not record renames: it simply guesses that if path dir/file.txt appears in the merge base, but not in one or both tip commits, perhaps dir/file.txt was renamed in one or both tip commits. If tip commit #1 has other/new.txt that is not in the merge base, that's a candidate file for a rename.
In some cases, Git can't find renames this way. There are additional control knobs. There is one to break pairings if files have changed "too much", i.e., to have Git say that just because dir/file.txt is in both base and tip, that does not mean it is actually the same file. There is another to set the threshold at which Git declares a file to match, for rename-detection purposes. Last, there is a maximum pairing queue size, configurable as diff.renameLimit and merge.renameLimit. The default merge pairing queue size is larger than the default diff pairing queue size (1000 vs 400 at the time of writing, ever since Git version 1.7.5).
When Git declares a "merge conflict" it stops in the middle of step 6. It does not make new merge commit M. Instead, it leaves you a mess, stored in two places:
The work-tree has its best guess at what it could do as an automated merge, plus all the conflicting merges written out with conflict markers. If file.txt has a conflict (a place where Git was unable to merge "what we did" with "what they did"), it might have a few lines that look like this:
<<<<<<< HEAD
stuff from the HEAD commit
=======
stuff from the other commit (H' in our case)
>>>>>>> feature
If you set merge.conflictStyle to diff3 (I recommend this setting; see also Should diff3 be default conflictstyle on git?), the above is modified to include what's in the merge base (commit D' in our case), i.e., what text was there before both "we" and "they" changed it:
<<<<<<< HEAD
stuff from the HEAD commit
||||||| merged common ancestors
this is what was there before the two
changes in our HEAD commit and our other commit
=======
stuff from the other commit (H' in our case)
>>>>>>> feature
Meanwhile, the index (the place where you build the next commit to make) has up to three entries per "slot" for each conflicted file. In this case, for file.txt, there are three versions of file.txt, which are numbered:
- :1:file.txt: this is a copy of file.txt as it appears in the merge base.
- :2:file.txt: this is a copy of file.txt as it appears in our (HEAD) commit.
- :3:file.txt: this is a copy of file.txt as it appears in their (tip of feature) commit.
Now, just because there is a conflict in file.txt does not mean there were not some other changes that Git was able to resolve on its own. Suppose, for instance, that the merge base version reads:
this is file.txt.
it has a bunch of lines.
we plan to change some of them on one side of the merge.
we plan to change other lines on the other side.
here is something to change without conflict:
la la la, banana fana fo fana
here is something else
to change with conflict:
this is what was there before the two
changes in our HEAD commit and our other commit
and finally,
here is something to change without conflict:
one potato two potato
In HEAD, let's make the file read this way, using however many commits we like to get to this point:
this is file.txt.
it has a bunch of lines.
we plan to change some of them on one side of the merge.
we plan to change other lines on the other side.
here is something to change without conflict:
a bit from the Name Game
here is something else
to change with conflict:
stuff from the HEAD commit
and finally,
here is something to change without conflict:
one potato two potato
(Note that we made two distinct regions of change. By default git diff will combine them into a single diff hunk as there are only two context lines between them, but git merge will treat them as separate changes.)
In the other (feature) branch let's make a different set of changes, so that file.txt reads:
this is file.txt.
it has a bunch of lines.
we plan to change some of them on one side of the merge.
we plan to change other lines on the other side.
here is something to change without conflict:
la la la, banana fana fo fana
here is something else
to change with conflict:
stuff from the other commit (H' in our case)
and finally,
here is something to change without conflict:
cut potato and deep fry to make delicious chips
Again, we have made two changes, but only one conflicts.
The work-tree version of the merged file will take each change that does not conflict, so that the file will read, in full:
this is file.txt.
it has a bunch of lines.
we plan to change some of them on one side of the merge.
we plan to change other lines on the other side.
here is something to change without conflict:
a bit from the Name Game
here is something else
to change with conflict:
<<<<<<< HEAD
stuff from the HEAD commit
=======
stuff from the other commit (H' in our case)
>>>>>>> feature
and finally,
here is something to change without conflict:
cut potato and deep fry to make delicious chips
It's your job, as the one doing the merge, to resolve the conflict.
You may choose to do this:
git checkout --ours file.txt
or:
git checkout --theirs file.txt
but either of these simply copies the "ours" or "theirs" index version (from slot 2 or 3) to the work-tree. Whichever one you choose, you will lose the changes from the other branch.
You may hand-edit the file, removing the conflict markers and keeping or modifying some or all of the remaining lines to resolve the conflict.
Or, of course, you can use any of your favorite merge tools to handle the conflict.
In all cases, though, whatever is in your work-tree will be your final product. You should then run:
git add file.txt
to wipe out the stage 1, 2, and 3 entries and copy the work-tree version of the file to the normal stage-zero file.txt. This tells Git that the merge is now resolved for file.txt.
You must repeat this for all the remaining unmerged files. In some cases (rename/rename conflicts, rename/delete, delete/modify, and so on) there is a bit more work to do, but it all boils down to making sure that the index has only the final stage-zero entries that you want, and no higher-stage entries. (You can use git ls-files --stage to see all the entries in all their stages, although git status does a pretty good job of summarizing the interesting ones. In particular, all files that have stage-zero entries that exactly match the HEAD commit are extremely boring, and git status skips right over them. If there are hundreds or thousands of such files, that's very helpful.)
Once you have resolved all the files in the index, you run git commit. This makes merge commit M. What's in the commit is whatever is in your index, i.e., whatever you git add-ed to remove higher stage index entries and insert stage-zero entries.
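The whole cycle (conflict, three index stages, hand resolution, git add, git commit) can be watched in a scratch repository. This is a minimal sketch with a made-up three-line file.txt; it also turns on the diff3 conflict style mentioned earlier so the base text shows up between the markers.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git config merge.conflictStyle diff3

# Base version, then one conflicting change on each branch.
printf 'line one\noriginal text\nline three\n' > file.txt
git add file.txt && git commit -q -m base
git checkout -q -b feature
printf 'line one\ntheir text\nline three\n' > file.txt
git commit -q -a -m theirs
git checkout -q master
printf 'line one\nour text\nline three\n' > file.txt
git commit -q -a -m ours

git merge feature >/dev/null 2>&1 || true   # expected to stop with a conflict

cat file.txt                                # work-tree copy with conflict markers
marker_count=$(grep -c '^|||||||' file.txt) # the extra diff3 section
stages_before=$(git ls-files --stage file.txt | awk '{print $3}' | xargs)
base_version=$(git show :1:file.txt | sed -n 2p)   # stage 1 = merge base copy

# Resolve by hand, then mark the file resolved.
printf 'line one\nour text and their text\nline three\n' > file.txt
git add file.txt
stages_after=$(git ls-files --stage file.txt | awk '{print $3}' | xargs)

git commit -q --no-edit                     # this makes merge commit M
parents=$(git cat-file -p HEAD | grep -c '^parent ')
echo "stages before: $stages_before, after: $stages_after, parents: $parents"
```

Whatever was in the work-tree file at git add time is what ends up in M, which is why a careless resolution silently becomes "the correct result" as far as Git is concerned.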
git checkout to check out and resolve at the same time
As noted above, git checkout --ours or git checkout --theirs just gets the copy from index slot 2 or 3 and writes it to the work-tree. This does not resolve the index entries: all the slot 1, 2, and 3 unmerged entries are still there. You must git add the work-tree file back to mark it resolved. As we also noted, this loses any changes from the other tip commit.
If that's what you want, though, there is a short-cut. You can:
git checkout HEAD file.txt
or:
git checkout MERGE_HEAD file.txt
This extracts the version of file.txt from the HEAD (F') or MERGE_HEAD (H') commit. In so doing, it writes the contents to stage zero for file.txt, which wipes out stages 1, 2, and 3. In effect, it gets the --ours or --theirs version and git adds the result, all at once.
Again, this loses any changes from the other tip commit.
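The difference between the two forms is easy to observe. The sketch below sets up the same kind of one-line conflict in a scratch repository (file contents are made up) and checks the index stages after each command.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'line one\noriginal text\nline three\n' > file.txt
git add file.txt && git commit -q -m base
git checkout -q -b feature
printf 'line one\ntheir text\nline three\n' > file.txt
git commit -q -a -m theirs
git checkout -q master
printf 'line one\nour text\nline three\n' > file.txt
git commit -q -a -m ours
git merge feature >/dev/null 2>&1 || true   # expected to stop with a conflict

# --theirs only copies slot 3 to the work-tree; the file stays unmerged.
git checkout --theirs -- file.txt
stages_theirs=$(git ls-files --stage file.txt | awk '{print $3}' | xargs)

# Naming the commit also writes stage zero, which resolves the file.
git checkout MERGE_HEAD -- file.txt
stages_shortcut=$(git ls-files --stage file.txt | awk '{print $3}' | xargs)
kept_line=$(sed -n 2p file.txt)

echo "after --theirs:    stages $stages_theirs"
echo "after MERGE_HEAD:  stages $stages_shortcut, line 2 is '$kept_line'"
```

In both cases the "our text" line is gone from the work-tree; the shortcut merely skips the separate git add.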
It's very easy to get these resolving steps wrong. In particular, git checkout --ours and git checkout --theirs, and their short-cut versions using HEAD and MERGE_HEAD, drop the other side's changes to a file. The only indication that you will have of this is that the merge result is missing some changes. As far as Git is concerned, that's the correct result: you wanted those changes dropped; that's why you set the stage-zero index entry that way before you made the merge commit.
It's also easy to get a surprise merge base, particularly if you try to do a lot of git rebase or git cherry-pick work to copy commits around and move branch names to point to the new copies. It's always worth carefully studying the commit DAG. Get help from "A DOG": git log --all --decorate --oneline --graph, all decorate oneline graph; or use gitk or some other graphical viewer, to visualize the commit graph. (Instead of --all you might also consider using the two branch names in question, i.e., DOG rather than just any old A DOG: git log --decorate --oneline --graph master feature. The resulting graph is likely to be simpler and easier to read. However, if you did a lot of rebasing and cherry-picking, --all may reveal more. You can even combine this with specific reflog names such as feature@{5}, though this gets a bit long-winded and makes for quite messy graphs.)
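As a small illustration of mixing branch names with reflog names, the sketch below rewrites one commit in a scratch repository (using git commit --amend as a stand-in for a rebase or cherry-pick) and then graphs both the new tip and the old one. The repository and commit helper are made up for the demo.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: one commit that adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
git checkout -q -b feature
commit G
git commit -q --amend -m "G amended"   # old G is now reachable only via the reflog

# DOG with an extra reflog name: shows the replacement and the original side by side.
git --no-pager log --decorate --oneline --graph master feature 'feature@{1}'

# Number of distinct commits in that graph: A, the old G, and the amended G.
shown=$(git rev-list --count master feature 'feature@{1}')
echo "commits shown: $shown"
```

The quotes around feature@{1} keep the shell from treating the braces specially.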
Upvotes: 4