Tom Ellis

Reputation: 9414

How to see three-way git diff even after conflicts are resolved

If I have a conflict in git (I use rebase but the same presumably holds for merge) it adds conflict markers to my file so I can resolve by editing

  line1
++<<<<<<< HEAD
 +line1a
++=======
+ line1b
++>>>>>>> b


  line2
++<<<<<<< HEAD
 +line2a
++=======
+ line2b
++>>>>>>> b

Part way through the merge git diff still shows the three-way diff

  line1
 +line1a
+ line1b


  line2
++<<<<<<< HEAD
 +line2a
++=======
+ line2b
++>>>>>>> b

but once I resolve all the conflicts and add them then git diff shows nothing. How can I see the three-way diff? Specifically, I want to see something like

  line1
 +line1a
+ line1b


  line2
 +line2a
+ line2b

Upvotes: 4

Views: 1436

Answers (1)

torek

Reputation: 488213

TL;DR

Consider using git checkout -m, but be very careful with this as it is a destructive command. Note that it only sometimes works.

Long

These are in fact different: what you see in a work-tree file during a conflicted merge—from anything that uses Git's merge engine, including the cherry-picking that happens during a rebase—is what is left behind by Git's low-level merge driver, and what you see from running git diff is produced by Git's combined diff code.

The first kind of output—which does not have a formal name—can be reproduced at any time as long as you have all three input files available. The second kind of output, the combined diff, is ... trickier.

The low level merge driver itself is available as a separate program, git merge-file.
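As a rough illustration of that, here is a minimal sketch that feeds three plain files to git merge-file and gets the conflict markers back. The file names and contents are made up to mirror the question; nothing here touches a repository.

```shell
# Sketch: reproduce conflict-marker output from three ordinary files.
# base.txt / ours.txt / theirs.txt are hypothetical names for this demo.
work=$(mktemp -d) && cd "$work"
printf 'line1\nline2\n'         > base.txt    # merge base version
printf 'line1\nline1a\nline2\n' > ours.txt    # "our" change
printf 'line1\nline1b\nline2\n' > theirs.txt  # "their" change

# -p prints the merged result instead of overwriting ours.txt.
# -L sets the labels shown on the markers (current, base, other).
# The exit status is the number of conflicts, so nonzero is expected here.
git merge-file -p -L HEAD -L base -L b ours.txt base.txt theirs.txt > merged.txt
conflicts=$?

cat merged.txt
```

Because the three inputs are just files, this can be re-run at any time, which is what makes the first kind of output easy to reproduce.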

How can I see the [combined] diff?

Unfortunately, there's no tool to do this in the case of resolved files. You can get what you want, sort of, but it's tricky:

  • If you have not finished a rebase or cherry-pick operation (or a revert, which also does this), you can destroy your resolution, putting the files back in conflicted state. To do this, use git checkout -m on the file in question, but note that it destroys the work you have done so far:

    git checkout -m -- path/to/file.ext
    

    (You can save the previously hand-merged file somewhere else—just move it out of the way, for instance—as you'll get the entire conflicted state back. Put the merged file back when you're ready and use git add as before, to mark it resolved again.)

  • If you have finished a rebase or similar, you would have to repeat the particular operation involved, to cause the conflict again.

  • Merges are a bit different, as we'll see in a moment.

In Git, conflicts arise when doing a "three-way merge". A three-way merge implies three input files. When you use a plain git merge, the sources for these three files are easier to see, so let's consider that case before we get to rebase and cherry-pick. Here's even more background you need to know first, though, to understand what is going on here.

What to know about Git's index

We will start with a series of commits that start from some common shared history, like this:

...--G--H   <-- master

We will now make two new branch names, branch1 and branch2, both pointing to the existing commit whose hash is H:

...--G--H   <-- master, branch1, branch2

so that all commits are on all branches. Then, on each of these two new branches, we make some new commits. It doesn't matter how many as long as there is at least one on each branch; I'll draw two on each here, once we get there.

Something to know about commits is that each one holds a snapshot of all of your files, in a special, read-only, Git-only, compressed format. This freezes a copy of the files for all time, so that Git can get them back later, from any commit, any time you like. The frozen copy can only be used by Git, though, so an un-frozen, ordinary copy needs to go somewhere else. You tell git checkout which commit you want, and it extracts the files, turning them back into ordinary and useful files, putting the useful copies into your work area, which Git calls your working tree or work-tree.

If you git checkout a commit by its hash ID, Git will extract all of that commit's frozen files to your work-tree, so you can see and use this historic version. That's not quite the way you normally work with Git, though.

The thing to know about new commits is that Git makes them from Git's index, not from your work-tree. That is: we use git checkout to select a branch name, which in turn selects the last commit contained in that branch. We now have a current name—Git attaches the special name HEAD to one of the branch names—and a current commit. Git copies each committed file out of the commit, into your work-tree ... but it also copies each committed file into Git's index.

In other words, the index holds a copy of each file from the current commit.1 This copy seems pointless at first: there's one in your work-tree. Why not use that one? Other version control systems do in fact do this, but Git doesn't. Exactly why, well, that's up to the Git authors, but we can note this: the index copy is in the frozen format. This means there is no need to re-compress the work-tree copy again. The git add command can take an updated work-tree copy and compress it and now the index copy is updated and ready to commit. When you run git commit, the index copy of each file is the one that goes into the new commit.

We can therefore say that the index holds your proposed next commit. It's going to get a bit more complicated in a moment, but for now, let's git checkout branch1 and make a single new commit. We'll start with this:

...--G--H   <-- master, branch1 (HEAD), branch2

The current branch is branch1. The current commit is H (which stands in for some actual hash ID). Both Git's index and your work-tree are filled-in with the snapshot from commit H.

You now change some work-tree files and git add and run git commit. Git collects up the appropriate metadata from you—your name and email address, your log message, and so on—and sets up the new commit to have commit H as its parent. Git packages up the frozen-format files in the index to make the new snapshot. Git writes all of this out, which acquires a new unique hash ID which we'll call I, with I set up to point back to existing commit H—the one we have out as we're working—which gives us:

          I
         /
...--G--H   <-- master, branch1 (HEAD), branch2

and now the magic step happens: Git writes the new commit's hash ID into the current name, so that branch1 now points to I:

          I   <-- branch1 (HEAD)
         /
...--G--H   <-- master, branch2

So branches grow one commit at a time when we use git checkout to get them out, modify work-tree files, use git add to copy the updated files back into the index to be ready for snapshotting, and then run git commit to make the snapshot. The new snapshot points back to the one that was current—was HEAD—and now the new one is current. The new one was just made from the index, so the index and commit match, just like they did when we cleanly checked out commit H earlier, and we're ready to modify and commit some more.
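To make the "commits come from the index" point concrete, here is a small sketch in a throwaway repository. The file name, the identity settings, and the temporary directory are all assumptions for the demo.

```shell
# Sketch: the index copy, not the work-tree copy, is what gets committed.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example"                 # hypothetical identity
git config user.email "example@example.invalid"

printf 'v1\n' > file.txt
git add file.txt
git commit -q -m "first"

printf 'v2\n' > file.txt                  # only the work-tree copy changes
git ls-files --stage file.txt             # index still lists the v1 blob, slot 0
index_before=$(git rev-parse :file.txt)

git add file.txt                          # compress v2 and update the index
index_after=$(git rev-parse :file.txt)

git commit -q -m "second"
committed=$(git rev-parse HEAD:file.txt)  # blob recorded in the new commit
```

The blob hash in the index changes only when git add runs, and the new commit records exactly that blob.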


1Technically, the index contains a reference to an internal Git blob hash ID, rather than an actual copy of a file. But unless you start poking around inside the index details—as we will in a moment—you can't really tell the difference between this and having a whole copy of the file.


Merging, normal-style

So, let's say we made two commits on each branch, and have branch1 out right now, like this:

          I--J   <-- branch1 (HEAD)
         /
...--G--H
         \
          K--L   <-- branch2

(the name master still points to H but I'll be lazy and stop drawing it now). We now run git merge branch2.

Git automatically finds the best common merge base commit—the shared commit from which both branches descend—which in this case is obviously commit H. Each of these three commits has a full snapshot of all of our files. So here's what Git does, at least in principle (in practice this is all rather optimized):

  • First, Git expands the index. Instead of holding one copy of each file, it now holds up to three copies of each file. These copies are numbered and called staging slots.

  • A copy of each file in the merge base, commit H, goes into slot 1.

  • A copy of each file from the current commit J goes into slot 2. In practice there's already a copy in slot zero—the normal, not-conflicted all-resolved slot—so Git can just move it over one step. There are some complicated cases here that you don't normally see yourself, if your index and/or work-tree are dirty, because the git merge command won't let you start if your index and/or work-tree are dirty.2

  • A copy of each file from the other commit, L here, goes into slot 3.

There are now three copies of each file, at least for each file that's in all three commits, which is the interesting case here.

The merge command now compares the three copies. If all three are the same—which for many merges holds for almost all files—the result is trivial: any copy will do. Git will move that to slot zero, erasing the remaining three slots. That file is now resolved. The work-tree copy is already fine too, so Git leaves it alone.

If the merge base copy matches their copy—slot 1 = slot 3—but ours doesn't, then we must have modified the file. The right merge result is take our file so Git moves the slot-2 copy to slot-zero, erasing the other two slots, and leaving the work-tree file alone again. The file is resolved: we used ours.

If the merge base copy matches our copy (slot 1 = slot 2) but theirs doesn't, then they must have modified the file. The right merge result is take their file, so Git moves the slot-3 copy to slot-zero and this time also extracts the slot-3 copy to the work-tree. The file is resolved: we used theirs.

Only for the all-three-slots-differ case does Git have to do any real work. Git now invokes its low-level, single-file merge driver on the three files.

The low-level driver writes the work-tree copy of the file as its output. It also looks at each actual source-line change, i.e., what we'd see if we ran git diff. It compares the merge base (slot 1) copy of the file to our copy (slot 2) to see what we changed, and compares the merge base to theirs (slot 1 vs slot 3) to see what they changed. Where the changes don't overlap or abut (touch), the standard low level merge driver replaces the slot-1 lines with the other-slot lines. Where the changes do overlap or abut, the standard low-level merge driver writes a merge conflict into the work-tree copy of the file.

Having handled all lines, the low-level driver reports back: It either says all changes combined successfully, or it says merge conflict. That one piece of information determines what the higher level code does in the end. If it says combined successfully, the resulting file goes into slot zero and the file is considered merged. If it says merge conflict, Git leaves all three files in the index.

The higher level code handles all files, using the low level merge driver on each potentially-conflicted file, one at a time. When this is all done, if any of them had merge conflicts, the merge as a whole stops. This is where your work, and your question, comes in. You must come up with the right file.

The git add command will copy whatever you have in your work-tree file into slot zero and erase the other three slots. So having updated the work-tree file, you run git add on it, and that marks the file resolved.
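You can watch the staging slots fill and then collapse with git ls-files --stage. The following is a sketch in a scratch repository; the file name F, the branch names, and the identity settings are assumptions chosen to match the drawings above.

```shell
# Sketch: slots 1/2/3 during a conflict, slot 0 after git add.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example"                 # hypothetical identity
git config user.email "example@example.invalid"

printf 'line1\nline2\n' > F && git add F && git commit -q -m base
git branch branch2
git checkout -q -b branch1
printf 'line1\nline1a\nline2\n' > F && git commit -q -am ours
git checkout -q branch2
printf 'line1\nline1b\nline2\n' > F && git commit -q -am theirs
git checkout -q branch1

git merge branch2 >/dev/null 2>&1 || true   # expected to stop with a conflict
git ls-files --stage F                      # one line per occupied slot
slots_conflicted=$(git ls-files --stage F | wc -l | tr -d ' ')

printf 'line1\nline1a\nline1b\nline2\n' > F # hand-made resolution
git add F                                   # marks F resolved
slots_resolved=$(git ls-files --stage F | awk '{print $3}')
```

The third column of the ls-files output is the slot number, so the conflicted state shows three lines and the resolved state shows a single slot-zero line.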

Once you've resolved all conflicts, you run git merge --continue or git commit to tell Git to finish the job. Git uses the files that are all in slot zero now to make a new commit. It therefore has a snapshot from the index, as usual. The only thing that is special about the new merge commit is that it has not just the usual one parent, but two:

          I--J
         /    \
...--G--H      M   <-- branch1 (HEAD)
         \    /
          K--L   <-- branch2

The first parent of the merge is the same commit it would always be—commit J, in this case—and the second parent is the other commit: in this case, L.


2Dirty here means the copy of some file in the index and/or work-tree does not match the HEAD-commit copy of the file. As long as all three copies do match, so that the git status command says nothing to commit, working tree clean, it doesn't matter where this slot-2 copy comes from: all three match.


Cherry-picking is merging

Let's look at a simpler series of commits. Rather than two branches that we want to merge, let's suppose we just have this:

        tag:v1.0
           |
           v
...--E--F--G   <-- release/1
            \
             H--I--J   <-- develop (HEAD)

We've made some actual release of the software, with commit G being the release 1.0 version (both tagged and branched). We've gone on and started to add new features in the development branch and made new commits H-I-J. Now we realize: hey, in commit J, the only change we made was to fix a nasty bug that's there in commit G too (perhaps introduced back in commit E or F, so it's there in G and H and I).

We'd like to update our release to v1.1 with the fix we put in from J. That is, we want to copy commit J to a new commit that's like J—that fixes the bug—but that comes after G.3 We will call this new commit J':

        tag:v1.0
           |
           v
...--E--F--G--J'  <-- release/1
            \
             H--I--J   <-- develop

(Once this is all done, we'll tag commit J' as v1.1 and re-release.)

So, we run:

git checkout release/1
git cherry-pick develop

The way cherry-pick itself works is simple:

  • Assume each commit has one parent commit. In this case, J has one parent, I.
  • Treat the current commit—which will be G after git checkout—as the slot-2 commit.
  • Treat the parent as the merge base, and the commit itself as the other—or slot 3—commit.

So Git will now diff files in I vs those in G to see what we changed, i.e., to go backwards from I to G, backing out what we did in H and I. It will diff files in I vs those in J to see what they changed, to fix the bug. Then it will combine our changes with their changes as usual.

Any merge conflicts that occur happen where the backing-out of the development work conflicts with the bug-fixing. That's in fact just what we want: we want to make sure that we take whatever is required for the bug fix.

Once all conflicts are resolved, Git makes the new commit as an ordinary, single-parent commit, rather than as a merge commit. Its single parent is the commit that was HEAD before, and the new commit is now HEAD as usual.
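Here is a sketch of the picture above in a scratch repository, using a bug fix that does not conflict so the cherry-pick completes on its own. The file names, contents, and identity settings are assumptions for the demo.

```shell
# Sketch: cherry-pick J onto release/1 and inspect the resulting commit.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example"                 # hypothetical identity
git config user.email "example@example.invalid"

printf 'a\nb\nbug\nd\ne\n' > code.txt && git add code.txt && git commit -q -m G
git branch release/1
git checkout -q -b develop
printf 'feature\n' > feature.txt && git add feature.txt && git commit -q -m H
printf 'more\n' >> feature.txt && git commit -q -am I
printf 'a\nb\nfixed\nd\ne\n' > code.txt && git commit -q -am J   # the bug fix

git checkout -q release/1
old_head=$(git rev-parse HEAD)                 # commit G
git cherry-pick develop >/dev/null             # copy J as J'

new_parent=$(git rev-parse HEAD^)              # the single parent of J'
parent_words=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')
```

The new commit lists one parent (the old HEAD), carries the fix to code.txt, and does not bring along feature.txt, since the merge treats the development work as something "we" backed out.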


3It may actually be best to find the original commit that introduced the bug, and make a branch there and fix it in the branch. We can then merge this fix into every release, instead of cherry-picking. The difference is irrelevant in the illustration above—in fact, the cherry-pick is easier and simpler—but over time, the difference eventually matters in terms of release management. See Raymond Chen's series about this.


Rebase itself is mainly a series of cherry-pick operations

If we start with:

...--G--H   <-- master
         \
          I--J   <-- feature (HEAD)

and someone adds some master commits so that we have:

...--G--H--K--L   <-- master
         \
          I--J   <-- feature (HEAD)

we might like to copy I to a new and improved I', then copy J to a new and improved J', to get:

                I'-J'  <-- HEAD (detached HEAD)
               /
...--G--H--K--L   <-- master
         \
          I--J   <-- feature

Once that's done, we'd like to have Git peel the name feature off commit J and make it point to commit J' instead, and re-attach HEAD:

                I'-J'  <-- feature (HEAD)
               /
...--G--H--K--L   <-- master
         \
          I--J   [abandoned]

The copying from I to I', and from J to J', is just what git cherry-pick does. So rebase can:

  • list out the commits to be copied, in the right order (I, then J);4
  • detach HEAD by checking out the target commit L by hash ID, equivalent to git checkout --detach, and historically, one kind of rebase literally ran that command;
  • run two git cherry-pick commands;5 and
  • forcibly move the branch and re-attach HEAD.6

(I won't get into how the new-ish --rebase-merges works, which complicates this a lot.)


4Obtaining this list, of the right commits to copy, is actually quite complicated. We won't go into details here.

5Some rebase operations literally do this, one at a time: interactive rebase in particular turns each pick command into a separate git cherry-pick step. Others try to be more efficient and/or are a little different internally, especially the old style internal git-rebase--am back-end. Git 2.26 is finally moving away from using this old-style rebase as a default, as it misses some rename cases.

6This last step is something you can do manually with git checkout -B or git switch -C, if for some reason you want to do all four steps manually.


Finally, back to your original question

How can I see the three-way diff?

Obviously we need three inputs: a merge base version, and two other versions. Let's say that the file's name is F here.

If you've just started anything that uses Git's merge engine, and are in the middle of a conflicted merge, the three inputs are in Git's index right now. That's where Git's low level merge driver got them. It's written its own attempt at merging into the work-tree file, and you can see that by looking at it.

Or, you can run git diff now. That git diff notices that for file F, there are three index copies. It diffs the three and combines the diff into a combined diff.7

You can name these index copies to certain Git commands using :1:F, :2:F, and :3:F. One of the most useful Git commands here is git show:

git show :1:path/to/file > file.BASE
git show :2:path/to/file > file.OURS
git show :3:path/to/file > file.THEIRS

for instance. Now you have three ordinary files and can do the same sort of thing—or run git merge-file on them, if you like.

If you've run git add path/to/file, though, Git has wiped out the three higher-numbered copies, replacing them with a single slot-zero copy. You can see it with git show using the name :path/to/file or :0:path/to/file, but it's really just the one sitting in your work-tree already, so why bother?

If you want, you can get Git to reconstruct the merge conflict:

 git checkout -m -- path/to/file

Git puts the three copies back into the three slots and re-runs the merge driver, overwriting the work-tree copy.8
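A sketch of that round trip, in a scratch repository, saving the hand-made resolution first because the work-tree copy gets overwritten. The file name F, the F.resolved backup name, the branch names, and the identity settings are assumptions for the demo.

```shell
# Sketch: resolve, then re-create the conflict with git checkout -m.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example"                 # hypothetical identity
git config user.email "example@example.invalid"

printf 'line1\nline2\n' > F && git add F && git commit -q -m base
git branch branch2
git checkout -q -b branch1
printf 'line1\nline1a\nline2\n' > F && git commit -q -am ours
git checkout -q branch2
printf 'line1\nline1b\nline2\n' > F && git commit -q -am theirs
git checkout -q branch1
git merge branch2 >/dev/null 2>&1 || true   # stops with a conflict in F

printf 'line1\nline1a\nline1b\nline2\n' > F # hand-made resolution
git add F                                   # slots 1-3 collapse to slot 0

cp F F.resolved                             # keep the resolution safe
git checkout -m -- F                        # destructive: F is overwritten
unmerged=$(git ls-files -u F | wc -l | tr -d ' ')
markers=$(grep -c '^<<<<<<<' F)
git diff F                                  # combined diff is available again

mv F.resolved F && git add F                # put the resolution back
```

This only works while the merge (or cherry-pick, or rebase step) is still in progress; once the result is committed there is nothing left to restore from.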

To get git diff to give you the combined diff at this point, you must put the three copies in the index. If you really want, there is a way to upload arbitrary file content into the index, at any staging slot number, using git update-index, but it's tricky: you have to turn them into Git blob objects first and get their hash IDs. I don't recommend doing this as it is hard to get right:

git hash-object -w -t blob --stdin < contents

produces the appropriate blob hash, after which git update-index --index-info can read lines from stdin to put things into index slots. The format of the stdin stream given to git update-index --index-info is quite rigid and meant only for other programs to use. (Note that --cacheinfo, which is easier to use, doesn't let you write to nonzero slot numbers.)
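For the curious, here is a hedged sketch of what that looks like, based on my reading of the git update-index documentation: each input line is mode, hash, stage number, a TAB, then the path, and a mode-0 line first removes any existing entry for the path. The file names and contents are made up, and this is meant for a scratch repository only, not something to try on real work.

```shell
# Sketch: load three blobs into slots 1-3 by hand, then ask for the diff.
repo=$(mktemp -d) && cd "$repo" && git init -q

printf 'line1\nline2\n'         > base.txt
printf 'line1\nline1a\nline2\n' > ours.txt
printf 'line1\nline1b\nline2\n' > theirs.txt

# Turn each version into a blob object and keep its hash ID.
base=$(git hash-object -w -t blob --stdin < base.txt)
ours=$(git hash-object -w -t blob --stdin < ours.txt)
theirs=$(git hash-object -w -t blob --stdin < theirs.txt)

printf 'line1\nline1a\nline1b\nline2\n' > F   # the hand-merged result

# An all-zero hash of the same length as a real one, for the removal line.
zero=$(printf '%s' "$base" | sed 's/./0/g')
{
  printf '0 %s\t%s\n'        "$zero"   F      # drop any existing entry for F
  printf '100644 %s 1\t%s\n' "$base"   F      # slot 1: merge base
  printf '100644 %s 2\t%s\n' "$ours"   F      # slot 2: ours
  printf '100644 %s 3\t%s\n' "$theirs" F      # slot 3: theirs
} | git update-index --index-info

slots=$(git ls-files -u F | wc -l | tr -d ' ')
combined=$(git diff F)                        # combined diff against F
printf '%s\n' "$combined"
```

With the slots filled this way, git diff treats F as unmerged and runs the combined-diff code against the work-tree copy, which is the kind of output the question asks for.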

Once you commit the merge result—as a merge, or a cherry-picked commit, or whatever—all of the git checkout -m data are gone and you cannot reconstruct the merge state this way. A merge commit, however, records both of its parent commits, and running git show on a merge commit invokes the combined-diff code.

There is a large caveat here: git show on a merge commit defaults to the --cc (two-dash, two-c) style combined diff. This is different from the output from git diff during the conflicted merge while the conflicts are in the index's nonzero staging slots. Using git show -c forces Git to use the -c one-dash one-c style, which is closer to (but still not the same as) the output from git diff during the conflicted merge.
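To compare the two styles side by side, here is a sketch that builds a small merge commit in a scratch repository and runs both forms of git show on it. The file name F, the branch names, and the identity settings are assumptions for the demo.

```shell
# Sketch: --cc versus -c on a merge commit whose result differs from both parents.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example"                 # hypothetical identity
git config user.email "example@example.invalid"

printf 'line1\nline2\n' > F && git add F && git commit -q -m base
git branch branch2
git checkout -q -b branch1
printf 'line1\nline1a\nline2\n' > F && git commit -q -am ours
git checkout -q branch2
printf 'line1\nline1b\nline2\n' > F && git commit -q -am theirs
git checkout -q branch1
git merge branch2 >/dev/null 2>&1 || true   # stops with a conflict in F

printf 'line1\nline1a\nline1b\nline2\n' > F # keep both sides
git add F
git commit -q -m "merge branch2"            # concludes the merge

cc_out=$(git show --cc --format= HEAD)      # default style for merges
c_out=$(git show -c --format= HEAD)         # one-dash one-c style
printf '%s\n---\n%s\n' "$cc_out" "$c_out"
```

The --cc form drops hunks where the result simply matches one parent, so on real merges it is often much shorter than the -c form; in a tiny example like this one the two mostly differ in the header line.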


7This isn't quite right, because as you modify the work-tree copy, you will see that the output from git diff changes. Git knows that these are not what we care about: we really want to see slot-2-vs-work-tree and slot-3-vs-work-tree. So that's what it's diffing and combining here.

8You can do this git checkout -m without first git add-ing the file to mark it as resolved. In this case, the three slots are already full and ready to go. The work-tree copy still gets clobbered, though, and this is probably the most important part here.


Related work

This is not the same thing at all, but you may be interested in interdiffs and range diffs. See "What does interdiff do that diff cannot?" and "How do I get the interdiff between these two git commits?" for more information.

Upvotes: 6
