Richard

Reputation: 1041

Git revert fails to undo file deletion

With the following git history:

$ git log --oneline --graph --decorate
*   d19c1fb (HEAD -> feature) Merge branch 'master' into feature
|\
| * 0a97b90 (master) remove d.txt
| * dc0227b append b to b.txt and create d.txt
| * a7536e4 Add a to txt (2)
* | e97dc11 append b to b.txt and create d.txt
|/
* 48d6625 Add a to a.txt
* 7ffa8cb Initial commit (a.txt, b.txt, c.txt)

dc0227b is a cherry-pick of e97dc11.

Why does git revert 0a97b90 not work when HEAD is on feature? It only prints "nothing to commit, working tree clean".

Some more context: on the branch feature, a single commit added the file d.txt and modified b.txt. On master I need the modification to b.txt, but I do not want the file d.txt. So I followed this procedure:

  1. Check out master
  2. Cherry-pick the commit with the interesting modifications
  3. Make a commit to delete the unwanted file (master is now in the correct state)
  4. Check out feature
  5. Merge master into feature
  6. Revert the commit that deletes the unwanted file (step 3). This fails; if it worked, feature would be in the correct state
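The six steps above can be sketched as a throwaway reproduction. This is only a sketch under assumptions: the file contents and the identity settings are placeholders, the commit hashes will differ from those in the log, and `master` is used in place of the hash 0a97b90 because it names the same "remove d.txt" commit.

```shell
# Throwaway repository; file contents and identity are placeholders.
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example" && git config user.email "example@example.com"

# Shared history: the initial commit, then "Add a to a.txt".
echo 0 > a.txt; echo 0 > b.txt; echo 0 > c.txt
git add . && git commit -q -m "Initial commit (a.txt, b.txt, c.txt)"
echo a >> a.txt && git commit -q -am "Add a to a.txt"

# feature: one commit that modifies b.txt and creates d.txt.
git checkout -q -b feature
echo b >> b.txt; echo d > d.txt
git add . && git commit -q -m "append b to b.txt and create d.txt"

# Steps 1-3: on master, cherry-pick that commit, then delete d.txt.
git checkout -q master
echo a >> a.txt && git commit -q -am "Add a to txt (2)"
git cherry-pick feature >/dev/null
git rm -q d.txt && git commit -q -m "remove d.txt"

# Steps 4-5: merge master into feature.
git checkout -q feature
git merge -q --no-edit master

# Step 6: try to revert the deletion, recording HEAD before and after.
before=$(git rev-parse HEAD)
git revert --no-edit master || true   # the revert is expected to stop without committing
after=$(git rev-parse HEAD)
ls
```

If `before` and `after` hold the same hash and `ls` still lists d.txt, the behaviour described in the question has been reproduced.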

Edit:

On the branch feature, ls does list the file d.txt. Then I guess the surprising point is that merging master into feature did not actually delete d.txt... The original question changes a bit, but there is still definitely something I am missing here.

Upvotes: 0

Views: 69

Answers (2)

torek

Reputation: 487755

Consider the following history:

          C--D--E   <-- branch1
         /
...--o--B
         \
          F--G--H   <-- branch2

where each letter stands in for a commit. Newer commits are towards the right, so that E comes after D, and so on. (Side note: internally, Git works backwards: it starts from E, then moves back to D, then to C and on to B. Or, if starting at H, Git starts there, moves back to G, then to F, and then to B. This isn't directly relevant here; it's just useful in working with Git in general.)

You can now git checkout branch1; git merge branch2 or git checkout branch2; git merge branch1. Either merge operation does more or less the same thing: the main difference is which branch name gets updated at the end (there is a second difference that is a little harder to describe).

In Git, every commit holds a snapshot. That is, commit B—the one that's on both branches, and is the merge base of this upcoming merge operation—has a complete snapshot of all of the files that are in B at all. The same is true for commits C, D, and E, and for commits F, G, and H. The only way to see what changed in some commit is to compare it to some previous commit.

For instance, we can pick commit C out of the pile, and compare it to commit B. If B and C have the same set of files, but one of the files in C has different contents than that same file in B, we must have changed that file in C. So we'll often say that C (or whoever made C) "changed" the file—but in fact, C simply has the file. The change is only observable by comparing against B.

If we compare B vs F, and F has a file that B does not have at all, we might say that whoever made F added this new file. But in fact, F just has files. We only get "added" by comparing to B.

This same idea holds for D and E and G and H. To say that file f1.txt changed in, say, H, we have to pick some other commit first. Then we can compare commit ___ (fill in the blank) to commit H. Which commit should we pick? (I bet you know which one to pick! But you do have to pick one.)
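To make the snapshot idea concrete, here is a small sketch in a throwaway repository. The commit messages "B" and "F" and the files f1.txt and f2.txt are stand-ins for the drawing above, not anything from a real project.

```shell
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example" && git config user.email "example@example.com"

# Commit "B": a snapshot containing only f1.txt.
echo one > f1.txt
git add f1.txt && git commit -q -m "B"
B=$(git rev-parse HEAD)

# Commit "F": a snapshot containing f1.txt and f2.txt.
echo two > f2.txt
git add f2.txt && git commit -q -m "F"
F=$(git rev-parse HEAD)

# A commit on its own simply lists the files it has.
snapshot_F=$(git ls-tree -r --name-only "$F")

# "Added" or "deleted" only shows up once two snapshots are compared,
# and which one you see depends on the order of the comparison.
b_vs_f=$(git diff --name-status "$B" "$F")
f_vs_b=$(git diff --name-status "$F" "$B")
printf '%s\n' "$snapshot_F" "$b_vs_f" "$f_vs_b"
```

Comparing B to F reports f2.txt as added (status A); comparing F to B reports the same file as deleted (status D). Neither commit stores either word.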

How merge works

Many people expect Git to handle merge by looking at every commit. But it doesn't. Let's say we run:

git checkout branch2

so that we ask Git to start by filling in our index and work-tree from commit H. That way, we can see all the files that are in the snapshot in H. To remember which branch we're on, we'll update our drawing and attach the special name HEAD, in all uppercase like this,1 to the name branch2:

          C--D--E   <-- branch1
         /
...--o--B
         \
          F--G--H   <-- branch2 (HEAD)

In any case, now we'll run git merge branch1. Git will use the name branch1 to find commit E. The name points directly to commit E, so that's easy. Then, Git will use the internal, backwards-pointing arrows connecting these commits (I've drawn them as lines instead of arrows because it's hard to draw good arrows on StackOverflow) to work backwards from both H and E and will find commit B. This commit is the merge base of the merge.

These are now the three inputs to the merge operation:

  • merge base commit B;
  • --ours commit H; and
  • --theirs commit E.

Git does not look at the intermediate commits.2 It simply does two straight comparisons:

  • git diff --find-renames hash-of-B hash-of-H: this tells Git what we changed, ignoring commits F and G entirely, to turn the snapshot in B into the snapshot in H.
  • git diff --find-renames hash-of-B hash-of-E: this tells Git what they changed, ignoring commits C and D entirely, to turn the snapshot in B into the snapshot in E.
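The two comparisons can be tried by hand. The sketch below assumes a made-up history in which branch1 ("theirs") adds d.txt and then deletes it again, while branch2 ("ours") adds d.txt and keeps it; the branch and file names are illustrative only.

```shell
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example" && git config user.email "example@example.com"

# Merge base "B".
echo base > a.txt
git add a.txt && git commit -q -m "B"

# "Theirs" (branch1): add d.txt in one commit, delete it in the next.
git checkout -q -b branch1
echo more >> a.txt; echo d > d.txt
git add . && git commit -q -m "C: edit a.txt, add d.txt"
git rm -q d.txt && git commit -q -m "D: remove d.txt"

# "Ours" (branch2): add d.txt and keep it.
git checkout -q -b branch2 master
echo d > d.txt
git add d.txt && git commit -q -m "F: add d.txt"

# The three inputs to a merge of branch1 into branch2, and the two diffs.
base=$(git merge-base HEAD branch1)
ours_changes=$(git diff --find-renames --name-status "$base" HEAD)
theirs_changes=$(git diff --find-renames --name-status "$base" branch1)
echo "ours:";   echo "$ours_changes"
echo "theirs:"; echo "$theirs_changes"
```

The "theirs" diff mentions only a.txt. Because d.txt is absent from both the merge base and the tip of branch1, the add-then-delete is invisible to the merge, so the only statement about d.txt is the "ours" side's add.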

Git now combines these two sets of changes into one larger pile of combined changes. Then it extracts the files from B (not from E or H) and applies the combined changes to those files. Whatever comes out of these combined changes is the merge result.

If all goes well—if Git is able to combine the B-vs-H changes with the B-vs-E changes on its own—Git now makes a new commit from the result. The new commit has two parents, instead of the usual one. The first parent is the commit we're using right now, i.e., commit H. The second parent is the commit we selected to merge, i.e., E. Git then updates whichever branch we have checked out so that the name points to the new commit.

The result is this:

          C--D--E   <-- branch1
         /       \
...--o--B         I   <-- branch2 (HEAD)
         \       /
          F--G--H

with the first parent of merge commit I being H, and the second parent of merge commit I being E. Merge commit I has a snapshot, just like any commit. It doesn't have changes, just a snapshot.
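The parent order can be checked with the `^1` and `^2` suffixes. This is a minimal sketch with one commit per branch standing in for E and H; the file names are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example" && git config user.email "example@example.com"

# Merge base "B".
echo base > base.txt
git add . && git commit -q -m "B"

# branch1 ends at "E".
git checkout -q -b branch1
echo e > e.txt
git add . && git commit -q -m "E"
E=$(git rev-parse HEAD)

# branch2 ends at "H"; this is the branch we stay on.
git checkout -q -b branch2 master
echo h > h.txt
git add . && git commit -q -m "H"
H=$(git rev-parse HEAD)

# Merge branch1 into branch2, creating merge commit "I".
git merge -q --no-edit branch1

first_parent=$(git rev-parse HEAD^1)    # the commit we were on
second_parent=$(git rev-parse HEAD^2)   # the commit we merged in
files=$(git ls-tree -r --name-only HEAD)
echo "$first_parent $second_parent"
echo "$files"
```

Here `first_parent` matches H and `second_parent` matches E, and the merge commit's tree is an ordinary full snapshot that holds the files from both sides.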

We can ask Git to compare commit I to some previous commit. Which previous commit do you choose? Remember, you can only choose one previous commit. You can run git diff hash1 hash2 or git diff hash branch2, because the name branch2 selects commit I now. But you pick one hash ID (that of B, or C, or E, or F, or whatever you like) and Git compares the snapshot in that commit to the snapshot in merge commit I.

Pick any two commits and compare them, and you'll get a diff. The result of the diff clearly depends on which two commits you pick. When you have an ordinary non-merge commit, there's one obvious commit to pick—but with a merge commit, there are two obvious commits to pick, and you only get one at a time.3


1Frequently, on Windows and macOS, you can get away with typing head in lowercase. This is something of an accident of the implementation. It generally does not work at all on Linux, and it does not work correctly on these other systems once you start using git worktree add, so try to avoid this habit. If typing HEAD in all caps is annoying, consider using the special symbol @, which Git internally translates into its own special name HEAD.

(I find I typo HEAD all the time as HAED so I probably should use @ myself.)

2Even if it did, you would usually get the same result. The cases where you wouldn't get the same result are interesting, but mostly involve repeated name changes while files evolve over many commits. That's not what happens here.

The case of "add a file, but then delete it again" makes it pretty clear that the add step was a mistake and should be ignored. It is true that there's an "add" on one "leg" of the merge without a delete, and an add-and-delete on the other "leg" of the merge. But that just suggests that the mistake is only on the one side. The other side added and kept the file—so the merge should add and keep the file, and that's what Git ends up doing when it combines the changes.

Nonetheless, going commit-by-commit would at least give Git the ability to see the add-and-delete on the one particular leg. That would enable the algorithm to treat this specially, e.g., by declaring a conflict. But Git doesn't go commit-by-commit, so it can't see this at all!

3Technically, Git can pick all of the parents, producing what Git calls a combined diff. This is very different from the way Git does the merge, though. A combined diff, such as that produced by git show of the merge commit, skips diffing files whenever the merge commit's copy of some file matches any of its parents' copy of that same file. Only if the merged copy is different from all parents will this combined diff show something, and even then, it will omit some of the differences.

What this means is that you often need to run two git diffs to really inspect a merge commit. First, you diff the commit against its first parent, to see what changes came in from the --theirs side of the merge. Then you diff the commit against its second parent, to see what changes came in from the --ours side. The git show command has the -m flag to help do this automatically. A few files will have changes from both sides of the merge, and those are the only files that the combined diff might show at all.

Note that git log -p normally does not even try to show a merge this way. Adding -m, or -c or --cc, will make git log -p show merges, using either the split-into-multiple-diffs method—-m—or the combined diff algorithms, -c and --cc.
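As a rough illustration of the two-diff approach, the sketch below builds a clean merge in a scratch repository and diffs the merge commit against each parent in turn. The branch names `ours` and `theirs` and the file names are invented for the example.

```shell
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example" && git config user.email "example@example.com"

echo base > base.txt
git add . && git commit -q -m "base"

# The side that will be merged in adds theirs.txt.
git checkout -q -b theirs
echo t > theirs.txt
git add . && git commit -q -m "theirs: add theirs.txt"

# The side we stay on adds ours.txt.
git checkout -q -b ours master
echo o > ours.txt
git add . && git commit -q -m "ours: add ours.txt"

git merge -q --no-edit theirs

# Against the first parent: what the merged-in side contributed.
from_theirs=$(git diff --name-status HEAD^1 HEAD)
# Against the second parent: what our own side contributed.
from_ours=$(git diff --name-status HEAD^2 HEAD)
echo "vs first parent:  $from_theirs"
echo "vs second parent: $from_ours"

# git show -m prints one diff per parent in a single command.
git show -m --name-status HEAD
```

Every file in this merge matches one parent or the other, so this is the kind of merge for which a combined diff has nothing to show.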

There are two different combined diff algorithms. The default for git show is --cc. I always got them mixed up until I used this as a mnemonic: one C = one hyphen; two Cs = two hyphens. What's the precise difference in their output? That, I still can't explain properly. I don't know what the Git authors had in mind here. I use -m when I need diffs I can really use.

Upvotes: 1

jthill

Reputation: 60245

You don't say what's in d.txt at the feature tip, but from git revert's behavior I think it's a pretty safe bet that d.txt in e97dc11 and d.txt in dc0227b are identical. The revert found nothing to do because reverting 0a97b90 should make d.txt look like it does in dc0227b. It already looks like that, so there is nothing to commit. Everything still looks exactly as it did on checkout.
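One way to check that bet is to compare blob IDs with `git rev-parse <commit>:<path>`. The sketch below rebuilds a comparable history in a scratch repository (the contents are placeholders), so it shows the technique rather than the asker's actual files.

```shell
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Example" && git config user.email "example@example.com"

echo 0 > b.txt
git add . && git commit -q -m "Initial commit"

# feature: modify b.txt and create d.txt in one commit.
git checkout -q -b feature
echo b >> b.txt; echo d > d.txt
git add . && git commit -q -m "append b to b.txt and create d.txt"

# master: an unrelated commit, the cherry-pick, then the deletion.
git checkout -q master
echo a > a.txt
git add a.txt && git commit -q -m "unrelated change on master"
git cherry-pick feature >/dev/null
picked=$(git rev-parse HEAD)
git rm -q d.txt && git commit -q -m "remove d.txt"

# Identical file contents always produce identical blob IDs.
blob_feature=$(git rev-parse feature:d.txt)
blob_picked=$(git rev-parse "$picked:d.txt")
echo "$blob_feature"
echo "$blob_picked"
```

If the two IDs match, the file the revert would restore is byte-for-byte the one feature already has, which is why the revert has nothing to commit.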

edit:

"I guess the surprising point is that merging master into feature did not actually delete d.txt"

Why should it? d.txt doesn't exist at the merge base and doesn't exist at the master tip, so as far as the merge is concerned, master's side of the history made no change to d.txt at all: grand total, zero effect on d.txt.

Upvotes: 1
