Cemal Okten

Reputation: 828

GIT rebase says that file was deleted in HEAD but it has only been updated

After running a rebase on a branch (git rebase branch master), the file I have been editing on the branch no longer has my changes, and the log states:

CONFLICT (modify/delete): client/src/pages/Section/index.js deleted in HEAD 
and modified in 279afb1 (create content pages). Version 279afb1 (create content pages) 
of client/src/pages/Section/index.js left in tree.

The file in question (client/src/pages/Section/index.js) is present in my branch and has not been deleted.

Any suggestions on how I can resolve this?

Upvotes: 0

Views: 1319

Answers (1)

torek

Reputation: 489083

Files are not "in branches": specific files are in, or absent from, specific commits. Those commits are then contained in branches, but any given commit can be in many branches at the same time, or even in no branches at all. So don't think of this as "the file isn't in the branch": think of this as "the file isn't in the commit", because that's the case here.

Now, the commit in question is HEAD, and HEAD is tricky during rebase. This is because Git forces you to be a Git Mechanic, as if instead of just hopping into a taxicab and saying "take me to Paris", you have to first assemble the taxi from Lego parts, then assemble the driver, and then direct him to each turn, one step at a time. 😀

To "get" git rebase, we must start with the fact that each commit, in Git, holds a full snapshot of every file. This might seem like a terrible waste of disk space, and it would be if Git commits stored files the way your computer normally stores normal files. But in fact, Git commits store files in a special, weird, Git-ized format, with the contents compressed and, crucially, de-duplicated, so there's no actual wasted storage after all.

Each commit is numbered, with a big, ugly, random-looking hash ID. These hash IDs are quite impractical for humans, so Git tends to abbreviate them, e.g., as 279afb1. Git just takes the front 7 or so "digits" (these are in fact hexadecimal digits) and displays those, dropping the rest if that's safe to do, which it often is. Each hash ID is unique: no two Git commits are ever allowed to have the same one, even if they're in different repositories.1

Besides storing a full snapshot of all files, each commit also stores some metadata, or information about this one particular commit. That includes things like the name and email address of the person who made the commit, and some date-and-time stamps and so on. Crucially for Git itself, each commit's metadata stores a list of the hash IDs of some set of previous commits. Most commits store exactly one hash ID in this list; we call that the parent of the commit in question.
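If you'd like to see this metadata for yourself, here is a minimal sketch. It assumes a throwaway repository created with mktemp, and the user name, email address, and commit messages are made up for illustration. git cat-file -p prints the raw commit object, including the parent line:

```shell
set -e
cd "$(mktemp -d)"                      # scratch repository, safe to delete
git init -q
git config user.name "Example User"    # made-up identity for the demo
git config user.email "user@example.com"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

# Print the raw commit object: tree, parent, author, committer, message.
git cat-file -p HEAD

# The "parent" line in the metadata is the hash ID of the previous commit.
parent_in_metadata=$(git cat-file -p HEAD | sed -n 's/^parent //p')
first_commit=$(git rev-parse HEAD~1)
echo "parent recorded in metadata: $parent_in_metadata"
echo "hash of the first commit:    $first_commit"
```

The two hash IDs printed at the end should be the same, which is all that "the child points to its parent" means.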

Because commits remember the hash IDs of their parents, and most commits remember exactly one such hash ID, when we have those commits in some branch, they form a backwards-looking chain. We say that the child commit points to its parent commit, and we can draw these commits like this:

... <-F <-G <-H

Here H stands for the hash ID of the latest (newest) commit in the chain. That commit holds a permanently archived full snapshot of every file. It also remembers, in its metadata, the hash ID of earlier commit G, its parent.

Commit G is of course a commit as well, so it has a full snapshot of every file, and metadata that holds a commit hash ID: its parent, F. Like G and H, F is a commit, so it holds a full snapshot of every file, and metadata holding another parent hash ID.

Git can follow this backwards chain, one hop at a time, from the end of the branch, all the way back to the very first commit ever. So Git only needs to know the hash ID of commit H, the end of the chain. That's the hash ID you have to memorize—oh wait, humans are bad at hash IDs!

To avoid forcing you to actually memorize the hash ID (or build the taxi from Lego bricks, as it were), Git provides branch names. A branch name simply holds the hash ID of that last commit in the chain. We say that the branch name points to the tip commit, with tip commit being a Git term: it's the commit to which the branch name points. So we really have:

...--F--G--H   <-- somebranch

Now, no part of any commit can ever be changed. That's a little math trick Git pulls so that it can generate its commit hash IDs. So not only is the archive-of-all-files permanent, so is the metadata. Commit H will always and forever point back to commit G, which will always point back to F, and so on.

The branch name pointers, however, aren't frozen like this. The name somebranch can be made to point to any commit that's in the Git repository. Suppose, for instance, we make a new commit I, that points back to H:

...--F--G--H   <-- somebranch
            \
             I

We can now wheel the arrow from somebranch around a bit so that it points to I, which is easiest to draw like this on StackOverflow:

...--F--G--H
            \
             I   <-- somebranch

Commit I is now the tip commit of branch somebranch.

This is how branches grow, in Git.
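Here is a small sketch of that growth, again in a scratch repository with a made-up identity. The branch name somebranch matches the drawings above, and the commits are empty ones made just for the demonstration:

```shell
set -e
cd "$(mktemp -d)"                            # scratch repository
git init -q
git symbolic-ref HEAD refs/heads/somebranch  # name the initial branch
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "H"
tip_before=$(git rev-parse somebranch)       # the name points to H

git commit -q --allow-empty -m "I"
tip_after=$(git rev-parse somebranch)        # the name now points to I
parent_of_tip=$(git rev-parse somebranch~1)  # I points back to H

echo "somebranch before: $tip_before"
echo "somebranch after:  $tip_after"
echo "parent of new tip: $parent_of_tip"
```

The name moved to the new commit, and the new commit's parent is the old tip.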


1 This is mathematically impossible, so it's not actually completely true. If two different commits ever get the same hash ID, it becomes impossible to introduce the two repositories to each other. Doing so doesn't destroy the universe like some Star Trek TOS episode, it just makes Git stop working with those two repositories, so we don't do that. But Git can't know whether two commits will come together someday or not, so it tries its gosh-darned-est to make sure that the hash IDs are truly unique, and in practice, this works fine.

See also How does the newly found SHA-1 collision affect Git?


What if there's something bad about a commit?

Let's say we've made commit I and now have:

...--G--H--I   <-- somebranch

Upon looking again, we discover a horrible mistake: we made a typo in the commit message, or forgot to update the README file, or something. Oh no! Disaster! Commit I is awful and needs to be fixed! But alas, no commit, once made, can ever be changed.

Fortunately, there's a trick we can play. Let's make a new-and-improved commit I', but use git commit --amend when we make it, so that I''s parent is not I, but rather is H again, like this:

...--G--H--I   <-- somebranch
         \
          I'

Now once again we'll have the name somebranch point to I' instead of I:

          I   [abandoned]
         /
...--G--H--I'  <-- somebranch

Because we (humans) use the branch name to find the last commit, we won't find commit I any more. We'll find a different commit, with a different big ugly hash ID, I'. Then we'll have Git follow its internal backwards-pointing arrow to H, and then to G and so on. It now looks like we changed a commit.

We didn't: that's impossible. But we got something just as good, or almost as good, or sometimes better: a new and improved commit.
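A quick way to convince yourself, sketched in a scratch repository (the file name README and the commit messages are just examples): after git commit --amend, the tip has a different hash ID, both the old and the new commit share the same parent, and the old commit still exists.

```shell
set -e
cd "$(mktemp -d)"                        # scratch repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first draft" > README
git add README
git commit -q -m "H"
echo "more text" >> README
git commit -q -am "I: udpate README"     # oops, a typo in the message

old_tip=$(git rev-parse HEAD)
git commit -q --amend -m "I: update README"
new_tip=$(git rev-parse HEAD)

old_parent=$(git rev-parse "$old_tip^")
new_parent=$(git rev-parse "$new_tip^")
old_type=$(git cat-file -t "$old_tip")   # the old commit is still in the repository

echo "old I:  $old_tip (parent $old_parent)"
echo "new I': $new_tip (parent $new_parent)"
```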

Rebase is about making new and improved commits

This is what the git rebase command is for. In our case, we'll have some set of commits and two or more branch names that help us find those commits:

          I--J   <-- br1
         /
...--G--H   <-- main
         \
          K--L   <-- br2

Here we have three branch names: main selects commit H, br1 selects commit J, and br2 selects commit L.

If we work backwards from J, we find commits J, I, H, G, and so on. These are the commits that are "on" or "in" branch br1 at the moment. (Because branch name arrows are moveable, this could change.)

If we work backwards from L, we find L, K, H, G, and so on: these are the commits on br2. It doesn't matter that H and G and so on are on another branch too. They're on both branches—or in fact, on all three branches, because the name main finds H too, and Git works backwards from there.

The branch names don't matter (except to us humans): they are just there to help us (and Git) find the commits. It's the commits that matter. As long as we have enough names to find all the commits, we can delete any extra names, if we don't need a quick way to find those commits. So if we don't care about finding H quickly, we can delete the name main safely, because we have two names to find H, albeit a little slowly. We just don't want to delete the last name that finds some commit(s) until we're ready to abandon them.

For now, we'll knock out the name main because it clutters up the diagram. We can always put it back later, provided we can find the right commit hash ID:

git branch main <hash-of-H>

will create the name again, using the hash ID (which we can cut and paste with the mouse after running git log and finding it). We wouldn't do this normally, except for illustration, of course, as deleting and re-creating branch names is annoying and error-prone (you have to get the hash ID right).

But now that main is gone, we have this:

...--G--H--I--J   <-- br1
         \
          K--L   <-- br2

Why did I put br1 back on the line? Mostly, just because I felt like it: it doesn't really matter how you draw the graph as long as you have the right backwards-pointing links. (I've also given up on most of the arrows in favor of -- etc., because they're hard to draw on StackOverflow: there are some arrow fonts, but they don't come out right on all computers, and they're ugly.)

Since we have two branch names, it becomes tricky to remember which name we're using. So now we'll add, to our drawing, the idea of HEAD as the current branch. We'll put the word HEAD (in all caps like this) in parentheses after one branch name, to show that this is the branch name we're using:

...--G--H--I--J   <-- br1
         \
          K--L   <-- br2 (HEAD)

This means we have commit L "checked out", via having branch br2 "checked out": we used git switch br2 or git checkout br2 to get here.

The files currently in our working tree are those from commit L, if we haven't changed them. If we do want to change them, we do that, then run git add and git commit, and we get a new commit and the name br2 gets updated to point to the new commit. But we won't do that right now.

Instead, what we'll do is say: Hey, you know, commits K and L are pretty OK. But they'd be better if they came after commit J, instead of extending from commit H.

What we'd like to do, in other words, is copy (the changes in) commit K to some new-and-improved commit K', but make K' work like this:

                K'
               /
...--G--H--I--J
         \
          K--L

Then we'd like to do the same thing for commit L: make the snapshot into changes, and copy those changes into a new commit L' that adds on where K' stops:

                K'-L'
               /
...--G--H--I--J
         \
          K--L

I've taken the names (and the special name HEAD) away here since I'm just drawing where we want to end up (in Paris?). We haven't yet used Google Maps to generate our turn-by-turn directions, but that's our next step.

Copying a commit: git cherry-pick

Now, above, I talked about the changes in a commit. But commits don't have changes. They have a snapshot and some metadata.

The trick here is that the metadata in an ordinary commit, like all the ones drawn above, includes a parent hash ID, and each parent commit also has a snapshot. If we have Git place the two snapshots side-by-side, and play a game of Spot the Difference with them, we'll get a listing of what changed.

So that's just what we do. We have Git go get the parent of commit K, which is commit H, and git diff those two commits, to see what changed. That tells us what we did in commit K.
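As a sketch of that Spot the Difference game (scratch repository, made-up file name and contents), the suffix ^ means "the parent of", so git diff HEAD^ HEAD compares the parent's snapshot to the commit's snapshot:

```shell
set -e
cd "$(mktemp -d)"                        # scratch repository
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "line 1" > file.txt
git add file.txt
git commit -q -m "H"
echo "line 2" >> file.txt
git commit -q -am "K"

# Compare the parent's snapshot (H) with the commit's snapshot (K).
changes=$(git diff HEAD^ HEAD)
echo "$changes"
```

The output is an ordinary diff in which the added line shows up with a leading +. Running git show on the commit gives you the same diff along with the commit's metadata.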

But now we have a problem. We can't just re-do that exactly as it is, because the files in commit J don't necessarily match up exactly with the files in commit H! What if we added a few lines to one of the files? What if we deleted a few lines?

Well, without getting into all the details—which means this is a bit bass-ackwards, as it's better to introduce git merge first—the way Git resolves this is by doing a "pretend merge". A merge, in Git, has three inputs:

  • there's a merge base commit, which we assume is the common starting point;
  • there's an "ours" or HEAD commit, which is the current commit, currently checked out; and
  • there's a "theirs" commit, which is some other commit, which we're going to merge.

The git cherry-pick code uses this merge machinery, which Git already has for git merge, but when it does so, it plays a little trick: the merge base commit is forced to the parent of the commit to be cherry-picked.

Rebase is repeated cherry-picking, with a bit of setup and cleanup

So, to make this all work, here's what Git does:

  1. Git starts by doing a detached HEAD check-out of the commit where we want to build the new commits. That's commit J. The result looks like this:

    ...--F--G--H--I--J   <-- HEAD, br1
                \
                 K--L   <-- br2
    

    The two branch names still point to their tip commits, but now HEAD isn't attached to any branch name. We're now in this special "not on any branch" mode that Git uses for git rebase.

  2. Git runs git cherry-pick hash-of-K. Git has already (in a step zero that I didn't cover for space reasons) listed out the hash IDs of the commits it is supposed to copy. That listing step is, for no great reason, actually horribly complicated now. It was pretty simple once, and for our case it works in the simple way and just lists out the two commits that are only on branch br2, but for hysterical reasons it's no longer simple.

  3. If that goes well, Git makes the new commit, K', on its own, re-using the commit message from commit K:

                       K'  <-- HEAD
                      /
    ...--F--G--H--I--J   <-- br1
                \
                 K--L   <-- br2
    

    If things don't go so well here, you get a merge conflict. And that's what you got—we'll come back to that. But for the moment, we'll assume that this goes well.

  4. Git repeats this for every commit that needs to be copied. In this case that's just one more commit, L, making L':

                       K'-L'  <-- HEAD
                      /
    ...--F--G--H--I--J   <-- br1
                \
                 K--L   <-- br2
    
  5. Finally, when everything is done correctly, Git yanks the name br2 over to point to the last copied commit, and re-attaches HEAD:

                       K'-L'  <-- br2 (HEAD)
                      /
    ...--F--G--H--I--J   <-- br1
                \
                 K--L   [abandoned]
    

That's how rebase works, or should work if all goes well.
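To make the five steps concrete, here is a sketch that builds the graph from the drawings in a scratch repository and then does the rebase "by hand" with git cherry-pick. This is an illustration of the idea, not what git rebase literally runs internally. The helper make_commit and the file names are made up for the demo:

```shell
set -e
cd "$(mktemp -d)"                       # scratch repository
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: append a line to a file and commit it.
make_commit() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$2"; }

make_commit base.txt H
git checkout -q -b br1
make_commit a.txt I
make_commit a.txt J
git checkout -q -b br2 main
make_commit b.txt K
make_commit b.txt L

git checkout -q --detach br1            # step 1: detached HEAD at J
git cherry-pick br1..br2 >/dev/null     # steps 2-4: copy K and L, making K' and L'
git branch -f br2 HEAD                  # step 5: yank the name br2 over to L'
git checkout -q br2                     # ... and re-attach HEAD

git log --oneline --graph br1 br2
```

Here br1..br2 selects the commits reachable from br2 but not from br1, which in this graph is just K and L.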

Things go wrong

You got:

CONFLICT (modify/delete): client/src/pages/Section/index.js deleted in HEAD 
and modified in 279afb1 (create content pages). Version 279afb1 (create content pages) 
of client/src/pages/Section/index.js left in tree.

This usually happens, and in your case did happen, at the first cherry-pick step. So you're here:

...--F--G--H--I--J   <-- HEAD, br1 (the branch you're rebasing onto)
            \
             K--L   <-- br2 (the branch you're rebasing)

The HEAD commit isn't your commit at all! It's their commit, J. The commit that Git calls theirs is your commit K. The merge base, from which Git decided that "you" (i.e., they, in commit J) deleted client/src/pages/Section/index.js, is commit H: the common starting point where you and they diverged. They deleted the file in one of their commits, maybe in commit I rather than J, but in any case it's gone, so that git diff says that the file is deleted.

You, meanwhile, changed the file a bit.
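If you want to find out which of their commits removed the file, one option is to ask git log for deletions of that path, starting from HEAD. The sketch below reproduces a modify/delete conflict in a scratch repository. The file name index.js and the commit messages are stand-ins for your real ones, and the exact wording Git prints when the rebase stops may vary by version:

```shell
set -e
cd "$(mktemp -d)"                       # scratch repository
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > index.js
git add index.js
git commit -q -m "H: add index.js"

git checkout -q -b br1                  # "their" branch deletes the file
git rm -q index.js
git commit -q -m "I: delete index.js"

git checkout -q -b br2 main             # "your" branch modifies it
echo "v2" >> index.js
git commit -q -am "K: modify index.js"

# The rebase stops with a modify/delete conflict; that is expected here.
git rebase br1 >/dev/null 2>&1 || true

status=$(git status --porcelain)        # lists index.js as an unmerged path
echo "$status"

# Which commit reachable from HEAD (their side) deleted the file?
deleter=$(git log --diff-filter=D --format=%s HEAD -- index.js)
echo "deleted in: $deleter"
```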

Your job, as the programmer who actually understands the source code—Git just thinks of it as text lines that should be combined line-by-line, without regard to what's a comment, what's a string, what's good and what's not—is to figure out how to combine your work with their work. Perhaps you need to modify some lines in some other file, or add lines to some other file, or put client/src/pages/Section/index.js back with just a few lines in it.

Once you figure out what the right content is, you will have to git add this file, if it should exist, or git rm it if it should not.2 You should check all the files that Git thinks it successfully merged, because some of them could well be wrong. If they're wrong, you need to edit them until they're right, and git add them as usual.

Once you've resolved the conflict, you need to run:

git rebase --continue

to have Git go ahead and finish the cherry-pick operation to create commit K'. Git will probably go on to create L' on its own, unless the resolution you chose requires more work on your part (e.g., perhaps there are more changes you made to that same file that now need to be made to some other file instead).
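Here is a sketch of one possible resolution, in the same kind of scratch setup (index.js and the commit messages are stand-ins). It assumes you decided the file should exist with your modified contents. Setting GIT_EDITOR=true is just a way to accept the commit message without opening an editor:

```shell
set -e
cd "$(mktemp -d)"                       # scratch repository
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > index.js
git add index.js
git commit -q -m "H: add index.js"
git checkout -q -b br1
git rm -q index.js
git commit -q -m "I: delete index.js"
git checkout -q -b br2 main
echo "v2" >> index.js
git commit -q -am "K: modify index.js"

git rebase br1 >/dev/null 2>&1 || true  # stops with the modify/delete conflict

# Resolution chosen for this demo: keep the file, with the version left in the tree.
git add index.js
GIT_EDITOR=true git rebase --continue >/dev/null

git log --oneline --graph br1 br2
```

After this, br2 holds the copied commit on top of their commit I, and HEAD is attached to br2 again.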

Once all the cherry-picking has finished, Git will put you back on your branch, and HEAD will be sensible again. Until then, you're in this rebase limbo: you can't go anywhere unless you either tell Git to abort the entire rebase, going back to where you were before you started, or you finish the rebase.

Since no existing commit can ever be changed, if you do decide to abort the rebase, you're just back to where you were before you started the whole thing. And, if you do finish the rebase and decide that you hate the result, it is possible (albeit a little tricky) to find the hash ID of the original L commit and get it all back, at least if you do that within about 30 days. After a long enough time, Git starts to think that maybe it should toss abandoned commits into the rubbish bin.3
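One way to find the original commit after a finished rebase is the branch's reflog, which records where the name used to point. This is a sketch in a scratch repository, assuming reflogs are enabled (the default in an ordinary repository with a working tree) and that the rebase was the most recent thing to move the branch name:

```shell
set -e
cd "$(mktemp -d)"                       # scratch repository
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: append a line to a file and commit it.
make_commit() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$2"; }

make_commit base.txt H
git checkout -q -b br1
make_commit a.txt I
git checkout -q -b br2 main
make_commit b.txt K

original_tip=$(git rev-parse br2)
git rebase -q br1                       # br2 now points to the copy, K'
rebased_tip=$(git rev-parse br2)

git reflog br2                          # previous values of the name br2

# br2@{1} is where br2 pointed before its most recent move (the rebase).
git reset -q --hard 'br2@{1}'
restored_tip=$(git rev-parse br2)
```

Be careful with git reset --hard in a real repository: it discards uncommitted work, so check git status first.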


2 git rm will complain because the file is not in the index at slot zero, but it will work. An alternative is to remove the file from the working tree and use git add, which doesn't complain, but feels weird to me, so I always use git rm and ignore the gripe. It's your choice: all Git needs to know is what the final result should be: is there a file, or isn't there? If there is a file, what are its contents?

3 This cleanup process can be delayed a long time, and on some hosting systems like GitHub, it's never done, for reasons that are known mostly to the GitHub folks. You should neither assume that these old commits will be cleaned up, nor that they won't be, but you do get a minimum of 30 days by default to change your mind.

Upvotes: 0
