When completing a rebase on a branch (git rebase branch master), the file I have been editing on the branch no longer has my changes, and the log states:
CONFLICT (modify/delete): client/src/pages/Section/index.js deleted in HEAD and modified in 279afb1 (create content pages). Version 279afb1 (create content pages) of client/src/pages/Section/index.js left in tree.
The file in question (client/src/pages/Section/index.js) is present in my branch and has not been deleted.
Any suggestions on how I can resolve this?
Files are not "in branches": specific files are in, or absent from, specific commits. Those commits are then contained in branches, but any given commit can be in many branches at the same time, or even in no branches at all. So don't think of this as "the file isn't in the branch": think of this as "the file isn't in the commit", because that's the case here.
Now, the commit in question is HEAD, and HEAD is tricky during rebase. This is because Git forces you to be a Git Mechanic, as if instead of just hopping into a taxicab and saying "take me to Paris", you have to first assemble the taxi from Lego parts, then assemble the driver, and then direct him to each turn, one step at a time. 😀
To "get" git rebase, we must start with the fact that each commit, in Git, holds a full snapshot of every file. This might seem like a terrible waste of disk space, and it would be if Git commits stored files the way your computer normally stores files. But in fact, Git commits store files in a special, weird, Git-ized format, with the contents compressed and, crucially, de-duplicated, so there's no actual wasted storage after all.
Each commit is numbered, with a big, ugly, random-looking hash ID. These hash IDs are quite impractical for humans, so Git tends to abbreviate them, e.g., as 279afb1. Git just takes the first 7 or so "digits" (these are in fact hexadecimal digits) and displays those, dropping the rest if that's safe to do, which it often is. Each hash ID is unique: no two Git commits are ever allowed to have the same one, even if they're in different repositories.[1]
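To get a feel for where these IDs come from, here is a minimal Python sketch of how Git computes an object ID in its default SHA-1 object format: a short header giving the object type and size, then the raw bytes, all run through SHA-1. It hashes a blob (file contents) because that is the simplest case; a commit is hashed the same way, with type commit, over its metadata text. The fixed 7-digit abbreviate helper is a simplification: real Git uses a longer prefix whenever 7 digits would be ambiguous.

```python
import hashlib


def git_object_id(content: bytes, obj_type: str = "blob") -> str:
    """Git-style object ID in the default SHA-1 object format.

    Git hashes a small header, "<type> <size>" plus a NUL byte,
    followed by the raw bytes of the object.
    """
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def abbreviate(hash_id: str, length: int = 7) -> str:
    """Keep only the leading hex digits (real Git uses more when needed)."""
    return hash_id[:length]


full_id = git_object_id(b"")      # the empty file, as a blob
print(full_id)                    # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(abbreviate(full_id))        # e69de29
```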
Besides storing a full snapshot of all files, each commit also stores some metadata, or information about this one particular commit. That includes things like the name and email address of the person who made the commit, and some date-and-time stamps and so on. Crucially for Git itself, each commit's metadata stores a list of the hash IDs of some set of previous commits. Most commits store exactly one hash ID in this list; we call that the parent of the commit in question.
Because commits remember the hash IDs of their parents, and most commits remember exactly one such hash ID, when we have those commits in some branch, they form a backwards-looking chain. We say that the child commit points to its parent commit, and we can draw these commits like this:
... <-F <-G <-H
Here H stands for the hash ID of the latest (newest) commit in the chain. That commit holds a permanently archived full snapshot of every file. It also remembers, in its metadata, the hash ID of the earlier commit G, its parent.
Commit G is of course a commit as well, so it has a full snapshot of every file, and metadata that holds a commit hash ID: its parent, F. Like G and H, F is a commit, so it holds a full snapshot of every file, and metadata holding another parent hash ID.
Git can follow this backwards chain, one hop at a time, from the end of the branch all the way back to the very first commit ever. So Git only needs to know the hash ID of commit H, the end of the chain. That's the hash ID you have to memorize. Oh wait, humans are bad at hash IDs!
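The backwards walk is easy to model. In this toy Python sketch, single letters stand in for hash IDs and a plain dict stands in for the snapshot; those are illustrative simplifications, not Git's actual storage format. Knowing only the tip H is enough to reach every earlier commit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Commit:
    name: str                   # stands in for the big ugly hash ID
    snapshot: dict              # path -> contents: a full snapshot of every file
    parent: Optional["Commit"]  # the one backwards-pointing link (None for the root)


def walk_back(tip: Commit) -> List[str]:
    """Follow parent links from a tip commit all the way back to the root."""
    names = []
    commit: Optional[Commit] = tip
    while commit is not None:
        names.append(commit.name)
        commit = commit.parent
    return names


F = Commit("F", {"README": "v1"}, None)
G = Commit("G", {"README": "v2"}, F)
H = Commit("H", {"README": "v2", "index.js": "x"}, G)

print(walk_back(H))  # ['H', 'G', 'F']
```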
To avoid forcing you to actually memorize the hash ID (or build the taxi from Lego bricks, as it were), Git provides branch names. A branch name simply holds the hash ID of the last commit in the chain. We say that the branch name points to the tip commit, with tip commit being a Git term: it's the commit to which the branch name points. So we really have:
...--F--G--H <-- somebranch
Now, no part of any commit can ever be changed. That's a little math trick Git pulls so that it can generate its commit hash IDs: the hash ID is computed from the commit's contents, so changing anything would produce a different ID. So not only is the archive-of-all-files permanent, so is the metadata. Commit H will always and forever point back to commit G, which will always point back to F, and so on.
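The reason nothing in a commit can change is that the ID is computed from the contents. The Python sketch below uses a hypothetical toy_commit_id function (not Git's real object format) that hashes the snapshot, the parent's ID, and the message together. Alter any of them and the result is a different ID; and since each child records its parent's ID, altering an old commit would ripple into new IDs for everything after it.

```python
import hashlib
import json
from typing import Optional


def toy_commit_id(snapshot: dict, parent_id: Optional[str], message: str) -> str:
    """Hypothetical commit ID: SHA-1 over everything the commit contains.

    Real Git hashes a tree ID, parent IDs, author/committer lines and the
    message in its own text format; JSON is used here only for brevity.
    """
    payload = json.dumps(
        {"snapshot": snapshot, "parent": parent_id, "message": message},
        sort_keys=True,
    ).encode()
    return hashlib.sha1(payload).hexdigest()


g_id = toy_commit_id({"README": "v1"}, None, "add README")
h_id = toy_commit_id({"README": "v2"}, g_id, "update README")

# "Fixing" the message of H gives a brand-new ID, i.e. a different commit.
h_fixed = toy_commit_id({"README": "v2"}, g_id, "Update README")
print(h_id == h_fixed)  # False

# Changing G changes g_id, and therefore the ID of every descendant too.
g2_id = toy_commit_id({"README": "v1"}, None, "Add README")
print(toy_commit_id({"README": "v2"}, g2_id, "update README") == h_id)  # False
```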
The branch name pointers, however, aren't frozen like this. The name somebranch can be made to point to any commit that's in the Git repository. Suppose, for instance, we make a new commit I that points back to H:
...--F--G--H <-- somebranch
            \
             I
We can now wheel the arrow from somebranch around a bit so that it points to I, which is easiest to draw like this on StackOverflow:
...--F--G--H
            \
             I <-- somebranch
Commit I is now the tip commit of branch somebranch.
This is how branches grow, in Git.
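The whole "branches grow" step fits in a few lines of a toy model. In this Python sketch (Repo and commit_on are illustrative names, not anything in Git), making a commit on a branch means creating a commit whose parent is the current tip, then moving the name; nothing about the older commits changes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Commit:
    name: str                   # stands in for a hash ID
    parent: Optional["Commit"]  # backwards-pointing link


class Repo:
    """Toy model: a branch name is just a movable pointer to a tip commit."""

    def __init__(self) -> None:
        self.branches: Dict[str, Commit] = {}

    def commit_on(self, branch: str, name: str) -> Commit:
        # The new commit points back to the branch's current tip (if any)...
        new = Commit(name, self.branches.get(branch))
        # ...and then the name is wheeled around to point to the new commit.
        self.branches[branch] = new
        return new


repo = Repo()
for letter in "FGH":
    repo.commit_on("somebranch", letter)

old_tip = repo.branches["somebranch"]   # commit H
repo.commit_on("somebranch", "I")

print(repo.branches["somebranch"].name)               # I
print(repo.branches["somebranch"].parent is old_tip)  # True
```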
[1] This is mathematically impossible, so it's not actually completely true. If two different commits ever get the same hash ID, it becomes impossible to introduce the two repositories to each other. Doing so doesn't destroy the universe like some Star Trek TOS episode, it just makes Git stop working with those two repositories, so we don't do that. But Git can't know whether two commits will come together someday or not, so it tries its gosh-darned-est to make sure that the hash IDs are truly unique, and in practice, this works fine.
See also How does the newly found SHA-1 collision affect Git?
Let's say we've made commit I and now have:
...--G--H--I <-- somebranch
Upon looking again, we discover a horrible mistake: we made a typo in the commit message, or forgot to update the README file, or something. Oh no! Disaster! Commit I is awful and needs to be fixed! But alas, no commit, once made, can ever be changed.
Fortunately, there's a trick we can play. Let's make a new-and-improved commit I', but use git commit --amend when we make it, so that the parent of I' is not I, but rather is H again, like this:
...--G--H--I <-- somebranch
         \
          I'
Now once again we'll have the name somebranch point to I' instead of I:
          I [abandoned]
         /
...--G--H--I' <-- somebranch
Because we (humans) use the branch name to find the last commit, we won't find commit I any more. We'll find a different commit, with a different big ugly hash ID, I'. Then we'll have Git follow its internal backwards-pointing arrow to H, and then to G and so on. It now looks like we changed a commit.
We didn't: that's impossible. But we got something just as good, or almost as good, or sometimes better: a new and improved commit.
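Here is a toy version of git commit --amend, under the same simplifications as before (letters for hash IDs, no snapshots). The hypothetical amend function builds the replacement with the old tip's parent, moves the name, and hands back the old commit, which still exists but is no longer reachable from somebranch. This models the effect on the graph only; real Git also builds the new snapshot from your index.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Commit:
    name: str                   # stands in for a hash ID
    parent: Optional["Commit"]


def history(tip: Optional[Commit]) -> List[str]:
    """Names of the commits reachable by walking back from tip."""
    names = []
    while tip is not None:
        names.append(tip.name)
        tip = tip.parent
    return names


def amend(branches: Dict[str, Commit], branch: str, new_name: str) -> Commit:
    """Replace the tip: the new commit's parent is the OLD tip's parent."""
    old_tip = branches[branch]
    branches[branch] = Commit(new_name, old_tip.parent)
    return old_tip  # still exists, just no longer found via the branch name


G = Commit("G", None)
H = Commit("H", G)
branches = {"somebranch": Commit("I", H)}

abandoned = amend(branches, "somebranch", "I'")
print(history(branches["somebranch"]))  # ["I'", 'H', 'G']
print(history(abandoned))               # ['I', 'H', 'G']
```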
This is what the git rebase command is for: it plays the same copy-and-replace trick, but on a whole series of commits. In our case, we'll have some set of commits and two or more branch names that help us find those commits:
          I--J <-- br1
         /
...--G--H <-- main
         \
          K--L <-- br2
Here we have three branch names: main selects commit H, br1 selects commit J, and br2 selects commit L.
If we work backwards from J, we find commits J, I, H, G, and so on. These are the commits that are "on" or "in" branch br1 at the moment. (Because branch name arrows are moveable, this could change.)
If we work backwards from L, we find L, K, H, G, and so on: these are the commits on br2. It doesn't matter that H and G and so on are on another branch too. They're on both branches, or in fact on all three branches, because the name main finds H too, and Git works backwards from there.
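Which commits are "on" a branch is just a question of reachability. The Python sketch below encodes the drawing above as a parent map (letters stand in for hash IDs, and G is treated as the root to keep it short) and computes, for each branch name, the set of commits found by walking backwards. G and H show up in all three sets.

```python
from typing import Dict, Optional, Set

# Parent links for the drawing above (G treated as the root for brevity).
parents: Dict[str, Optional[str]] = {
    "G": None, "H": "G",
    "I": "H", "J": "I",
    "K": "H", "L": "K",
}
branches = {"main": "H", "br1": "J", "br2": "L"}


def reachable(tip: str) -> Set[str]:
    """Every commit found by walking back from tip: the commits 'on' that branch."""
    found: Set[str] = set()
    commit: Optional[str] = tip
    while commit is not None:
        found.add(commit)
        commit = parents[commit]
    return found


for name, tip in branches.items():
    print(name, sorted(reachable(tip)))
# main ['G', 'H']
# br1 ['G', 'H', 'I', 'J']
# br2 ['G', 'H', 'K', 'L']

on_every_branch = set.intersection(*(reachable(t) for t in branches.values()))
print(sorted(on_every_branch))  # ['G', 'H']
```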
The branch names don't matter (except to us humans): they are just there to help us (and Git) find the commits. It's the commits that matter. As long as we have enough names to find all the commits, we can delete any extra names, if we don't need a quick way to find those commits. So if we don't care about finding H quickly, we can delete the name main safely, because we have two names to find H, albeit a little slowly. We just don't want to delete the last name that finds some commit(s) until we're ready to abandon them.
For now, we'll knock out the name main because it clutters up the diagram. We can always put it back later, provided we can find the right commit hash ID:
git branch main <hash-of-H>
will create the name again, using the hash ID (which we can cut and paste with the mouse after running git log and finding it). We wouldn't normally do this, except for illustration, of course, as deleting and re-creating branch names is annoying and error-prone (you have to get the hash ID right).
But now that main is gone, we have this:
...--G--H--I--J <-- br1
         \
          K--L <-- br2
Why did I put br1 back on the line? Mostly, just because I felt like it: it doesn't really matter how you draw the graph as long as you have the right backwards-pointing links. (I've also given up on most of the arrows in favor of -- etc., because they're hard to draw on StackOverflow: there are some arrow fonts, but they don't come out right on all computers, and they're ugly.)
Since we have two branch names, it becomes tricky to remember which name we're using. So now we'll add, to our drawing, the idea of HEAD as the current branch. We'll put the word HEAD (in all caps like this) in parentheses after one branch name, to show that this is the branch name we're using:
...--G--H--I--J <-- br1
         \
          K--L <-- br2 (HEAD)
This means we have commit L "checked out", via having branch br2 "checked out": we used git switch br2 or git checkout br2 to get here.
The files in our working tree are those from commit L, if we haven't changed them. If we do want to change them, we do that, then run git add and git commit, and we get a new commit, and the name br2 gets updated to point to the new commit. But we won't do that right now.
Instead, what we'll do is say: Hey, you know, commits K and L are pretty OK. But they'd be better if they came after commit J, instead of extending from commit H.
What we'd like to do, in other words, is copy (the changes in) commit K to some new-and-improved commit K', but make K' come after J, like this:
                K'
               /
...--G--H--I--J
         \
          K--L
Then we'd like to do the same thing for commit L: make the snapshot into changes, and copy those changes into a new commit L' that adds on where K' stops:
                K'-L'
               /
...--G--H--I--J
         \
          K--L
I've taken the names (and the special name HEAD) away here since I'm just drawing where we want to end up (in Paris?). We haven't yet used Google Maps to generate our turn-by-turn directions, but that's our next step.
git cherry-pick
Now, above, I talked about the changes in a commit. But commits don't have changes. They have a snapshot and some metadata.
The trick here is that the metadata in an ordinary commit, like all the ones drawn above, includes a parent hash ID, and each parent commit also has a snapshot. If we have Git place the two snapshots side-by-side, and play a game of Spot the Difference with them, we'll get a listing of what changed.
So that's just what we do. We have Git go get the parent of commit K, which is commit H, and git diff those two commits, to see what changed. That tells us what we did in commit K.
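Turning a commit into "changes" is just a comparison of its snapshot with its parent's. This Python sketch works at whole-file granularity only (real git diff goes on to compare the lines inside each modified file, and can detect renames), and the two snapshots are made-up dicts standing in for H and K.

```python
from typing import Dict


def diff_snapshots(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """Play Spot the Difference between two snapshots, file by file."""
    changes: Dict[str, str] = {}
    for path in sorted(old.keys() | new.keys()):
        if path not in new:
            changes[path] = "deleted"
        elif path not in old:
            changes[path] = "added"
        elif old[path] != new[path]:
            changes[path] = "modified"
    return changes


# Hypothetical snapshots for commit H (the parent) and commit K (the child).
snapshot_H = {"README": "hello", "index.js": "v1"}
snapshot_K = {"README": "hello", "index.js": "v1 + my edits", "new.js": "x"}

print(diff_snapshots(snapshot_H, snapshot_K))
# {'index.js': 'modified', 'new.js': 'added'}
```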
But now we have a problem. We can't just re-do that exactly as it is, because the files in commit J don't necessarily match up exactly with the files in commit H! What if we added a few lines to one of the files? What if we deleted a few lines?
Well, without getting into all the details (which means this is a bit bass-ackwards, as it's better to introduce git merge first), the way Git resolves this is by doing a "pretend merge". A merge, in Git, has three inputs:
- the merge base commit, the common starting point;
- the ours or HEAD commit, which is the current commit, currently checked out; and
- the theirs commit, the other commit whose work is being combined in.
The git cherry-pick code uses this merge machinery, which Git already has for git merge, but when it does so, it plays a little trick: the merge base is forced to be the parent of the commit being cherry-picked.
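Here is a toy Python model of that trick, at whole-file granularity. merge_path and cherry_pick are illustrative names, None stands for "file absent", and a real Git merge would additionally try to combine the lines of a file that both sides edited. The example data is set up like the situation in the question: the HEAD side deleted a file that the picked commit modified.

```python
from typing import Dict, Optional, Tuple

Snapshot = Dict[str, str]


def merge_path(base: Optional[str], ours: Optional[str],
               theirs: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Three-way merge of ONE path. None means "file absent".

    Returns (merged_content, conflict_kind).
    """
    if ours == theirs:       # both sides agree (including both deleted)
        return ours, None
    if ours == base:         # only theirs changed it: take theirs
        return theirs, None
    if theirs == base:       # only ours changed it: keep ours
        return ours, None
    if ours is None or theirs is None:
        return None, "modify/delete"   # one side deleted, the other edited
    # Both edited: real Git would now try a line-by-line merge first.
    return None, "content"


def cherry_pick(head: Snapshot, picked: Snapshot,
                picked_parent: Snapshot) -> Tuple[Snapshot, Dict[str, str]]:
    """Pretend merge: the merge base is forced to the picked commit's parent."""
    result: Snapshot = {}
    conflicts: Dict[str, str] = {}
    for path in sorted(head.keys() | picked.keys() | picked_parent.keys()):
        merged, conflict = merge_path(picked_parent.get(path),
                                      head.get(path), picked.get(path))
        if conflict:
            conflicts[path] = conflict
        elif merged is not None:
            result[path] = merged
    return result, conflicts


# Hypothetical snapshots: H is the base, J (HEAD) deleted index.js, K edited it.
H = {"app.js": "a", "index.js": "v1"}
J = {"app.js": "a2"}
K = {"app.js": "a", "index.js": "v1 + my edits"}

print(cherry_pick(head=J, picked=K, picked_parent=H))
# ({'app.js': 'a2'}, {'index.js': 'modify/delete'})
```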
So, to make this all work, here's what Git does:
Git starts by doing a detached HEAD check-out of the commit where we want to build the new commits. That's commit J. The result looks like this:
...--F--G--H--I--J <-- HEAD, br1
            \
             K--L <-- br2
The two branch names still point to their tip commits, but now HEAD isn't attached to any branch name. We're now in this special "not on any branch" mode that Git uses for git rebase.
Git runs git cherry-pick hash-of-K. Git has already (in a step zero that I didn't cover for space reasons) listed out the hash IDs of the commits it is supposed to copy. This step is, for no great reason, actually horribly complicated now. It was pretty simple once, and for our case it works in the simple way and just lists the two commits that are only on branch br2, but for hysterical reasons it's no longer simple.
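The simple version of that step zero is a set difference: take the commits reachable from the branch being rebased, drop those reachable from the target, and copy the rest oldest-first (roughly what git rev-list br1..br2 lists, newest first). The Python sketch below does just that on the drawing above; it deliberately ignores the extra rules that make modern Git's version complicated, such as skipping merge commits and commits whose changes are already upstream.

```python
from typing import Dict, List, Optional

# Parent links for the drawing above (letters stand in for hash IDs).
parents: Dict[str, Optional[str]] = {
    "F": None, "G": "F", "H": "G",
    "I": "H", "J": "I",
    "K": "H", "L": "K",
}


def ancestry(tip: str) -> List[str]:
    """Commits reachable from tip, newest first."""
    out: List[str] = []
    commit: Optional[str] = tip
    while commit is not None:
        out.append(commit)
        commit = parents[commit]
    return out


def commits_to_copy(upstream_tip: str, branch_tip: str) -> List[str]:
    """The simple rule: commits on the branch but not on upstream, oldest first."""
    exclude = set(ancestry(upstream_tip))
    picks = [c for c in ancestry(branch_tip) if c not in exclude]
    return picks[::-1]  # copy the oldest one first


print(commits_to_copy("J", "L"))  # ['K', 'L']
```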
If that goes well, Git makes the new commit, K', on its own, re-using the commit message from commit K:
                   K' <-- HEAD
                  /
...--F--G--H--I--J <-- br1
            \
             K--L <-- br2
If things don't go so well here, you get a merge conflict. And that's what you got—we'll come back to that. But for the moment, we'll assume that this goes well.
Git repeats this for every commit that needs to be copied. In this case that's just one more commit, L, making L':
                   K'-L' <-- HEAD
                  /
...--F--G--H--I--J <-- br1
            \
             K--L <-- br2
Finally, when everything is done correctly, Git yanks the name br2 over to point to the last copied commit, and re-attaches HEAD:
                   K'-L' <-- br2 (HEAD)
                  /
...--F--G--H--I--J <-- br1
            \
             K--L [abandoned]
That's how rebase works, or should work if all goes well.
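Putting the steps together, here is a toy end-to-end rebase in Python. It is a sketch under heavy simplifications: letters stand in for both hash IDs and commit messages, snapshots are dicts, the merge works on whole files, and any conflict simply raises an exception instead of pausing for git rebase --continue. The original K and L are untouched; only the name br2 moves.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Commit:
    name: str                   # stands in for hash ID and commit message
    snapshot: Dict[str, str]    # path -> contents
    parent: Optional["Commit"]


def three_way(base: Dict[str, str], ours: Dict[str, str],
              theirs: Dict[str, str]) -> Dict[str, str]:
    """Whole-file three-way merge; stops on any conflict (sketch only)."""
    result: Dict[str, str] = {}
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t or t == b:
            merged = o            # both agree, or only ours changed it
        elif o == b:
            merged = t            # only theirs changed it
        else:
            raise RuntimeError(f"CONFLICT in {path}: stop and ask the human")
        if merged is not None:
            result[path] = merged
    return result


def rebase(branches: Dict[str, Commit], branch: str, onto: str,
           to_copy: List[Commit]) -> Commit:
    head = branches[onto]                 # 1. detached HEAD at the new base
    for old in to_copy:                   # 2. cherry-pick each commit, in order
        assert old.parent is not None
        snap = three_way(old.parent.snapshot, head.snapshot, old.snapshot)
        head = Commit(old.name + "'", snap, head)   # K -> K', L -> L'
    branches[branch] = head               # 3. yank the branch name over
    return head


H = Commit("H", {"a": "1"}, None)
I = Commit("I", {"a": "1", "b": "1"}, H)
J = Commit("J", {"a": "2", "b": "1"}, I)
K = Commit("K", {"a": "1", "c": "1"}, H)
L = Commit("L", {"a": "1", "c": "2"}, K)
branches = {"br1": J, "br2": L}

new_tip = rebase(branches, "br2", "br1", [K, L])
print(new_tip.name, new_tip.parent.name, new_tip.parent.parent.name)  # L' K' J
print(new_tip.snapshot)  # {'a': '2', 'b': '1', 'c': '2'}
```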
You got:
CONFLICT (modify/delete): client/src/pages/Section/index.js deleted in HEAD and modified in 279afb1 (create content pages). Version 279afb1 (create content pages) of client/src/pages/Section/index.js left in tree.
This usually happens, and in your case did happen, at the first cherry-pick step. So you're here:
...--F--G--H--I--J <-- HEAD, br1 (the branch you're rebasing onto)
            \
             K--L <-- br2 (the branch you're rebasing)
The HEAD commit isn't your commit at all! It's their commit, J. The commit that Git calls theirs is your commit K. The merge base, from which Git decided that "you" (i.e., HEAD, which here is really they, in commit J) deleted client/src/pages/Section/index.js, is commit H: the common starting point where you and they diverged. They deleted the file in one of their commits (maybe in commit I, not J, after all), but in any case it's gone, so a git diff from H to J says that the file is deleted.
You, meanwhile, changed the file a bit.
Your job, as the programmer who actually understands the source code (Git just thinks of it as text lines that should be combined line-by-line, without regard to what's a comment, what's a string, what's good and what's not), is to figure out how to combine your work with their work. Perhaps you need to modify some lines in some other file, or add lines to some other file, or put client/src/pages/Section/index.js back with just a few lines in it.
Once you figure out what the right content is, you will have to git add this file, if it should exist, or git rm it if it should not.[2] You should also check all the files that Git thinks it successfully merged, because some of them could well be wrong. If they're wrong, you need to edit them until they're right, and git add them as usual.
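Under the hood, Git records a conflicted path in the index as up to three numbered "stages": 1 for the merge base, 2 for ours (HEAD), and 3 for theirs; a resolved path has a single stage-0 entry, the "slot zero" that footnote 2 mentions. (git ls-files -u shows the real unmerged entries.) The Python sketch below is a toy model of that bookkeeping for your modify/delete conflict; the dict layout and the function names are illustrative assumptions, not Git's on-disk index format.

```python
from typing import Dict, List, Tuple

Index = Dict[Tuple[str, int], str]   # (path, stage) -> contents (toy layout)

PATH = "client/src/pages/Section/index.js"

# During the conflict: stage 1 = merge base (H), stage 3 = "theirs" (your K).
# There is no stage 2 ("ours" = HEAD = their J), because HEAD deleted the file.
conflicted: Index = {
    (PATH, 1): "version from H",
    (PATH, 3): "your edited version from K",
}


def unmerged_paths(index: Index) -> List[str]:
    """Paths that still have any nonzero-stage entry."""
    return sorted({path for (path, stage) in index if stage != 0})


def resolve_with_add(index: Index, path: str, contents: str) -> Index:
    """Like git add: drop stages 1-3 and record one stage-0 entry."""
    new = {key: val for key, val in index.items() if key[0] != path}
    new[(path, 0)] = contents
    return new


def resolve_with_rm(index: Index, path: str) -> Index:
    """Like git rm: drop every entry for the path, so the file is gone."""
    return {key: val for key, val in index.items() if key[0] != path}


print(unmerged_paths(conflicted))  # ['client/src/pages/Section/index.js']
kept = resolve_with_add(conflicted, PATH, "merged by hand")
removed = resolve_with_rm(conflicted, PATH)
print(unmerged_paths(kept), unmerged_paths(removed))  # [] []
```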
Once you've resolved the conflict, you need to run:
git rebase --continue
to have Git go ahead and finish the cherry-pick operation to create commit K'. Git will probably go on to create L' on its own, unless the resolution you chose requires more work on your part (e.g., perhaps there are more changes you made to that same file that now need to be made to some other file instead).
Once all the cherry-picking has finished, Git will put you back on your branch, and HEAD will be sensible again. Until then, you're in this rebase limbo: you can't go anywhere unless you either tell Git to abort the entire rebase (git rebase --abort), going back to where you were before you started, or you finish the rebase.
Since no existing commit can ever be changed, if you do decide to abort the rebase, you're just back to where you were before you started the whole thing. And, if you do finish the rebase and decide that you hate the result, it is possible, albeit a little tricky, to find the hash ID of the original L commit (for instance with git reflog) and get it all back, at least if you do that within about 30 days. After a long enough time, Git starts to think that maybe it should toss abandoned commits into the rubbish bin.[3]
[2] git rm will complain because the file is not in the index at slot zero, but it will work. An alternative is to remove the file from the working tree and use git add, which doesn't complain, but feels weird to me, so I always use git rm and ignore the gripe. It's your choice: all Git needs to know is what the final result should be. Is there a file, or isn't there? If there is a file, what are its contents?
[3] This cleanup process can be delayed a long time, and on some hosting systems like GitHub, it's never done, for reasons that are known mostly to the GitHub folks. You should neither assume that these old commits will be cleaned up, nor that they won't be, but you do get a minimum of 30 days by default to change your mind.