Git rebase attempting to rebase commit that occurred before target commit

Question

I ran git rebase -i . After making changes, I ran git rebase --continue. Git then properly rebased 15 out of 15 commits. After successfully doing so, I received this error:

error: could not apply ... 

When you have resolved this problem, run "git rebase --continue".
If you prefer to skip this patch, run "git rebase --skip" instead.
To check out the original branch and stop rebasing, run "git rebase --abort".
Could not apply ...

I looked up the commit that the bad hash refers to and discovered that it was made about 3 months before the target hash. So why is git rebase even touching this commit? I thought rebase only targeted commits from the current commit back to the target hash.

Does anyone know what the issue may be? Do I have a fundamental misunderstanding of rebase, or is something amiss?

torek · Accepted Answer

TL;DR

It's hard to tell without your actual repository, but you may be rebasing across a merge.

Long

Rebase doesn't look at commit dates. The keys to understanding Git rebase, whether interactive or not, are:

understanding how branches grow;
understanding git cherry-pick; and
viewing (and selecting commits within) the commit graph—especially the idea of reachability.

These are a bit intertwingled. For a much longer, but nice, introduction to the idea of reachability, see Think Like (a) Git.

Understanding the Git commit

Let's take a look first at the anatomy of a simple commit. Here's one from the Git repository for Git:

$ git rev-parse HEAD
5be1f00a9a701532232f57958efab4be8c959a29
$ git cat-file -p 5be1f00a9a701532232f57958efab4be8c959a29 | sed 's/@/ /'
tree 8ccb7d4fa49449a843b00aca64baf99feb10e2ab
parent e7e80778e705ea3f9332c634781d6d0f8c6eab64
author Junio C Hamano  1516742470 -0800
committer Junio C Hamano  1516742470 -0800

First batch after 2.16

Signed-off-by: Junio C Hamano

A commit, or any other Git internal object, is uniquely identified by its hash ID such as 5be1f00a9a701532232f57958efab4be8c959a29. These are not very useful to humans, so we use names like master and HEAD and v2.16.0 and so on to identify them instead, but Git eventually uses these raw hash IDs.

A commit stores:

a snapshot hash (tree ...);
a parent commit hash (parent ...), or sometimes multiple parents;
an author and committer (name, email, and timestamp), which you supply automatically via your configuration; and
a log message (this is the only part you supply manually for a commit).

Every commit—in fact, every Git object—is read-only. You cannot change anything about any existing commit.

Because each commit records its parent, the commits form a chain. If we start at the most recent one—which is what Git does—we can look at it and find its parent. That gives us a second commit hash, so we can look at that commit, and find its parent, which gives us a third hash, and so on. We say that each of these stored hash IDs points to another commit.
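As a minimal sketch of this chain-walking idea, the following script builds a throwaway repository with three commits and then follows the parent hashes backwards by hand. The user name, email, file name, and commit messages are made up for the illustration.

```shell
set -e
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

for msg in first second third; do
    echo "$msg" > file.txt
    git add file.txt
    git commit -q -m "$msg"
done

# Show the raw commit object: tree, parent, author, committer, message.
git cat-file -p HEAD

# Walk the chain by hand: read a commit, then jump to its stored parent hash.
commit=$(git rev-parse HEAD)
chain=""
while [ -n "$commit" ]; do
    chain="$chain $(git log -1 --format=%s "$commit")"
    commit=$(git log -1 --format=%P "$commit")   # empty for the root commit
done
echo "walked:$chain"
```

The loop stops at the first commit, which has no parent line at all.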

Drawing the commits, and how branches grow

In other words, by starting with a single pointer, pointing to the end of the commit chain, we can work backwards along the chain of commits:

... <-o <-o <-o <-o   <--last-one

This "last-one" pointer is what Git calls a branch (or more precisely, a branch name). A branch name simply stores the hash ID of one particular Git commit: namely, the one at the tip of the branch. (We therefore call this the tip commit.)

To grow a branch, as by git commit, Git will first create the new commit by writing out a tree object for it (to find the tree's hash), and then creating the rest of the commit from the known data: the tree, the current commit's hash as stored in last-one, you as the author and committer (with "now" as the timestamp), and your commit message. This commit goes into the repository, which produces a new and unique hash ID.

The new commit N points back to the previous tip of the branch:

...--o--o--o--o   <-- last-one
               \
                N

and now that Git knows what the new commit's new hash ID is, all Git has to do is write that hash ID into last-one, so that last-one points to the new commit:

...--o--o--o--o
               \
                N   <-- last-one

(and then we can draw this without the bend at the end).
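To see the branch name move, here is a small sketch in a scratch repository (identity and file name are invented for the example). It records the hash stored in the branch name, makes one more commit, and then checks that the new tip's parent is the old tip.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "one"

# Whatever the initial branch is called, it just stores one commit hash.
branch=$(git symbolic-ref --short HEAD)
old_tip=$(git rev-parse "refs/heads/$branch")

echo two >> file.txt
git commit -q -a -m "two"

# The name now stores the new commit's hash; the new commit points back.
new_tip=$(git rev-parse "refs/heads/$branch")
parent_of_new=$(git rev-parse "$new_tip^")

echo "old tip:           $old_tip"
echo "new tip:           $new_tip"
echo "parent of new tip: $parent_of_new"
```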

Cherry-picking: copying commits

While a commit is a snapshot, we often like to view it as a change. To view a commit as a change, we simply take the snapshot for the commit's parent first, then the snapshot for the commit itself, and compare them:

git diff parent-hash commit-hash

The output from this command is a set of instructions: if you make these changes to the parent, you'll get the child. (Ideally, this is the same as what the person who made the change actually did, although one will see Git fall short of this ideal somewhat often.)

Suppose we have one branch, with another branch growing out of the middle of it, or two branches that share some common base:

          C--D   <-- br1 (HEAD)
         /
...--A--B
         \
          E--F--G   <-- br2

(Here I used one-letter names for commits, instead of big ugly hash IDs, so we'll run out of letters after just 26 commits.) Suppose further that you have run git checkout br1—this attaches your HEAD to br1, which is why we've drawn it that way—and at this point you realize that things would be much better if you could just grab, for br1, the same change that you made earlier to make commit F.

The git cherry-pick command will do that. It will examine F as compared to E, to see what changed. Then it will (attempt to) make the same changes to the contents of commit D, where we are now. Last, it will make a new commit from the result, if all goes well. This new commit is a lot like F except that:

its parent is D, not E, and
while it makes the same changes as F, it makes them to whatever's in D, not whatever's in E.

In other words, running git show on the new commit will show the same changes as running git show on F, even though the base to which those changes are applied may not be the same.

Because this is a copy of F, let's call it F' rather than H:

          C--D--F'  <-- br1 (HEAD)
         /
...--A--B
         \
          E--F--G   <-- br2

This is what a cherry-pick is / does: it has the effect of copying a commit.
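The graph above can be reproduced in a scratch repository to watch this happen. This is only a sketch: the helper mk is hypothetical, and each commit just adds one file named after its letter, so the cherry-pick cannot conflict.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# mk NAME: hypothetical helper that commits a new file NAME.txt.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk A; mk B
git checkout -q -b br2; mk E; mk F; mk G
git checkout -q -b br1 HEAD~3        # HEAD~3 from G is commit B
mk C; mk D

# Copy F (that is, br2~1) onto the tip of br1.
git cherry-pick br2~1 > /dev/null

copy_parent=$(git log -1 --format=%s HEAD^)     # subject of the copy's parent
orig_patch=$(git diff br2~2 br2~1)              # E -> F
copy_patch=$(git diff HEAD^ HEAD)               # D -> F'
echo "parent of the copy: $copy_parent"
```

The copy has a different hash from F because its parent and snapshot differ, even though the two diffs are the same.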

Rebasing

There are several different use cases for rebasing, but they all build on this one basic idea: we can copy commits. While we do the copying, just before actually committing (to a read-only commit), we can change something about the new copies.

The first common use case is to transplant a branch. Suppose instead of drawing branches 1 and 2 above as we did, we draw them more like this:

          C--D   <-- develop (HEAD)
         /
...--A--B--E--F--G   <-- master

(This is in fact exactly the same graph; we just gave it different branch names and flattened out the bottom row.) Now let's say that we'd like to have develop based on commit G rather than on commit B. Suppose we were to create a new, temporary branch starting at G, and make the new branch the HEAD:

          C--D   <-- develop
         /
...--A--B--E--F--G   <-- tmp (HEAD), master

Now we cherry-pick C here, to make a copy C' that goes after G and updates tmp to point to the new copy:

          C--D   <-- develop
         /
...--A--B--E--F--G   <-- master
                  \
                   C'  <-- tmp (HEAD)

We repeat for commit D:

          C--D   <-- develop
         /
...--A--B--E--F--G   <-- master
                  \
                   C'-D'  <-- tmp (HEAD)

Last, we tell Git to peel the label develop off commit D and paste it instead onto commit D', and while Git is at it, throw away the temporary name too and make develop be HEAD again:

          C--D   [abandoned]
         /
...--A--B--E--F--G   <-- master
                  \
                   C'-D'  <-- develop (HEAD)

There's no name for original commit D any more, so we won't see it, and eventually (after 30 or more days by default) Git will garbage-collect it and it will really be gone.

In the end, it looks like we somehow moved commits C and D. We didn't, really: we copied C to a new, slightly-different C', and copied D to D'. But as long as no one remembers the original C and D, we might as well have moved-and-changed the commits. The name develop now locates commit D', not D; as long as we use the name to find the commits, we see only the shiny new replacements.
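Here is the same by-hand transplant as a script, as a rough sketch rather than what git rebase literally runs. The mk helper and the one-file-per-commit layout are assumptions made so that nothing conflicts.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example User"
git config user.email "user@example.com"

# mk NAME: hypothetical helper that commits a new file NAME.txt.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk A; mk B
git checkout -q -b develop; mk C; mk D
git checkout -q master;     mk E; mk F; mk G
old_D=$(git rev-parse develop)

git checkout -q -b tmp master             # temporary branch at G
git cherry-pick develop~1 > /dev/null     # copy C to C'
git cherry-pick develop   > /dev/null     # copy D to D'

git branch -f develop tmp                 # peel the label off D, paste it on D'
git checkout -q develop
git branch -q -D tmp                      # throw away the temporary name

history=$(git log --format=%s | tr '\n' ' ')
echo "develop is now: $history"
```

The original D is still in the repository under its old hash (saved here in old_D); it simply has no branch name pointing at it any more.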

Regular rebase

The simple form of a regular rebase is:

git checkout your-branch   # first, ensure your HEAD is attached
git rebase target          # then do the rebase-by-copy thing

The target here is typically another branch name. For instance, to achieve the rebase we drew above, we would check out develop and run git rebase master. The master part tells Git where to start doing the copying—where the temporary branch goes, as it gets built up, commit by commit. But there's something important missing here: How does git rebase know which commits to copy?

The answer lies in a more general trick that Git uses. You will see and use this often, e.g., with git log: you tell it where to start, and you also tell it where to stop. If you do this by commit hash IDs, you can write things like:

git log master ^1234567

which tells it to start from the tip of master, but stop when it reaches commit 1234567, whatever that one is. You can write this instead as:

git log 1234567..master

as these mean the same thing: start at the tip of master; stop with 1234567.

The tricky part here is that Git doesn't have to encounter commit 1234567 itself directly. The "stop" directive stops Git when it reaches any commit that is reachable from the stopping point. This lets us write things like:

git log master..develop

even if master contains commits that develop does not.
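A quick sketch of the selection, using the develop/master graph from earlier rebuilt in a scratch repository (the mk helper and file names are invented for the example):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

# mk NAME: hypothetical helper that commits a new file NAME.txt.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk A; mk B
git checkout -q -b develop; mk C; mk D
git checkout -q master;     mk E; mk F; mk G

# Commits reachable from develop but not from master: what a rebase would copy.
two_dot=$(git log --format=%s master..develop | tr '\n' ' ')
caret=$(git log --format=%s develop ^master | tr '\n' ' ')

# The reverse selection: commits on master that develop lacks.
reverse=$(git log --format=%s develop..master | tr '\n' ' ')

echo "master..develop: $two_dot"
echo "develop ^master: $caret"
echo "develop..master: $reverse"
```

Note that commit G itself is never visited when walking back from develop; A and B are excluded because they are reachable from it.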

In our case, git rebase uses this two-dot notation to exclude commits B and earlier from the copy process. (It's a lot more complicated under the hood, but it all descends from this idea.) That is, we have Git choose both what to copy and where to put the copies using a single name, master: the copies go after the current tip of master, and the commits copied are those that are reachable from HEAD but not reachable from the tip of master.

You can split the two parts, if necessary, and sometimes it is necessary: you can use git rebase --onto target stop-at to say that the copies should go after the commit identified by target, but that Git should copy only the commits reachable from HEAD that are not reachable from stop-at. This lets you take a graph like:

               C--D   <-- important-fix
              /
          A--B--E   <-- feature1
         /
...--o--F--G--H--I   <-- mainline

and tell Git to copy just C and D to go after mainline:

$ git checkout important-fix
$ git rebase --onto mainline feature1

Git will list the commits on HEAD aka important-fix that are not reachable from feature1 (hence just C--D). These will be the commits copied; they will go after commit I (mainline). The result will be:

               C--D   [abandoned]
              /
          A--B--E   <-- feature1
         /
...--o--F--G--H--I   <-- mainline
                  \
                   C'-D'  <-- important-fix

There's something else worth noting here: The rebase process leaves the original chain of commits behind. You can always undo a rebase, up until the original chain is eventually garbage-collected. This is particularly useful if you want to experiment with rebasing.
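The --onto example and the undo can both be tried in a scratch repository. This is a sketch under the same assumptions as before (hypothetical mk helper, one new file per commit). It saves the old tip's hash in a variable before rebasing; in a real repository you could also find that hash in the reflog.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Example User"
git config user.email "user@example.com"

# mk NAME: hypothetical helper that commits a new file NAME.txt.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk o; mk F
git checkout -q -b feature1;      mk A; mk B
git checkout -q -b important-fix; mk C; mk D
git checkout -q feature1;         mk E
git checkout -q mainline;         mk G; mk H; mk I

git checkout -q important-fix
before=$(git rev-parse HEAD)              # remember the original D

# Copy feature1..HEAD (just C and D) so the copies land after mainline's tip.
git rebase -q --onto mainline feature1

copied=$(git log --format=%s mainline..HEAD | tr '\n' ' ')
new_base=$(git log -1 --format=%s HEAD~2)
echo "copied: $copied, now sitting on: $new_base"

# Undo: point the branch back at the original chain, which still exists.
git reset -q --hard "$before"
restored=$(git log --format=%s feature1..HEAD | tr '\n' ' ')
```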

Interactive rebase

Using git rebase -i we can tell Git to copy commits (as with other rebases), but pause and let us make changes, or combine several existing commits before making the final newer, shinier copy we should use instead of the original.

Interactive rebase needs the same inputs as the non-interactive rebase:

which commits should it copy?
where should the commits go?

The main difference is that after making a list of the commits to copy, it will write that list to a file containing instructions. Each copy to make will be listed as a pick operation: do a cherry-pick of that commit. You can change the list! When you are done changing the list, in whatever way, you write out the list of operations, and Git then carries them out.

The set of commits that will be copied is whatever is in the list after you write it out. You can even add to the list, if you want. "Pick" means do a cherry-pick, while "squash" or "fixup" says to do the previous cherry-pick step without quite committing (see git cherry-pick -n) and then cherry-pick some more and only then commit. "Edit" means cherry-pick but stop for amending. "Drop" means the same thing as commenting out or deleting a "pick" line: don't do anything at all with the commit. (Most of these rely on a few other special tricks that I'm deliberately glossing over here; this is just the general idea.)
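Normally you edit the instruction list in your editor, but for a reproducible sketch the edit can be scripted through the GIT_SEQUENCE_EDITOR environment variable, which Git runs on the instruction file. The example below assumes the default list format (one "pick" line per commit, oldest first) and uses sed to turn the second pick into a fixup; the mk helper and file names are made up.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# mk NAME: hypothetical helper that commits a new file NAME.txt.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk base; mk A; mk B; mk C

# The list starts out as: pick A / pick B / pick C.
# Rewrite line 2 so that B is folded into A instead of being copied on its own.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/fixup/'" git rebase -q -i HEAD~3

result=$(git log --format=%s | tr '\n' ' ')
echo "after rebase: $result"
```

B's changes are still present (B.txt exists in the work-tree); they now live inside the new copy of A.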

Note that in all cases, you're building up a linear chain of commits

Git's rebase commands always build up a new chain of commits one at a time, as if by running git cherry-pick. In some cases, such as interactive rebase, Git literally does run git cherry-pick. There's a very important thing about this, which is: it's very hard to cherry-pick a merge. As a result, git rebase doesn't even try:

                 D--E
                /    \
            A--B      G--H   <-- feature
           /    \    /
          /      C--F
         /
...--o--o--o--o   <-- mainline

If you run git checkout feature; git rebase mainline, Git must select commits to copy, then copy them. The commits Git will select are A, B, both C and F, both D and E, and finally H. It will skip the merge commit G. If all goes well, the copy will look like this, although it's hard to say in what order the C'-F' and D'-E' chains will actually show up:

                 D--E
                /    \
            A--B      G--H   [abandoned]
           /    \    /
          /      C--F
         /
...--o--o--o--o   <-- mainline
               \
                A'-B'-D'-E'-C'-F'-H'   <-- feature
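A sketch that rebuilds this graph and counts merges before and after a plain rebase. It assumes a default Git configuration (no option or setting that asks rebase to keep merges); the mk helper, the branch name lower, and the mainline commit names m1 to m3 are invented for the example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Example User"
git config user.email "user@example.com"

# mk NAME: hypothetical helper that commits a new file NAME.txt.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk m1
git checkout -q -b feature; mk A; mk B
git checkout -q -b lower;   mk C; mk F       # lower side of the diamond
git checkout -q feature;    mk D; mk E       # upper side of the diamond
git merge -q --no-ff -m G lower              # merge commit G
mk H
git checkout -q mainline;   mk m2; mk m3
git checkout -q feature

merges_before=$(git rev-list --count --merges mainline..feature)
git rebase -q mainline
merges_after=$(git rev-list --count --merges mainline..feature)
total_after=$(git rev-list --count mainline..feature)

echo "merges before: $merges_before, after: $merges_after"
echo "commits after: $total_after"
git log --format=%s mainline..feature        # shows the order Git chose
```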

There is one special kind of rebase, git rebase --preserve-merges, that tries to retain merges while rebasing. This is technically impossible; so instead, it re-performs the merges. The result is pretty tricky, and it does not work well with interactive rebase. (In other words, don't use this unless you know what you're doing. Newer versions of Git replace this option with git rebase --rebase-merges, which is designed to work with interactive rebase.)

The way I like to put this is that rebase flattens merges. In some cases, that's what you want. In most cases, it's not. In interactive rebase, it often means that you've chosen a commit "too far back" in your list for the --onto target and stopping point, so that it comes before a merge:

...--o--A--B--C---F--G   <-- branch (HEAD)
            \    /
             D--E

If you run git rebase -i here, you will make Git copy commits D-E-G and put them all after C. You probably meant to copy just F and G. But you can't copy F: it's a merge commit.

There are two ways to deal with this while using rebase directly:

Copy only G; leave F in place.
Use the merge-preserving code in interactive mode (but this is tricky).

There's a third way, which is to do your rebase piecemeal, by creating your own temporary branch. That is, go back to the way you learned above, before you learned to use the git rebase power-tool. Create a temporary branch and cherry-pick individual commits; when you reach the point where you want a merge, run git merge; then cherry-pick more individual commits.
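Here is one way that piecemeal approach might look, as a sketch only. It rebuilds the small graph above on top of a hypothetical newbase branch. The mk helper tags each original commit as orig-NAME purely so the script can name it later; those tags, the branch names tmp and tmp-side, and the commit n1 are all assumptions of the example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/newbase
git config user.name "Example User"
git config user.email "user@example.com"

# mk NAME: hypothetical helper; commits NAME.txt and tags the commit orig-NAME.
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; git tag "orig-$1"; }

mk o
git checkout -q -b branch; mk A; mk B
git checkout -q -b side;   mk D; mk E
git checkout -q branch;    mk C
git merge -q --no-ff -m F side               # original merge F
mk G
git checkout -q newbase;   mk n1             # where the copies should go

# Piecemeal rebuild: cherry-pick, branch, cherry-pick, merge, cherry-pick.
git checkout -q -b tmp newbase
git cherry-pick orig-A > /dev/null
git cherry-pick orig-B > /dev/null
git checkout -q -b tmp-side                  # side branch starts at B'
git cherry-pick orig-D > /dev/null
git cherry-pick orig-E > /dev/null
git checkout -q tmp
git cherry-pick orig-C > /dev/null
git merge -q --no-ff -m F tmp-side           # re-perform the merge as F'
git cherry-pick orig-G > /dev/null

git branch -f branch tmp                     # move the real name to the new chain
git checkout -q branch
git branch -q -D tmp tmp-side

merges=$(git rev-list --count --merges newbase..branch)
total=$(git rev-list --count newbase..branch)
git log --graph --format=%s
```

Unlike the flattened result of a plain rebase, the new chain still contains one merge commit.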