fei wang

Reputation: 11

Git command: rebase two commits into one

I want to rebase two commits into one. I used "git log --oneline" to list all commits. I wanted to combine 4288bf0 and b38dacd into one commit.

I ran the command "$ git rebase -i f9721dc 4288bf0" to rebase.

Then I ran the command "$ git push --force" to push the update.

But it failed, and I found that a new branch had been created. At the beginning, I was on the "master" branch, but after the rebase command I was on the "((fad2aba...))" branch. Do you have any ideas? I am totally confused. Thanks a lot.

Upvotes: 0

Views: 554

Answers (1)

torek

Reputation: 490168

fad2aba... is not—well, not exactly—a "new branch". This is what Git calls a detached HEAD, and it's what you asked Git to do, although obviously you did not realize that this is what you were asking Git to do. What you wanted to run, earlier, was just git rebase -i f9721dc.

Long: what's going on

As you are discovering right now, Git is mostly about commits. The commits are numbered, using these random-looking hash IDs,1 which are actually just very large numbers expressed in hexadecimal. This numbering scheme means that a commit, once made, can never be changed.

Besides being numbered like this, each commit:

  • stores a full snapshot of all your files, as of the time you make the commit; and
  • stores some metadata, or information about the commit itself, such as who made it and when.

The metadata for any one particular commit includes the raw hash ID of its immediate predecessor, or parent, commit.2 That is, from a later commit, Git can work backwards to an earlier commit. This means that given a string of ordinary commits, we need only find the hash ID of the last such commit.

If we wish to draw such a string of commits, we can do so, putting the most recent one at the right and using uppercase English letters to stand in for the hash IDs:

... <-F <-G <-H

Here H stands in for the hash ID of our most recent commit. Commit H stores a snapshot and some metadata, and in the metadata for H, Git will find G's actual hash ID. Similarly, G stores a snapshot and metadata, including the hash ID of earlier commit F. This goes on all the way back to the very first commit ever (and then stops, as all things that cannot go on forever do). We say that commit H points to earlier commit G, which points to F, and so on.
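To see these backward-pointing links for yourself, here is a minimal sketch that builds a throwaway repository with three commits standing in for F, G, and H. The temporary directory, the demo identity, and the commit helper function are illustrative assumptions of mine, not anything taken from your repository.

```shell
# Run as a script (for example: sh demo.sh); set -e would close an interactive shell on error.
set -eu

# Throwaway repository; every name below is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # name the first branch "master" regardless of local defaults
git config user.name "Demo User"
git config user.email "demo@example.com"

# Hypothetical helper: each commit adds one small file named after the commit.
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit F
commit G
commit H

# Print the raw commit object for H: a tree (the snapshot), a parent line,
# author and committer metadata, and the log message.
git cat-file -p HEAD

# The parent recorded inside H is exactly the hash ID of G.
parent_of_H=$(git cat-file -p HEAD | sed -n 's/^parent //p')
hash_of_G=$(git rev-parse HEAD~1)
echo "parent stored in H: $parent_of_H"
echo "hash ID of G:       $hash_of_G"

# The very first commit (F here) records no parent at all.
root_parents=$(git cat-file -p HEAD~2 | grep -c '^parent ' || true)
echo "parent lines in F:  $root_parents"
```

The two hash IDs printed near the end should match, which is all that "H points to G" means.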

In order to find commit H quickly and easily—or sometimes, at all—Git would like us to have a name for it. This is where a branch name comes in. We pick some branch name, such as master or main, and have Git store H's raw hash ID in that name, so that we do not have to memorize it ourselves. We then say that this name points to commit H, just as commits point backwards to earlier commits.

I already mentioned that no part of any existing commit can ever change. This includes its metadata, which means that commit H must always and forever point back to commit G. I like to make use of this in my StackOverflow drawings to be a bit lazy about the connecting arrows, in part because arrow fonts in text are often rather limited, so I will now draw our situation like this:

...--F--G--H   <-- master

Here, Git gets to use the name master to find the hash ID of the last commit in the branch, H. Then Git uses H to find G as usual, and so on.

This is how branches—or at least, branch names—work in Git. A branch name is simply a name for the last commit that we will say is "in" or "on" the branch.3 That ends up being Git's definition: a branch name points to the tip commit of the branch.4


1The hash IDs are the numbers that come out of what was once a pretty strong cryptographic function, SHA-1. Due to advances in computing, it's now a bit weak, and the Git guys are working on switching to SHA-256.

2More precisely, any one commit can list any number of previous commits, but most commits list exactly one parent. These so-called ordinary commits are the ones we'll be concerned with here. In any non-empty repository, there is at least one commit (the first one) that has no parent, and there are usually merge commits, which are defined as commits with two or more parents, but when rebasing, we rarely want to deal with the complications these produce.

3Git uses "on" more often than "in", but because a commit is often contained within multiple different branches simultaneously, "in" seems like a better word to me. Well, one must be flexible sometimes. 😀

4This particular definition implies that a branch is distinct from a branch name. Given the way people use words, sometimes it is, and sometimes it isn't. See also What exactly do we mean by "branch"?


Multiple branch names, and attached HEAD

The pictures I have drawn so far are still woefully incomplete, because I've only included one branch name. To really "get" Git, we need to look at how Git handles multiple branch names. So let's take our drawing with one branch, and add another, develop, to it:

...--F--G--H   <-- develop, master

We might do this by running git branch develop, for instance, to create the new name develop. This name will also point to existing commit H. Branch names, in Git, must at all times point to some particular commit that exists within that Git repository.5

But when we work with Git, we use either git checkout or git switch to pick some branch name. We run a command like git checkout develop and Git tells us that we are now "on" branch develop. The git status command literally says On branch develop at that point, for instance. So: how does Git know which branch name is the current branch name?

Git's answer is to use a very special name, HEAD.6 We attach this special name to one branch name, like this:

...--F--G--H   <-- develop, master (HEAD)

This indicates that we are on branch master. Once we run git switch develop, though, we have:

...--F--G--H   <-- develop (HEAD), master

which indicates that we are on branch develop.

In this state, if we make a new commit, Git will:

  • package up a source snapshot;
  • package up some metadata, including someone's name and email address, the current date-and-time, a log message, and the raw hash ID of commit H, which is the current commit;
  • write all of this out to obtain a new, uniquely-numbered commit that we'll just call "commit I"; and
  • write commit I's hash ID into the current branch name.

We will draw commit I as pointing back to commit H, because it will do so. So now we have:

...--F--G--H   <-- master
            \
             I   <-- develop (HEAD)

Note that our HEAD is still attached to the name develop, but the name develop itself now points to new commit I, rather than existing commit H. Since we're on develop, not master, the existing name master has not moved.

This indicates that the last commit on branch master is still H, while the last commit on branch develop is now commit I. Commit H is still on both branches, as are all earlier commits. New commit I is currently only on develop, although with various further Git commands, we can cause the name master to move such that commit I is also on master.

In the end, we have the following:

  • The set of commits that is "on" some branch is the set of commits that is reachable from that branch name, starting with the last or tip commit of the branch, whose hash ID is stored in the branch name, and then working backwards.

  • The special name HEAD is normally attached to one branch name. This makes that particular branch the current branch. Its tip commit—the commit the name itself selects—is then the current commit.
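The two points above can be checked in a scratch repository. This is only a sketch: the temporary directory, the demo identity, and the commit helper are assumptions made for illustration. It uses git checkout rather than git switch so that it also runs on older Git versions.

```shell
# Run as a script; every name below is illustrative.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Hypothetical helper: one small file per commit.
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit F
commit G
commit H
H=$(git rev-parse HEAD)

git branch develop          # a second name for the same commit H
git symbolic-ref HEAD       # prints refs/heads/master: HEAD is attached to master

git checkout -q develop
git symbolic-ref HEAD       # prints refs/heads/develop: HEAD is now attached to develop

commit I                    # only the current branch name moves

echo "master : $(git rev-parse master)"
echo "develop: $(git rev-parse develop)"

# H is reachable from both names, so it is "on" both branches.
git branch --contains "$H"

# Reachability is what decides which commits are on a branch.
git rev-list --count master     # F, G, H
git rev-list --count develop    # F, G, H, I
```

After the new commit, master still resolves to H while develop resolves to the new commit I, whose parent is H.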

But this is not Git's only working mode.


5This constraint is somewhat arbitrary (why can't a branch name point to a commit we don't happen to have at the moment, as long as we could at least get it?) but is made by the Git software.

6Note that HEAD should always be written in all uppercase like this. Lowercase head works, sort of, on some systems (typically those with case-insensitive file systems), but even where it seems to work, it can fail in some cases. There's a shorter way to write HEAD, using @; in many ways I think it would have been better if Git had just used @ as a symbol from the start, rather than starting with HEAD and then adding @ as an accepted abbreviation in Git 1.8.5.


Detached HEAD mode

Suppose we have a series of commits ending at H as before:

...--F--G--H   <-- master (HEAD)

and we wish to examine one of the historical commits—say, commit F—by checking it out. We have two options here. We could have Git attach a branch name to commit F:

git branch historical <hash-ID-of-F>

which we would then draw as:

...--F   <-- historical
      \
       G--H   <-- master (HEAD)

and we can now run git checkout historical or git switch historical, to view commit F. This might even be the thing to do, especially if we plan to make a series of new commits I and J that come right after F:

...--F   <-- historical (HEAD)
      \
       G--H   <-- master

would become:

       I--J   <-- historical (HEAD)
      /
...--F
      \
       G--H   <-- master

(although now the name historical is clearly wrong and we should change it!). But if we just want to look around, and maybe build a test version of the software or something, Git allows us to do so without creating a branch name. To do this, Git uses what it calls detached HEAD mode. We can draw that this way:

...--F   <-- HEAD
      \
       G--H   <-- master

Note how HEAD is no longer attached to any branch name. It's just floating out there freely. If we do create new commits at this point, Git makes them as usual, but instead of updating a branch name, just writes the raw commit's hash ID directly into HEAD:

       I--J   <-- HEAD
      /
...--F
      \
       G--H   <-- master

As soon as we run git checkout master, here's what happens:

       I--J   ???
      /
...--F
      \
       G--H   <-- master (HEAD)

If we wish to find commit J or I now, we'll have to use some tricky mechanism,7 because there's no branch name holding the hash ID of commit J now.
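Here is a small sketch of that whole sequence: detach at F, make I and J, go back to master, and then notice that no branch name finds J any more. The scratch repository, the commit helper, and the branch name rescued are all made up for the demonstration.

```shell
# Run as a script; every name below is illustrative.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Hypothetical helper: one small file per commit.
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit F
commit G
commit H
F=$(git rev-parse HEAD~2)

# Enter detached HEAD mode at F.
git checkout -q --detach "$F"
if git symbolic-ref -q HEAD >/dev/null; then
    head_state="attached"
else
    head_state="detached"    # symbolic-ref fails when HEAD holds a raw hash ID
fi
echo "HEAD is $head_state"

# New commits go directly into HEAD; no branch name is updated.
commit I
commit J
J=$(git rev-parse HEAD)

# Back to master. Without -q, Git warns about the commits being left behind.
git checkout -q master

branches_with_J=$(git branch --contains "$J")
echo "branches containing J: '$branches_with_J'"

# The HEAD reflog still remembers where HEAD was before that checkout.
previous_head=$(git rev-parse 'HEAD@{1}')
git branch rescued "$previous_head"
```

Creating a name such as rescued is one way to keep J findable; until some name points at it, the reflog is the main route back to it.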

What all this means is that we don't normally work in detached HEAD mode. But we do often get into detached HEAD mode for a little while, to examine or do something interesting with some historical commit. This is where git rebase normally comes in.


7Git has several such mechanisms, including a generalized one called reflogs. These are good for recovering from mistakes, but are best not used for most planned work as they're rather awkward to deal with.


Rebase uses detached HEAD mode intentionally

Note: This is not the cause of the problem you hit, but it would be irresponsible not to mention this, because many rebases have to stop in the middle and when they do, you are in detached HEAD mode on purpose. So we should discuss this.

I mentioned several times above that no commit can ever be changed. And yet, the git rebase command is all about improving some set of existing commits. That is, we have some existing commits—which we acknowledge cannot be changed—that are in some way defective, so we would like to fix them. That implies "changing" the commits, which we just said we can't do. This is all true, but there is something we can do, and that is: copy the old commits to new and improved ones.

Suppose, for instance, that we have this initially:

...--G--H   <-- master (HEAD)

We create and switch to a new branch and make some commits:

...--G--H   <-- master
         \
          I--J   <-- develop (HEAD)

In the meantime, someone else, in a clone of the same Git repository, also makes new commits. They get these commits into some sort of shared version of the repository, on the to-be-shared master (reflected in our origin/master), so that we now have:

          I--J   <-- develop (HEAD)
         /
...--G--H   <-- master
         \
          K--L   <-- origin/master

in our repository. Sooner or later, we will be forced to combine our work—in our commits I-J—with their work, in their commits K-L that we have obtained from the shared repository. We could do this with git merge, which—leaving out all the branch names and a lot of other details—would result in:

          I--J
         /    \
...--G--H      M
         \    /
          K--L

and in many ways and many cases, this might be the right thing to do. But for whatever reason, whether it's some corporate policy, or some personal stubbornness, or whatever, we will choose instead to use git rebase.

We literally cannot change our existing I-J commits, but we can make new-and-improved commits—let's call them I' and J', to indicate that they're new-and-improved versions of I and J—and when we use Git to make the new commits, we can put in whatever snapshot and metadata we want. So we can make sure that the new commits look like this, when added to our existing picture:

          I--J   <-- develop
         /
...--G--H   <-- master
         \
          K--L   <-- origin/master
              \
               I'-J'  <-- temporary-branch (HEAD)

To get to this picture, we could start with:

git checkout -b temporary-branch origin/master
# or: git switch -c temporary-branch origin/master

and then use git cherry-pick twice, as the cherry-pick command does this kind of "copy a commit" trick:

git cherry-pick develop~1
git cherry-pick develop

Once we're done with this, we just need to make the name develop identify the last-copied commit, J', which we might do with git branch -f now, then use git checkout or git switch to get back "on" develop. Or, we could use git checkout -B or git switch -C to do this with one command. Then we delete our temporary branch name entirely, and have this picture:

          I--J   [abandoned]
         /
...--G--H   <-- master
         \
          K--L   <-- origin/master
              \
               I'-J'  <-- develop (HEAD)

We can now "fast-forward" our master to point to commit L, giving:

...--G--H--K--L   <-- master, origin/master
               \
                I'-J'  <-- develop (HEAD)

and it looks like we just started our work after the other person's commits were all done. And, if we never give the original I-J commits to anyone else, who is to say that we didn't?8
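The copy-by-hand procedure can be tried end to end in a scratch repository. This is a sketch under several assumptions: there is no real remote, so the script fakes origin/master by writing a remote-tracking ref directly with git update-ref; the commit helper and demo identity are invented; and --no-track is added only so that Git does not try to configure an upstream for the temporary branch.

```shell
# Run as a script; every name below is illustrative.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Hypothetical helper: one small file per commit, so the copies never conflict.
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit G
commit H
git checkout -q -b develop
commit I
commit J
old_J=$(git rev-parse develop)

# Simulate the other person's work: build K and L on top of H, then record
# them as refs/remotes/origin/master, as if a fetch had brought them in.
git checkout -q --detach master
commit K
commit L
git update-ref refs/remotes/origin/master HEAD
git checkout -q develop

# The manual "rebase": temporary branch, two cherry-picks, move the name.
git checkout -q --no-track -b temporary-branch origin/master
git cherry-pick develop~1      # copy I to I'
git cherry-pick develop        # copy J to J'
git checkout -q -B develop     # force develop to point here and switch to it
git branch -q -D temporary-branch

git --no-pager log --oneline --graph develop origin/master master
```

The graph at the end should show the two copied commits sitting on top of L, with master still at H.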

In any case, consider this particular sequence: creating a temporary branch, switching to it, running git cherry-pick repeatedly, forcing the original branch name to point "here" (wherever that is when we are done), switching back to the branch name, and deleting the temporary branch. This works fine, but it has one obvious problem: What temporary name should we use? Whatever name we pick, what happens if the person using Git picked that same name already?

What git rebase does here is make use of the fact that we don't need any name. Git can use its detached-HEAD mode to do the job. That is, git rebase will:

  • enumerate the hash IDs of commits to be copied via git cherry-pick or equivalent;
  • use git checkout --detach or git switch --detach to enter detached HEAD mode at the place where the copies should go: in our case, detach to commit L;
  • copy the commits, one by one, as directed; and last,
  • yank the branch name we were on, to wherever HEAD is now, re-attaching HEAD in the process.

The result of this sequence is the copying we wanted to do.
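The four steps can be imitated by hand, with no temporary branch name at all. This is a rough approximation of what git rebase does, not its actual implementation: the real command also handles conflicts, records extra state, and so on. As before, the scratch repository, the commit helper, and the faked origin/master ref are assumptions for the sake of the demonstration.

```shell
# Run as a script; every name below is illustrative.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Hypothetical helper: one small file per commit.
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit G
commit H
git checkout -q -b develop
commit I
commit J
git checkout -q --detach master
commit K
commit L
git update-ref refs/remotes/origin/master HEAD   # stand-in for a fetched origin/master
git checkout -q develop

# Remember which branch we started on.
branch=$(git symbolic-ref --short HEAD)

# Step 1: enumerate the commits to copy, oldest first (I, then J).
todo=$(git rev-list --reverse origin/master..HEAD)

# Step 2: detach HEAD at the target, commit L.
git checkout -q --detach origin/master

# Step 3: copy the commits one by one.
for c in $todo; do
    git cherry-pick "$c"
done

# Step 4: yank the original branch name over to here and re-attach HEAD.
git checkout -q -B "$branch"

git --no-pager log --oneline --graph develop origin/master master
```

If a cherry-pick stops with a conflict in the loop, you are left sitting in detached HEAD mode, which is the same state a real rebase leaves you in when it stops in the middle.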

Using git rebase -i allows us to modify the "copy" process to rearrange commits, do squashes / fixups, and so on, but otherwise is this same relatively simple process. Note that the "pick" command in the instruction sheet you edit means run git cherry-pick. In historic Git, git rebase -i was a shell script, and it literally ran git cherry-pick here; now it's all a big C program that has cherry-pick built in, but it works the same in principle.


8Many people feel this kind of "linear history", where each person doesn't start working until the previous person finishes, looks neater and is therefore easier to understand. It is, of course, a lie of sorts too, because the actual development happened in parallel. Whether a lie that simplifies history like this is good or bad (or both!) is perhaps a question for philosophers and historians.9 In any case, Git gives you the tools, should you wish to engage in these kinds of lies.

9My philosophical answer to whether this kind of lie is good or bad is "yes, or at least maybe". 😀 It is good, or bad, or maybe neutral, or perhaps even both at the same time.


What rebase needs vs what it allows

Now that we know what git rebase does:

  • enumerate commits to be copied;
  • detach HEAD at the new "base" (target for copying);
  • copy the commits; and
  • move the starting branch name and re-attach

we can see what git rebase requires:

  • the current branch name, which Git can read with git symbolic-ref HEAD;
  • a list of commits to copy, which Git can get with git rev-list upstream..HEAD; and
  • a target commit hash ID, which Git can get with git rev-parse newbase.

If we look at the git rebase documentation, that's more or less what we see in the SYNOPSIS section:10

git rebase [-i | --interactive] [options] [--exec cmd]
      [--onto newbase | --keep-base] [upstream [branch]]

This is fancier than just what we need, in part because of the options like -i / --interactive, but we see that:

  • --onto newbase provides the target for copying;
  • upstream provides the argument for our upstream..HEAD to git rev-list.

The --onto part is optional. If we leave it out, Git uses our upstream argument as the --onto argument.

The upstream argument is optional too. If we leave that out, Git uses @{upstream} to figure that one out.

But then there's this extra branch option. What is it doing here? Let's go back to the documentation:

DESCRIPTION
If <branch> is specified, git rebase will perform an automatic git switch <branch> before doing anything else. Otherwise it remains on the current branch.

Now, this is reasonably current documentation: older versions of Git say git checkout <branch> instead. And, at least with older versions of Git, git checkout hash-ID works, and means the same thing as git checkout --detach hash-ID. So when you pass a raw hash ID to git rebase in this final branch argument position, the rebase operation starts by detaching HEAD.

One might think that git rebase should refuse to run at all in detached HEAD mode. Its last step, after all, is supposed to be "move the starting name and re-attach", and in detached HEAD mode, there is no starting branch name! But for whatever reason,11 the folks who wrote this way back when—mostly Linus Torvalds, really—decided that if you run git rebase when in detached HEAD mode, it will just skip the last step entirely.


10The SYNOPSIS section actually lists three variants of running git rebase, at least at the moment. I copied only the relevant one here.

11Whether that reason is "because they could", and if so, whether that's a good or bad reason, is another question for philosophers.


So, by running:

git rebase -i f9721dc 4288bf0

you told git rebase to do a git checkout 4288bf0 as its very first step, before doing anything else. Then it ran git rebase -i f9721dc. That first step—the git checkout that produced a detached HEAD—means that the remaining git rebase steps also occurred in detached HEAD mode.
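This effect can be reproduced in a scratch repository. The sketch below is an assumption-laden stand-in for your situation: five invented commits (base, one, two, three, four) play the role of your history, and the editors are replaced by non-interactive commands through GIT_SEQUENCE_EDITOR and GIT_EDITOR so that the script can run unattended. The sed command turns the second pick line of the instruction sheet into squash.

```shell
# Run as a script; every name below is illustrative.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Hypothetical helper: one small file per commit.
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit base
commit one
commit two
commit three
commit four

base=$(git rev-parse master~4)    # plays the role of f9721dc
two=$(git rev-parse master~2)     # plays the role of 4288bf0
master_before=$(git rev-parse master)

# Non-interactive stand-ins for the editors Git would normally open.
export GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/squash/'"
export GIT_EDITOR=true

# Same shape as: git rebase -i f9721dc 4288bf0
git rebase -i "$base" "$two"

if git symbolic-ref -q HEAD >/dev/null; then
    head_state="attached"
else
    head_state="detached"
fi
echo "HEAD is $head_state"
echo "master before: $master_before"
echo "master after : $(git rev-parse master)"
```

The name master should be exactly where it was, while HEAD is detached at the single squashed commit; commits three and four were never part of the operation.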

Note that your original image has two more commits after 4288bf0. Because Git finds commits backwards, starting from some branch name—or from your detached HEAD—this is going to force you to copy those two commits after you take your other two commits and squash them together in your interactive rebase.

That is, you need to have git rebase start with an instruction sheet that has four pick commands, not just two. You then want to change one of the four pick commands to squash. You would do this by starting on branch master and using git rebase -i with just one hash ID. The rebase code will then:

  • enumerate all four commits and build the instruction sheet;
  • let you edit the instructions;
  • using your edited instructions, copy-but-squash the two commits you want squashed, and run your editor to let you provide the commit message for the new (single) replacement for the original two commits;
  • using your edited instructions, copy each of the remaining two commits to new-and-improved versions, where the only actual "improvement" is that they point back to the squashed commit: everything else about the new-and-improved commits is identical to the originals; and
  • take the then-current branch name (master, before the HEAD-detaching) and yank it over to point to the final copied commit.

You could do this with:

git rebase -i f9721dc master

which will start by doing git checkout master (or git switch master, if you update your Git version). But if you're already on master, there's no need. I personally recommend running git rebase without ever providing that last branch argument, so that git rebase keeps a sort of symmetry with git merge, which doesn't provide any extra "first check out some branch" argument—but the tool does what it does, and as long as you know how to use it, it's OK to use it this way.
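For comparison, here is a sketch of the one-hash-ID form run while on master, in the same kind of scratch repository. The five invented commits, the commit helper, and the non-interactive editor replacements (GIT_SEQUENCE_EDITOR and GIT_EDITOR) are assumptions for the demonstration; in real use you would edit the instruction sheet by hand and change the second pick to squash.

```shell
# Run as a script; every name below is illustrative.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Hypothetical helper: one small file per commit.
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit base
commit one
commit two
commit three
commit four

base=$(git rev-parse master~4)            # plays the role of f9721dc
tree_before=$(git rev-parse 'master^{tree}')

# Non-interactive stand-ins for the editors: squash the second of four picks.
export GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/squash/'"
export GIT_EDITOR=true

# Same shape as: git rebase -i f9721dc   (run while on master)
git rebase -i "$base"

git symbolic-ref HEAD                      # prints refs/heads/master
git --no-pager log --oneline master
```

After this, master holds three commits on top of the base (the squashed pair plus copies of the last two), HEAD is still attached to master, and the final snapshot is unchanged.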

Upvotes: 3
