Undo an amendment of an older commit using a different branch

Question

Suppose I've git rebase'd a branch b2 of my repository, amending an older commit (c1). This commit exists, unamended, on another branch b1 (and there are some more common commits on both branches after c1, then b2 diverges from b1).

Now, I want to use b1 to essentially undo my amendment to c1 on b2. How should I do this, so that the two branches' history becomes maximally identical again?

torek · Accepted Answer

TL;DR

Use git rebase --onto. Specify the target with the --onto argument and specify the commits not to copy with the usual upstream argument. It's hard to say from here exactly what those arguments should be; see the long discussion below.

Long

This requirement:

... that the two branches' history becomes maximally identical again

means that it is important to know that git rebase works by copying commits. As a quick recap, note that:

each commit has a unique hash ID;
each commit holds a snapshot of all files; and
each commit also holds some metadata.

The metadata in the commit include the author and committer and log message, but also—key to this whole process—the hash ID of the parent of the commit.

To turn a commit into a change-set, i.e., to find out what someone changed in any given commit, we have Git compare the commit to its parent. The commit stores its parent's hash ID so git show or git log -p can find this on its own.
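To see these pieces for yourself, here is a minimal sketch that builds a throwaway repository and prints a raw commit object. The file name, commit messages, and identity are made up for illustration, and it assumes a POSIX shell with git and mktemp available.

```shell
set -e
# Throwaway repository; every name below is illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

echo one > file.txt
git add file.txt
git commit -q -m "first"
echo two >> file.txt
git commit -q -a -m "second"

# The raw commit object: a tree (the snapshot), the parent's hash ID,
# the author and committer, and the log message.
git cat-file -p HEAD

# The parent hash stored in the object is what HEAD^ resolves to.
parent_from_object=$(git cat-file -p HEAD | sed -n 's/^parent //p')
parent_from_name=$(git rev-parse HEAD^)

# git show compares the commit with that parent to produce a change-set.
git show --stat --oneline HEAD
```

The two variables hold the same hash ID, which is how git show and git log -p find the commit to compare against.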

Meanwhile, a branch name like b2 just holds the hash ID of the last commit in the branch. So we can draw them—the commits and the branch names—like this:

... <-c1 <-c2 <-c3    <--b2

where each c_i is an actual commit represented by its hash, and an arrow coming out of something means "points to": the branch name b2 points to commit c3, c3 points to c2, c2 points to c1, and so on.
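The picture can be checked in a scratch repository. This is only a sketch under assumed names: the repository, the files, and the identity are invented, and the branch is called b2 to match the drawing.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/b2   # name the initial branch b2
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Three commits in a row: c1 <- c2 <- c3
for n in 1 2 3; do
  echo "$n" > "c$n.txt"
  git add "c$n.txt"
  git commit -q -m "c$n"
done

# The branch name stores exactly one hash ID: that of the last commit.
tip=$(git rev-parse HEAD)
branch_value=$(git rev-parse refs/heads/b2)

# Following the parent links from the name visits c3, then c2, then c1.
subjects=$(git log --format=%s b2 | paste -s -d ' ' -)
echo "$subjects"   # prints: c3 c2 c1
```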

Nothing about any commit can ever change, so we can draw the internal arrows, from commit to commit, as connecting lines instead of arrows, as long as we remember that they're part of the child and point backwards to the parent. This lets us draw, in crude text graphics, more than one branch. I'll use X1 and up as this is just an example, not exactly related to your starting point:

...--X1   <-- branch1
       \
        X2--X3   <-- branch2

If branch1 acquires more commits, the newest eventually leads back to X1:

...--X1--X4--X5   <-- branch1
       \
        X2--X3   <-- branch2

Now back to your original setup:

Suppose I've git rebase'd a branch b2 of my repository, amending an older commit (c1). This commit exists, unamended, on another branch b1 (and there are some more common commits on both branches after c1, then b2 diverges from b1).

I'll draw as accurate a picture as I can of the original setup:

                    c4--c5   <-- b1
                   /
...--c0--c1--c2--c3
                   \
                    c6--c7   <-- b2

(I'm filling in missing details with guesses, though in the end the guesses should not matter much.)

When you ran git rebase -i while on b2 and amended/edited commit c1, Git had to copy c1 to some new and different commit. The new commit has an author and log message as usual, which were initially set up to be copied from c1, but it has a new and different hash ID and maybe a different snapshot and/or different log message (or even different author), depending on exactly what you changed. Let's call the new copy c1' to tell them apart:

                    c4--c5   <-- b1
                   /
...--c0--c1--c2--c3
       \
        c1'  [copying in progress]

Because c1 got copied to c1', Git was then forced to copy c2 to a new c2':

                    c4--c5   <-- b1
                   /
...--c0--c1--c2--c3
       \
        c1'-c2'  [copying in progress]

The difference from c1' to c2' is the same as the difference from c1 to c2. That is, if we compare c2 vs its parent c1, we'll get some set of changes. If we compare c2' to c1', we'll get the same changes, even if c1 has different contents from c1'.

Now that c2 is replaced with c2', this forces Git to copy c3 to c3' as well. That forces Git to copy c6 and c7 too. The final copied commit is c7' and git rebase finishes by yanking the name b2 over, so that the final result is:

                    c4--c5   <-- b1
                   /
...--c0--c1--c2--c3--c6--c7   [old b2, now abandoned]
       \
        c1'-c2'-c3'-c6'-c7'  <-- b2 (HEAD)
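The copying can be reproduced in a scratch repository. The sketch below rests on several assumptions: make_commit is a hypothetical helper that adds one file per commit and tags it, so the originals keep a name, and a fixup commit with --autosquash stands in for editing c1 by hand in git rebase -i, because that can be scripted. GIT_SEQUENCE_EDITOR=true accepts the instruction sheet unchanged.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/b1
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Hypothetical helper: one new file per commit, plus a tag naming the original.
make_commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
  git tag "$1"
}

for c in c0 c1 c2 c3; do make_commit "$c"; done
git checkout -q -b b2
make_commit c6
make_commit c7
git checkout -q b1
make_commit c4
make_commit c5
git checkout -q b2

# Amend the older commit c1 on b2 (stand-in for "edit" in git rebase -i).
echo "amended" >> c1.txt
git commit -q -a --fixup=c1
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash c0

# b2 is now c1'-c2'-c3'-c6'-c7', so b2~3 is c2' and b2~4 is c1'.
old_c2=$(git rev-parse c2)
new_c2=$(git rev-parse b2~3)
old_patch=$(git diff c1 c2)
new_patch=$(git diff b2~4 b2~3)
fork=$(git merge-base b1 b2)

git log --graph --oneline b1 b2
```

Here c2 and c2' have different hash IDs but the same change-set, and the two branches now share history only up to c0.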

Now, I want to use b1 to essentially undo my amendment to c1 on b2. How should I do this, so that the two branches' history becomes maximally identical again?

You may also have added even more commits since then:

                    c4--c5   <-- b1
                   /
...--c0--c1--c2--c3--c6--c7   [old b2, now abandoned]
       \
        c1'-c2'-c3'-c6'-c7'-c8--c9   <-- b2 (HEAD)

What you might like to end up with is:

                    c4--c5   <-- b1
                   /
...--c0--c1--c2--c3--c6--c7--c8'-c9'  <-- b2 (HEAD)
       \
        c1'-c2'-c3'-c6'-c7'-c8--c9   [abandoned]

It's probably OK (and usually a lot easier to do) if you end up with:

                    c4--c5   <-- b1
                   /
...--c0--c1--c2--c3--c6"-c7"-c8'-c9'  <-- b2 (HEAD)

with none of the abandoned commits drawn in.¹

To get either of these, you want to:

tell Git to copy at least c8 and c9 (assuming they exist), and
tell Git to not copy c1' through c3'

and this means using git rebase --onto, so that you can separate two instructions that git rebase usually combines.

That is, what git rebase does is:

Enumerate some list of commits to copy. In the example above, these were c1 through c7. If you use git rebase -i, the list goes into a modifiable instruction sheet using the word pick along with each commit's hash (shortened) and the subject line from the commit log message.

Normally, merge commits (those with two or more parents) are ejected from the list right away. (There are some modes that don't eject them; these are more complicated and we'll ignore them here.)

Which commits get listed? That's from your upstream argument: the commits to be copied are those that are reachable from HEAD by walking backwards, commit to commit: c7 leads back to c6, which jumps back to c3, which goes back to c2, and so on. But, from this list—which could go back a really long way—we remove any commits reachable from the upstream argument. So if upstream is the hash ID of c0, we'll take c0 away from the list, and also any commit before c0. That means the list starts with c1 and ends with c7, skipping the unreachable-this-way c4 and c5.
Pick a target for --onto. If you use --onto, you choose this directly. If not, you choose it with your upstream argument. For instance, with git rebase master, the upstream is master and the --onto target is the commit to which the name master points. Git does a git checkout --detach here (or the internal equivalent) so as to get off the branch with the commits you're copying.
Begin copying commits, as if by git cherry-pick, one at a time. Some rebase operations literally use git cherry-pick and some don't, quite.
When the copying is finished, move the original branch name so that it points to HEAD, which is the last copied commit, or—if we didn't copy any commits after all—the --onto target. Then get back on that branch, as if by git checkout name.
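The enumeration step can be approximated without running a rebase at all. This sketch assumes the original, pre-rebase graph, with tags standing in for the hash IDs and make_commit as a hypothetical helper. The range c0..b2 means "reachable from b2, minus anything reachable from c0", which mirrors how the upstream argument trims the list; the real rebase applies further filtering, so treat this as an illustration only.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/b1
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Hypothetical helper: one new file per commit, plus a tag naming it.
make_commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
  git tag "$1"
}

for c in c0 c1 c2 c3; do make_commit "$c"; done
git checkout -q -b b2
make_commit c6
make_commit c7
git checkout -q b1
make_commit c4
make_commit c5
git checkout -q b2

# Commits reachable from b2 but not from c0, oldest first, merges left out.
to_copy=$(git log --reverse --no-merges --format=%s c0..b2 | paste -s -d ' ' -)
echo "$to_copy"   # prints: c1 c2 c3 c6 c7
```

Commits c4 and c5 never appear because they are not reachable from b2.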

Using --onto lets you set the upstream argument (the commits not to copy) separately from the target of the rebase.

So, if you want to copy just c8 and c9, you can tell by inspection that the --onto target is c7, and that the first commit you don't want copied is c7', the copy of c7. If you have the hash ID of the original c7 available, you could, for instance, run:

git rebase --onto <hash-of-c7> <hash-of-c7'>

while on branch b2. But to find the original c7, before the first rebase, you'll have to rummage through the reflogs. This can be difficult, as the reflogs tend to contain a lot of motion, and once you've copied one commit once, you may have copied it many times.²

So we can instead just let Git copy c6' and c7' again. We'll set the upstream to c3' as the first commit not to copy, and set c3 as the --onto target:

git rebase --onto <hash-of-c3> <hash-of-c3'>

for instance. Git will enumerate the commits by walking back from HEAD (c9) until it reaches c3', which you said not to copy (nor anything earlier). That will list c6', c7', c8, and c9 as the commits to copy. The copies are to go after c3 (--onto). Note that commits c3 and c3' are both easy to spot in the crude ASCII graph that Git draws, which you can view with:

git log --graph --oneline b1 b2

so this gives you your hash IDs for the --onto and upstream arguments.
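Putting it together, the following sketch rebuilds the whole situation in a scratch repository and then undoes the amendment. It is an illustration under assumptions, not a recipe to paste into a real repository: make_commit is a hypothetical helper, the tag c3 stands in for the hash ID of the original c3, b2~4 is c3' only because of how this particular history was built, and a fixup commit with --autosquash stands in for the interactive amendment.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/b1
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Hypothetical helper: one new file per commit, plus a tag naming the original.
make_commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
  git tag "$1"
}

for c in c0 c1 c2 c3; do make_commit "$c"; done
git checkout -q -b b2
make_commit c6
make_commit c7
git checkout -q b1
make_commit c4
make_commit c5
git checkout -q b2

# The amendment of c1 on b2: every commit after c0 gets copied.
echo "amended" >> c1.txt
git commit -q -a --fixup=c1
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash c0

# More work on top of the rewritten branch.
make_commit c8
make_commit c9

# b2 is now c1'-c2'-c3'-c6'-c7'-c8-c9, so b2~4 is c3'.
c3_copy=$(git rev-parse b2~4)

# Undo: do not copy c3' or anything before it; put the copies after c3.
git rebase -q --onto c3 "$c3_copy"

fork=$(git merge-base b1 b2)
after_fork=$(git log --reverse --format=%s c3..b2 | paste -s -d ' ' -)
c1_content=$(cat c1.txt)
git log --graph --oneline b1 b2
```

After the second rebase, b2 forks from b1 at the original c3 again, and c1.txt no longer carries the amended line.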

¹c9 and its abandoned history are all still there, in your Git repository, if you want them back later. They can be found via Git's reflogs. The reflog entries only last for some period of time. After 1 to 3 months by default, the reflog entries expire, and are deleted. Once that happens, the abandoned commits themselves can also be deleted for real, after which you can't get them back, at least not through your own Git.

(The details of reflogs and reflog expiration are a little complicated but not all that relevant here.)

²To view the reflog for branch b2, run git reflog b2. If you're lucky, there aren't many copies and there isn't much random motion, and you can find c7 this way.
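As a small illustration of the lucky case, this sketch performs a single plain rebase in a scratch repository and then recovers the old tip from the branch's reflog. The names and the make_commit helper are hypothetical, and b2@{1} is the right entry here only because exactly one update to b2 happened after the commit of interest.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/b1
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Hypothetical helper: one new file per commit.
make_commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

for c in c0 c1 c2 c3; do make_commit "$c"; done
git checkout -q -b b2
make_commit c6
make_commit c7
git checkout -q b1
make_commit c4
make_commit c5
git checkout -q b2

old_tip=$(git rev-parse b2)   # the original c7
git rebase -q b1              # copies c6 and c7 onto c5 and moves b2

# The branch reflog records each value the name b2 has held.
git reflog b2
recovered=$(git rev-parse 'b2@{1}')
new_tip=$(git rev-parse b2)
```

In a busier repository the entry you need may sit much further down the list, so read the reflog messages rather than relying on a fixed index.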
