kdb

Reputation: 4426

Rebase branch after amending?

Is it possible to rebase onto amended commits and have git automatically skip the old versions of the commits?

Motivation: Gerrit patch-sets

With Gerrit, it is commonly necessary to amend commits after they have failed automated or human code-review. But after amending, it is also necessary to rebase the changes made since then, which is an error-prone process.

In "How to git commit --amend a commit that's the base of a branch", options for manually omitting the amended commit(s) are given, but since that question doesn't ask for automation, no automated solutions are suggested.

Example

Let's say I need to push a commit D for review, but have already made further changes that aren't suitable for pushing yet.

A---B---C(origin/master)---D---E---F(devel)

>>> git push origin <hash-of-D>:refs/for/master

Now let's say the remote build fails, or the reviewer finds an issue. Gerrit requires that the updated change is pushed as a single commit, so I need to amend the commit.

For simple amendments, I can just rebase the devel branch interactively

>>> git rebase -i origin/master devel
edit D
pick E
pick F

A---B---C(origin/master)---D'---E---F(devel)

More generally, I may need to check the change out as a temporary branch, or I might have more than one devel branch. At that point, this option isn't available anymore. Instead I may do something like:

>>> git checkout -b amend <hash-of-D>
>>> ### Make some changes
>>> git commit --all --amend

A---B---C(origin/master)---D'(amend)
        |
        '---D---E---F(devel)

>>> git push origin HEAD:refs/for/master

Now I need to rebase, but since D and D' overlap, an automatic merge may fail, or undo changes made from D to D'. At this point a few solutions seem possible:

But every one of them depends on not accidentally mixing up commit hashes, which could accidentally remove unpushed commits. Additionally, there is occasionally more than one devel branch built on top of the pushed change, and they all need to be rebased. I've found that these things can take a significant amount of time due to the necessary manual error checking.

The issue becomes more convoluted if more than one commit has been pushed for review and one in the middle needs amending, increasing the desire for automation.

Upvotes: 1

Views: 2557

Answers (1)

torek

Reputation: 490168

Use git rebase --onto. It's still a bit tricky, but you don't need to be interactive and specifically tell Git to drop particular commits: you choose via the command line which commits get copied, and which don't.

In fact, you've been doing this all along, because git rebase fundamentally works by copying some set of commits. That is, git rebase has three or four phases, depending on how you count:

  1. List out the commit hash IDs of commits to copy. These become pick commands when you use the interactive style rebase, so you actually already know how this part works.

  2. Use git checkout --detach (or equivalent) to go into detached HEAD mode at a particular target commit. The target commit is up to you: you tell Git, via the command line, which commit to check out here.

  3. Repeatedly run git cherry-pick on each commit to be picked. (The exact details of this particular step vary a lot depending on the style of rebase you use.)

  4. Now that the desired commits have been copied, take the branch name that we gave up in step 2—which rebase recorded in a file—and force that branch name to point to the current (HEAD) commit, and re-attach HEAD so that we are back on the branch.
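The four phases can be imitated by hand, which may make them easier to see. The following is only a rough sketch in a throwaway repository: the commit helper, the commit names, and the faked origin/master ref (created with git update-ref instead of a real fetch) are assumptions for illustration, and the real git rebase does considerably more bookkeeping than this.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master        # fix the initial branch name
git config user.name "Example" && git config user.email "example@example.invalid"

# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
git checkout -q -b devel
commit D; commit E
git checkout -q master
commit B
git update-ref refs/remotes/origin/master master   # stand-in for a fetched origin/master

# Phase 1: list the commits to copy, oldest first.
todo=$(git rev-list --reverse origin/master..devel)
# Phase 2: detach HEAD at the target commit.
git checkout -q --detach origin/master
# Phase 3: copy each listed commit with cherry-pick.
for c in $todo; do git cherry-pick "$c" >/dev/null; done
# Phase 4: force the branch name to the new tip and re-attach HEAD.
git branch -f devel HEAD
git checkout -q devel

result=$(echo $(git log --format=%s devel))
echo "$result"
```

After this, devel holds copies of D and E on top of B, and the original D and E are no longer reachable from any branch name.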

If I may redraw your example a bit, you actually start with this:

A--B--C   <-- origin/master
       \
        D    [refs/for/master in the Gerrit repo at `origin`]
         \
          E--F   <-- devel

When you use --amend or some other operation, this doesn't really change D at all, as you've seen: it just makes a new commit D' whose parent is C and whose snapshot takes into account whatever updates you wanted. So now you have:

        D'  [whatever name you like can go here]
       /
A--B--C   <-- origin/master
       \
        D   [refs/for/master in the Gerrit repo at `origin`]
         \
          E--F   <-- devel

To copy E-F in an automated fashion, you need a way to name commit D. Its actual hash ID will always work, but hash IDs are big and ugly and annoying. It works way better if you insert, into your own repository, a name—any kind of name will do—that you can remember.

The "kinds" of names available are:

  • branch names: git branch makes and deletes these;
  • tag names: git tag makes and deletes these; and
  • any other name of your invention: these are a (mild) pain in the body-part, as you have to use their full names, including prefixes like refs/for/ or refs/xyzzy/. The Gerrit refs/for/ name-space is one of these inventions: it's not yours, it's Gerrit's, but it's just a whole category in which anyone can stick names, and if everyone leaves the refs/for/ to Gerrit and invents their own personal things that aren't refs/for/, they won't collide.

Of these, branch names are probably your best bet, but it's up to you. For the rest of this I'll assume you use branch names. (Tag names work fine too, and I've experimented with using these for my own use. Just be careful not to git push them by mistake as tags start cluttering up other people's repositories quickly!)
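As a small sketch of the three kinds of names, the commands below attach one of each to the same commit in a scratch repository. The tag name pushed-for-review-1 and the refs/xyzzy/review-1 ref are made-up examples, not conventions of Git or Gerrit.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.invalid"

echo D > D.txt
git add D.txt
git commit -q -m D
d=$(git rev-parse HEAD)

git branch in-review/master/1 "$d"         # branch name, stored as refs/heads/in-review/master/1
git tag pushed-for-review-1 "$d"           # tag name, stored as refs/tags/pushed-for-review-1
git update-ref refs/xyzzy/review-1 "$d"    # invented namespace: always spell out the full name

# All three resolve to the same commit hash.
git rev-parse in-review/master/1 pushed-for-review-1 refs/xyzzy/review-1
```

The branch and tag can be used by their short names, while the invented ref only resolves when written out as refs/xyzzy/review-1.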

So, suppose you have:

        D'  <-- in-review/master/2
       /
A--B--C   <-- origin/master
       \
        D   <-- in-review/master/1
         \
          E--F   <-- devel

where in-review/master/number is your own personal way to remember "I pushed this commit with git push origin refs/for/master". Since you've done it twice, we have two different numbers. (I invented this naming system just now for this answer, so it might be terrible. Choose one that works for you.)

When you run an interactive rebase using:

git checkout devel
git rebase -i origin/master

the commits that git rebase lists out for copying are D-E-F.

That's because it actually lists out F-E-D-C-B-A: every commit that can be found by starting at F, the commit named via devel, and working backwards. Then, separately, it lists out C-B-A: every commit that can be found by starting at C, the commit named by origin/master, and working backwards. It knocks any commit in the second list out of the first list, leaving F-E-D, which it then reverses to the necessary order for cherry-picking.

The list of commits is:

  • those reachable from the current branch (devel), minus
  • those reachable from the upstream argument you give to git rebase: origin/master, in this case.

This finishes step 1. (In reality it's more complicated: more commits can be knocked off the list. Merge commits are by default thrown out automatically. Additional commits may be discarded via patch-ID matching and the fork-point mode of rebase. But let's just ignore all this here.)
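You can inspect these lists yourself with git log (or git rev-list) and the two-dot range syntax. The sketch below rebuilds the A-through-F example in a temporary repository; the commit helper and the faked origin/master ref are assumptions made so the example runs without a real remote.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.invalid"

# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B; commit C
git update-ref refs/remotes/origin/master master   # stand-in for a fetched origin/master
git checkout -q -b devel
commit D; commit E; commit F

# Everything reachable from devel, newest first.
from_devel=$(echo $(git log --format=%s devel))
# Everything reachable from the upstream argument.
from_upstream=$(echo $(git log --format=%s origin/master))
# The difference, reversed into cherry-pick order.
to_copy=$(echo $(git log --reverse --format=%s origin/master..devel))

echo "devel:    $from_devel"
echo "upstream: $from_upstream"
echo "to copy:  $to_copy"
```

The range origin/master..devel is shorthand for "reachable from devel, but not from origin/master", which is the same subtraction described above.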

The upstream argument also provides the target commit that Git will use in step 2, the git checkout that detaches HEAD.

If you could just tell Git:

  • use commit C as the target ...
  • but use commit D as the end of the list of commits to knock out

that would do the job, without you having to use git rebase -i and a manual edit. And it turns out, this is easy to do:

git rebase --onto in-review/master/2 in-review/master/1

The --onto argument splits out the target part from the upstream, releasing the upstream argument to mean just commits not to copy.
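Here is a sketch of the whole sequence in a scratch repository, assuming the in-review/master/number naming from above. The commit helper and the faked origin/master ref are illustration only, and each commit touches its own file so that no conflicts arise; real amendments may well conflict with E or F.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.invalid"

# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B; commit C
git update-ref refs/remotes/origin/master master   # stand-in for a fetched origin/master
git checkout -q -b devel
commit D
git branch in-review/master/1        # remember the commit that was pushed for review
commit E; commit F

# Amend D on its own branch, producing D'.
git checkout -q -b in-review/master/2 in-review/master/1
echo "D amended" > D.txt
git commit -q --all --amend --no-edit

# Copy only E and F (devel minus in-review/master/1) onto D'.
git checkout -q devel
git rebase -q --onto in-review/master/2 in-review/master/1

git log --format=%s devel
```

The old D is left out of the copy because it is reachable from in-review/master/1, so no interactive todo list needs editing.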

That's why we gave the interesting commits specific names. In your more complex scenario, you'll start with:

... if more than one commit has been pushed for review ...

In this case we will have:

...--G--H   <-- origin/master
         \
          I--J--K   <-- in-review/master/1
                 \
                  L   <-- feature/xyz

If commit J needs amending, you check out commit K and give it a new branch name in-review/master/2:

git checkout -b in-review/master/2 in-review/master/1

which gives you this:

...--G--H   <-- origin/master
         \
          I--J--K   <-- in-review/master/1, in-review/master/2 (HEAD)
                 \
                  L   <-- feature/xyz

You can now run git rebase -i origin/master and change the second commit to edit. When the rebase is all done, you may—depending on whether you also decided to edit I, and/or used --force—have:

          I'-J'-K'  <-- in-review/master/2 (HEAD)
         /
...--G--H   <-- origin/master
         \
          I--J--K   <-- in-review/master/1
                 \
                  L   <-- feature/xyz

or:

...--G--H   <-- origin/master
         \
          I--J'--K'   <-- in-review/master/2
           \
            J--K   <-- in-review/master/1
                \
                 L   <-- feature/xyz

You can now git checkout feature/xyz; git rebase --onto in-review/master/2 in-review/master/1, exactly as before.
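The following sketch runs through that scenario without opening an editor, by setting GIT_SEQUENCE_EDITOR to a sed command that rewrites the second todo line from pick to edit. That trick, the commit helper, and the faked origin/master ref are my assumptions for making the example scriptable; in practice you would simply edit the todo list by hand.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.invalid"
export GIT_EDITOR=true     # never open a text editor for commit messages

# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit G; commit H
git update-ref refs/remotes/origin/master master   # stand-in for a fetched origin/master
git checkout -q -b in-review/master/1
commit I; commit J; commit K
git checkout -q -b feature/xyz
commit L

# Start the second review round at K and stop the rebase at J.
git checkout -q -b in-review/master/2 in-review/master/1
GIT_SEQUENCE_EDITOR="sed -i -e '2s/^pick/edit/'" git rebase -i origin/master
echo "J amended" > J.txt
git commit -q --all --amend --no-edit
git rebase --continue

# Move L from the old K to the new K'.
git checkout -q feature/xyz
git rebase -q --onto in-review/master/2 in-review/master/1

git log --format=%s feature/xyz
```

Since I was not edited and no forcing option was used, this ends up as the second picture: I is shared, while J and K are replaced by J' and K'.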

There are cases where this technique falls down. Git rather needs a sort of multi-branch-name rebase tool, and it does not have one (and building one that serves well and isn't ridiculously hard to use, is hard, which is why nobody has done it). Consider:

...--G--H   <-- origin/master
         \
          I   <-- in-review/master/1
           \
            J    <-- in-review/feature/tall/1
             \
              K   <-- feature/short
               \
                L   <-- feature/long

You may be forced to do something about any of the various intermediate commits. Since any change to the parentage and snapshot of any commit results in copying it, if you're forced to change commit I to a new I', you must come up with new J' and K' and L' (and submit a new review for J', presumably).

Note that after copying I to I', a single git checkout feature/long; git rebase --onto in-review/master/2 in-review/master/1 copies J-K-L to J'-K'-L', but now there are three labels to move. This is the missing tool: one that moves more than one label. But this picture is too simple as you might have:

...--G--H   <-- origin/master
         \
          I   <-- in-review/master/1
           \
            J    <-- in-review/feature/tall/1
             \
              K--L   <-- feature/short
               \
                M   <-- feature/long

and now rebasing feature/long alone won't work as it will not copy L; nor will rebasing feature/short alone, as that will copy L but not M. So a multi-rebase tool needs to know:

  • which branches are interesting, and
  • where to rebase them as a group

and it must then figure out which commits to copy, build a mapping from old commit hash to new one until the group-as-a-whole has been fully copied, and only then move all the branch names to their new commit hash IDs. A merge-preservation mode (a la Git's git rebase --rebase-merges) would be the correct default mode for this tool, too, as the multi-branches here could have branch-and-merge patterns inside their subgraphs (branching and merging with each other, or independently of each other, or both).
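Until such a tool exists, the two-branch case can be handled by hand with two --onto rebases, carrying the old-to-new mapping for the shared commit K yourself. The sketch below is one way to do that under simplifying assumptions: a scratch repository, a hypothetical commit helper, a faked origin/master ref, linear history on each branch, and no in-review/feature/tall/1 label (which would need to be moved separately).

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.invalid"

# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit G; commit H
git update-ref refs/remotes/origin/master master   # stand-in for a fetched origin/master
git checkout -q -b in-review/master/1
commit I
git checkout -q -b feature/short
commit J; commit K; commit L
git checkout -q -b feature/long feature/short~1    # branch off at K
commit M

# Amend I into I' on a second review branch.
git checkout -q -b in-review/master/2 in-review/master/1
echo "I amended" > I.txt
git commit -q --all --amend --no-edit

# Record the shared commit K and how far feature/short is past it.
old_k=$(git merge-base feature/short feature/long)
ahead=$(git rev-list --count "$old_k..feature/short")

# First rebase copies J, K, L; second copies only M onto the new K'.
git rebase -q --onto in-review/master/2 in-review/master/1 feature/short
new_k=$(git rev-parse "feature/short~$ahead")
git rebase -q --onto "$new_k" "$old_k" feature/long

git log --format=%s feature/long
```

The manual mapping step (old_k to new_k) is exactly the bookkeeping a multi-branch rebase tool would have to automate, and it gets error-prone quickly once more branches or merges are involved.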

The new rebase-merges code is most of the way to this needed tool, but it still lacks a method of specifying more than one branch name (and hence, at least potentially, multiple tip commits) and the code that would be needed to adjust the multiple branch names at the end of the entire process.

Upvotes: 5
