Reputation: 329

How to Rebase with Multiple Stacked Branches in Git?

I'm wondering what the proper way to handle stacked branches is in Git; I've found that my flow breaks down after two stacks. Let's say I have the following setup:

c1 -> c2 -> c3 -> c4 //master
                   \
                    c5 - c6 //branch1
                           \
                            c7 - c8 // branch2
                                  \
                                   c9 - c10 // branch3

Let's say I decide to update branch1.

c1 -> c2 -> c3 -> c4 //master
                   \
                    c5 - c6 - c11//branch1
                           \
                            c7 - c8 // branch2
                                  \
                                   c9 - c10 // branch3

Then, to update the rest of the stack, I would rebase branch2 onto branch1, and branch3 onto branch2, to ideally get the following:

c1 -> c2 -> c3 -> c4 //master
                   \
                    c5 - c6 - c11//branch1
                                \
                                 c7 - c8 // branch2
                                        \
                                         c9 - c10 // branch3

One issue I have is that when there are merge conflicts between branch1 and branch2 and I fix them, those same merge conflicts appear again when I rebase branch3 onto branch2. In fact, branch3 seems to contain the commits of branch2 for some reason. When I rebase, things get screwed up and I get a ton of merge conflicts, because I'm merging later commits of branch2 into earlier commits of branch2 that somehow live on branch3. Things thus look like this:

c1 -> c2 -> c3 -> c4 //master
                   \
                    c5 - c6 - c11//branch1
                                \
                                 c7 - c8  // branch2
                                         \
                                c7  - c8 - c9 - c10 // branch3

and the rebase turns into this:

c1 -> c2 -> c3 -> c4 //master
                   \
                    c5 - c6 - c11//branch1
                                \
                                 c7 - c8  // branch2
                                         \
                                         c7'  - c8' - c9 - c10 // branch3

What am I doing wrong here? Is there a different method of rebasing for stacked branches? Why does branch3 contain the commits of branch2?

Upvotes: 8

Answers (3)

RedX

Reputation: 15184

Since 2022, the --update-refs option of git rebase can be used for this purpose.

Git v2.38 (released October 3, 2022) taught git rebase a new --update-refs option. With --update-refs, rebasing will "Automatically force-update any branches that point to commits that are being rebased" (docs).
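Here is the effect in the question's setup. Run `git checkout branch3` and then `git rebase --update-refs branch1`. This copies c7 through c10 onto c11, and branch2 moves along with branch3. The toy TypeScript model below only illustrates that effect. It is not Git's implementation. The names `updateRefs`, `BranchTable` and `stack` are made up for the sketch. The primed IDs stand in for the brand-new hash IDs that real copies get.

```typescript
// Toy model of the *effect* of `git rebase --update-refs` (not Git's code).
// Branch names are labels: each maps to exactly one commit ID.
type BranchTable = Record<string, string>;

// After a rebase has copied some commits (original ID -> copy ID), move every
// branch whose tip was one of the copied originals onto the matching copy.
// That covers the branch being rebased and any branches stacked inside it.
function updateRefs(branches: BranchTable, copies: Record<string, string>): void {
  for (const b of Object.keys(branches)) {
    const copy = copies[branches[b]];
    if (copy !== undefined) branches[b] = copy;
  }
}

// The question's stack after c11 was added to branch1.
const stack: BranchTable = { master: "c4", branch1: "c11", branch2: "c8", branch3: "c10" };

// Rebasing branch3 onto branch1 copies c7..c10 onto c11.
updateRefs(stack, { c7: "c7'", c8: "c8'", c9: "c9'", c10: "c10'" });

console.log(stack.branch2, stack.branch3); // c8' c10'
```

Branches whose tips were not among the copied commits (master and branch1 here) stay where they are.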

Upvotes: 6

Greg

Reputation: 343

TL;DR: use (or copy the implementation of) the Graphite CLI.

torek's answer is outdated:

"There is no good general-purpose tool to do what you want."

This open-source CLI will perform recursive branch rebases (disclosure: I'm a contributor): https://github.com/screenplaydev/graphite-cli

The main rebase-recursion can be seen here: https://github.com/screenplaydev/graphite-cli/blob/dfe4390baf9ce6aeedad0631e41c9f1bdc57ef9a/src/actions/fix.ts#L60

git rebase --onto ${parentBranch.name} ${mergeBase} ${currentBranch.name}

The key insight is to store each branch's parent in Git refs, so that operations can recurse through the DAG. Without parent metadata, it would not always be possible to determine the merge-base of successive child branches.

const metaSha = execSync(`git hash-object -w --stdin`, {input: JSON.stringify(desc)}).toString();

execSync(`git update-ref refs/branch-metadata/${this.name} ${metaSha}`);

https://github.com/screenplaydev/graphite-cli/blob/dfe4390baf9ce6aeedad0631e41c9f1bdc57ef9a/src/wrapper-classes/branch.ts#L102-L109
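The recursion itself can be sketched without Git at all. The TypeScript below is a hypothetical illustration, not Graphite's code. It assumes each branch's metadata records two things: its parent branch, and `base`, the commit that parent pointed to when the child was last based on it. It then walks the stack parent-first and emits a `git rebase --onto <parent> <old base> <child>` command for each child. The names `BranchMeta` and `restackCommands` are invented for the sketch.

```typescript
// Hypothetical per-branch metadata, standing in for what a tool might store
// (Graphite keeps its own metadata under refs/branch-metadata/<name>).
type BranchMeta = {
  parent: string; // the branch this one is stacked on
  base: string;   // the commit `parent` pointed to when this branch was last based on it
};

// Walk the stack depth-first from `root`, parent before child, emitting one
// `git rebase --onto` per child. Using the recorded old base as the upstream
// argument excludes the parent's old commits up front, instead of relying on
// patch-ID matching to drop them.
function restackCommands(meta: Record<string, BranchMeta>, root: string): string[] {
  const cmds: string[] = [];
  const children = Object.keys(meta).filter((b) => meta[b].parent === root);
  for (const child of children) {
    cmds.push(`git rebase --onto ${root} ${meta[child].base} ${child}`);
    cmds.push(...restackCommands(meta, child));
  }
  return cmds;
}

// The question's stack: branch2 was based on c6, branch3 on c8.
const meta: Record<string, BranchMeta> = {
  branch2: { parent: "branch1", base: "c6" },
  branch3: { parent: "branch2", base: "c8" },
};

for (const cmd of restackCommands(meta, "branch1")) console.log(cmd);
// git rebase --onto branch1 c6 branch2
// git rebase --onto branch2 c8 branch3
```

In practice c6 and c8 would be raw hash IDs. A real tool would also have to update each recorded base after a successful rebase, and stop so you can resolve conflicts.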

Upvotes: 4

torek

Reputation: 490168

There is no good general-purpose tool to do what you want. There are specific tricks that may work for you. In particular, you will sometimes want git rebase --onto and you'll have to use it with care.

Background

The problem here is that Git branches do not nest, or stack, or whatever word you would like to use here.

More precisely, branch names, like master or branch1 through branch3, simply act like pointers or labels. Each one points to (or is pasted on to) one particular commit. They don't have any inherent relationship to each other: you can add, remove, or move any label, anywhere, any time. The only constraint on each label is that it must point to exactly one commit.

Commits are not so much on a branch as contained within some set of branches. A given pair of commits may have a parent/child relationship. In your drawings, for instance, commit c1 is the parent of commit c2. Git achieves this by having commits point to other commits, similar to the way branch names point to commits. There is a difference, though: the content of any one commit is frozen for all time, including its pointer. This means it is the child that points to the parent. The parent already exists when you make the child, but not vice versa, so the child can point to the parent, but not vice versa.

(In effect, Git works backwards. You've drawn your arrows going forwards, which is backwards for Git: the children point backwards, to the parents.)

Git needs a way to find each frozen-for-all-time commit. It does so by their hash IDs: those big ugly strings of letters and digits (in a standard SHA-1 repository, each one is a 160-bit value written in hexadecimal). In order to point to a commit, something (a branch name, or another commit) just contains the raw hash ID of the pointed-to commit. If you have a hash ID, or if Git has one, you can have Git find the underlying object from that hash ID.¹

Git defines the branch name to contain the raw hash ID of the last commit that is to be considered part of the chain of commits. Previous commits, found by following the backwards-pointing arrows coming out of each commit, are on or contained in that branch. So—here I'll switch to my usual notation of uppercase letters for each commit—if you have:

A <-B <-C <-D   <-- master
             \
              E <-F  <-- branch

then commit F is the last commit of branch, but E, D, and so on all the way back to A are all contained in branch. Commit D is the last commit on master, but all of A-B-C-D are in master.
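Here is a toy TypeScript model of the drawing above. It is an illustration only, since Git's real object storage is different. Each commit records only its parents' IDs, and branch names are plain labels holding one commit ID. The commits "contained in" a branch are exactly those found by following the parent arrows back from its tip. The names `Graph`, `Refs` and `reachableFrom` are made up for this sketch.

```typescript
// Toy model: commit ID -> IDs of its parent commits. Children point back to
// parents; nothing points forward.
type Graph = Record<string, string[]>;

// Branch names are just labels, each holding exactly one commit ID.
type Refs = Record<string, string>;

const graph: Graph = { A: [], B: ["A"], C: ["B"], D: ["C"], E: ["D"], F: ["E"] };
const refs: Refs = { master: "D", branch: "F" };

// Follow the backwards-pointing parent arrows from a tip, collecting every
// commit found. These are the commits "on" (contained in) that branch.
function reachableFrom(g: Graph, tip: string): string[] {
  const seen: string[] = [];
  const stack: string[] = [tip];
  while (stack.length > 0) {
    const id = stack.pop() as string;
    if (seen.indexOf(id) >= 0) continue;
    seen.push(id);
    for (const p of g[id]) stack.push(p);
  }
  return seen.sort();
}

console.log(reachableFrom(graph, refs.branch).join("")); // ABCDEF
console.log(reachableFrom(graph, refs.master).join("")); // ABCD
```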

Note that when you first create a new branch name, it usually points to the same commit as some existing branch name:

A--B--C--D   <-- master
          \
           E--F   <-- branch1, branch2

You have Git attach its HEAD to one of these branches, and make a new commit, which gets a new hash ID. Git writes the new commit's hash ID into the branch name to which HEAD is attached:

A--B--C--D   <-- master
          \
           E--F   <-- branch1
               \
                G   <-- branch2 (HEAD)

and all the invariants still hold: branch2 contains the name (hash ID) of the last commit on that branch, branch1 contains the hash ID of its last commit, master contains the name of its last commit, and so on. No commit has changed (no part of any commit can change) but a new commit exists now, and the current branch still has HEAD attached to it, but has been dragged forward.
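The "dragged forward" behaviour fits in a few lines of TypeScript. This is a hypothetical sketch, not Git's code. A repo here is a parent map, a table of branch labels, and HEAD, which is either attached to a branch name or detached at a raw commit ID. The names `Repo`, `headCommit` and `makeCommit` are invented for the illustration.

```typescript
// Toy repository model (not Git's real data structures).
type Repo = {
  parents: Record<string, string[]>;             // commit ID -> parent commit IDs
  branches: Record<string, string>;              // branch name -> tip commit ID
  head: { branch: string } | { commit: string }; // attached or detached HEAD
};

// The commit HEAD currently selects.
function headCommit(r: Repo): string {
  return "branch" in r.head ? r.branches[r.head.branch] : r.head.commit;
}

// Make a new commit whose only parent is the current commit. No existing
// commit changes; only a label moves: the branch HEAD is attached to, or
// HEAD itself when it is detached.
function makeCommit(r: Repo, id: string): void {
  r.parents[id] = [headCommit(r)];
  if ("branch" in r.head) {
    r.branches[r.head.branch] = id;
  } else {
    r.head = { commit: id };
  }
}

// branch1 and branch2 both start at F; HEAD is attached to branch2.
const repo: Repo = {
  parents: { A: [], B: ["A"], C: ["B"], D: ["C"], E: ["D"], F: ["E"] },
  branches: { master: "D", branch1: "F", branch2: "F" },
  head: { branch: "branch2" },
};

makeCommit(repo, "G");
console.log(repo.branches.branch1, repo.branches.branch2); // F G
```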

¹Commits, in Git, are one of four kinds of internal object types. The other three are blob, tree, and tag objects. Normally the only Git hash IDs you interact with every day—e.g., with cut-and-paste to git log or git show or git cherry-pick, or in git rebase -i instruction sheets—are commit hash IDs. Commits have a special property, which is that their contents are always unique, so that their hash IDs are also always unique. Git guarantees this by adding a date-and-time stamp to each commit. That, plus the fact that each commit holds the hash ID of its parent(s), is sufficient to produce the necessary uniqueness.

Rebase is about copying commits

As noted above, no part of any commit can ever be changed. Commits are frozen for all time. At most, you can simply stop using a commit. Git finds commits by starting with the last ones—the branch tips—and working backwards, and if you do stop using a commit, and set things up so that Git can't find it, Git will eventually delete it for real.

You can, however, take a commit out—any commit, including a historical one—and work with it and then make a new commit from this. It's probably worth a small side remark here about "detached HEAD" mode.

Let's say we have this—the same graph you drew, but using my single-letter style—with the same branch names:

A--B--C--D   <-- master
          \
           E--F   <-- branch1
               \
                G--H   <-- branch2 (HEAD)
                    \
                     I--J   <-- branch3

The normal way of working with a commit is:

1. We pick one by picking a branch name.
2. Git attaches the special name HEAD to that branch name.
3. That branch name is now the current branch and that commit is now the current commit.
4. Git copies the frozen snapshot for that commit to Git's index and your work-tree (we won't go into the details here).

We can have Git extract commit G, though, by picking it out by its name: its unique hash ID. When we do, we get a detached HEAD where HEAD itself points directly to the commit:

A--B--C--D   <-- master
          \
           E--F   <-- branch1
               \
                G   <-- HEAD
                 \
                  H   <-- branch2
                   \
                    I--J   <-- branch3

If we were to make a new commit in this state, we would in fact get one. I'll call it X rather than K since we'll just drop it and forget about it in a moment, but let's draw that result:

A--B--C--D   <-- master
          \
           E--F   <-- branch1
               \
                G--X   <-- HEAD
                 \
                  H   <-- branch2
                   \
                    I--J   <-- branch3

Note how X is ordinary in all ways except that the only name that finds it is HEAD. If we gave it a branch name, that would make the commit much more permanent: it would last until we deleted its branch name, or otherwise made the commit impossible to find.
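A toy TypeScript model makes the point concrete. It is hypothetical, and its names are invented for the sketch. What sets X apart is only which names can reach it. While HEAD is detached at X, every commit is reachable from some name. Once HEAD is re-attached to a branch, nothing finds X any more, and that is what makes it eligible for eventual deletion. Real Git's reflogs keep such commits around for a while; this model ignores them.

```typescript
// Toy model: commit ID -> parent commit IDs.
type ParentMap = Record<string, string[]>;

// Every commit reachable from any of the given starting points.
function reachableFromAll(parents: ParentMap, tips: string[]): string[] {
  const seen: string[] = [];
  const stack = tips.slice();
  while (stack.length > 0) {
    const id = stack.pop() as string;
    if (seen.indexOf(id) >= 0) continue;
    seen.push(id);
    for (const p of parents[id]) stack.push(p);
  }
  return seen;
}

// Commits that no name can reach: candidates for eventual garbage collection
// (ignoring reflogs, which keep such commits alive for a while in real Git).
function unreachable(parents: ParentMap, names: Record<string, string>): string[] {
  const live = reachableFromAll(parents, Object.keys(names).map((n) => names[n]));
  return Object.keys(parents).filter((id) => live.indexOf(id) < 0);
}

const parents: ParentMap = {
  A: [], B: ["A"], C: ["B"], D: ["C"], E: ["D"], F: ["E"],
  G: ["F"], H: ["G"], I: ["H"], J: ["I"], X: ["G"],
};

// Detached HEAD at X: HEAD itself is a name that finds X.
const names: Record<string, string> = { master: "D", branch1: "F", branch2: "H", branch3: "J", HEAD: "X" };
console.log(unreachable(parents, names).length); // 0

// After re-attaching HEAD to branch2, HEAD no longer names X directly.
const afterCheckout: Record<string, string> = { master: "D", branch1: "F", branch2: "H", branch3: "J" };
console.log(unreachable(parents, afterCheckout).join(",")); // X
```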

Of course, that's not quite what you're doing. Instead, you make a new commit, which I will call K (you called it c11), on branch1 in the usual attached-HEAD way:

A--B--C--D   <-- master
          \
           E--F--K   <-- branch1 (HEAD)
               \
                G--H   <-- branch2
                    \
                     I--J   <-- branch3

At this point, you'd like to copy commits G-H-I-J to new-and-improved commits. The git rebase command can do this, as that is its job. But let's look at how it does its job.

How rebase works

Since rebase is about copying (some) commits, its work is divided up into three phases:

Phase 1 is to decide which commits to copy.

As you've seen, commits are often on many branches. The ones we want to copy are those that are on our branch, but aren't also already somewhere else. For instance, if we are on branch2 now and we say git rebase branch1, we want to copy G-H but not E-F or any of the earlier commits.

The main argument to git rebase is what the documentation calls the upstream. Here, that's branch1. The commits to copy are those reachable from our current branch (from HEAD or branch2; both select the same set of commits) minus those reachable from the name branch1. So rebase first lists all the commits on our current branch, then removes from that list all those that are also on the target/upstream. This list ends up holding the raw hash IDs of the original commits.
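Phase 1, as described so far, is a set difference. The toy TypeScript model below sketches it. The names are hypothetical, and the sketch assumes a linear history with no merge commits. It also leaves out the patch-ID refinement covered in the next section. It walks back from HEAD and stops at the first commit that is also reachable from the upstream.

```typescript
// Toy model: commit ID -> parent commit IDs.
type ParentMap = Record<string, string[]>;

// Every commit found by following parent arrows back from `tip`.
function reachable(parents: ParentMap, tip: string): string[] {
  const seen: string[] = [];
  const stack = [tip];
  while (stack.length > 0) {
    const id = stack.pop() as string;
    if (seen.indexOf(id) >= 0) continue;
    seen.push(id);
    for (const p of parents[id]) stack.push(p);
  }
  return seen;
}

// Roughly `git log <upstream>..HEAD`, oldest first: commits on HEAD's branch
// that are not reachable from the upstream. Linear history assumed.
function commitsToCopy(parents: ParentMap, head: string, upstream: string): string[] {
  const exclude = reachable(parents, upstream);
  const result: string[] = [];
  let cur: string | undefined = head;
  while (cur !== undefined && exclude.indexOf(cur) < 0) {
    result.unshift(cur);
    cur = parents[cur][0]; // first (only) parent; undefined past the root
  }
  return result;
}

// The graph after K was added: branch1 is at K, branch2 at H, branch3 at J.
const parents: ParentMap = {
  A: [], B: ["A"], C: ["B"], D: ["C"], E: ["D"], F: ["E"],
  K: ["F"], G: ["F"], H: ["G"], I: ["H"], J: ["I"],
};

console.log(commitsToCopy(parents, "H", "K").join("-")); // G-H
```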

The git rebase documentation describes this listing as:

All changes made by commits in the current branch but that are not in <upstream> are saved to a temporary area. This is the same set of commits that would be shown by git log <upstream>..HEAD; or by git log 'fork_point'..HEAD, if --fork-point is active (see the description on --fork-point below); or by git log HEAD, if the --root option is specified.

This is, in fact, not the complete picture, but it's a good start. We'll get to the more complete picture in the next section.

Phase 2 is about actually copying the commits. Git uses git cherry-pick, or something mostly equivalent,² to do the copying. We'll skip right over how cherry-pick works, except to mention that, as you have seen, it can get merge conflicts.

What we will note here is that the copying takes place in detached HEAD mode. Git first does a detached-HEAD style checkout of the target commit. Here, since we said git rebase branch1, the target is commit K, so the copying starts with:
```
A--B--C--D   <-- master
          \
           E--F--K   <-- branch1, HEAD
               \
                G--H   <-- branch2
                    \
                     I--J   <-- branch3
```
with Git remembering the name branch2 (in a file: if you poke around inside the .git directory during a partial rebase, you'll find a directory full of rebase state).

The list of commits to copy at this point is commits G and H, in that order, and using their real hash IDs, whatever those really are. Git copies these commits, one at a time, to new commits whose snapshots and parents are slightly different from the originals. That gives us this new set of commits, still in detached-HEAD mode:
```
A--B--C--D  ...    G'-H'  <-- HEAD
          \       /
           E--F--K   <-- branch1
               \
                G--H   <-- branch2
                    \
                     I--J   <-- branch3
```
The last phase of git rebase is to yank the branch name over.

Git fishes out the saved branch name, forces it to point to the current (HEAD) commit—in this case H'—and re-attaches HEAD. So now you have:
```
A--B--C--D  ...    G'-H'  <-- branch2 (HEAD)
          \       /
           E--F--K   <-- branch1
               \
                G--H
                    \
                     I--J   <-- branch3
```

Note that there is, at this point, no name selecting commit H any more.³ We could straighten out the kink in the graph drawing, but I left it in for symmetry, and for another reason we'll see in a later section.
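Phases 2 and 3 can be sketched the same way. In this hypothetical TypeScript model, a "copy" is just a new commit ID (the old one with a prime appended). Its parent is wherever the detached HEAD currently is. Real Git makes the copy with cherry-pick, and that is where merge conflicts can arise. The branch name moves only at the very end, and the originals are untouched. The function name `copyAndMove` is invented for the sketch.

```typescript
type ParentMap = Record<string, string[]>;
type BranchTable = Record<string, string>;

// Phase 2: starting from a detached HEAD at `onto`, copy each commit in order.
// Phase 3: force the saved branch name to point at the last copy.
function copyAndMove(parents: ParentMap, branches: BranchTable, branch: string, toCopy: string[], onto: string): void {
  let head = onto; // detached HEAD at the target commit
  for (const id of toCopy) {
    const copy = id + "'";
    parents[copy] = [head]; // a new commit; the original `id` is never modified
    head = copy;
  }
  branches[branch] = head; // yank the branch name over
}

const parents: ParentMap = {
  A: [], B: ["A"], C: ["B"], D: ["C"], E: ["D"], F: ["E"],
  K: ["F"], G: ["F"], H: ["G"], I: ["H"], J: ["I"],
};
const branches: BranchTable = { master: "D", branch1: "K", branch2: "H", branch3: "J" };

// git rebase branch1, run on branch2: copy G-H onto K.
copyAndMove(parents, branches, "branch2", ["G", "H"], "K");
console.log(branches.branch2, parents["G'"][0], parents.I[0]); // H' K H
```

The last value shows the "kink": branch3's commit I still sits on the original H, which no branch name selects any more.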

²Rebase can use one of several "back ends". The default non-interactive back end was git-rebase--am up until Git 2.26.0; since then, the default has been the merge back end. The am back end uses git format-patch and git am, hence the name. It misses certain file-rename cases and is incapable of copying an empty-diff commit, but it can be a lot faster in some relatively rare rebase cases.

³Actually, there is at least one reflog entry, at least in a default setup. We'll get to that later.

A better idea of what rebase copies

I mentioned above that in phase 1, when rebase lists out the commits to copy, it doesn't really use the <upstream>..HEAD method. The documentation even has caveats here (about fork-point mode) but it does not have enough caveats.

Whenever you have Git copy commits—whether by running git cherry-pick yourself, or any other method including rebasing—you end up with commits that may "do the same thing" as each other. That is, given commits H and H', we could run:

git show <hash-of-H>

to view a diff between commit G and commit H, to see what H does. We could run:

git show <hash-of-H'>

to view a diff between commit G' and commit H', to see what H' does.

If we strip out the line numbers in this diff listing, we'll get the same changes.³ Git includes a command, git patch-id, that reads a diff listing, strips off the line numbers—and some white-space as well, so that, e.g., trailing white space doesn't affect things—and hashes the result. This produces what Git calls a patch ID.

Unlike a commit's hash ID, which is guaranteed to be unique to that one particular commit—so that our cherry-picked copy is a different commit—the patch-ID is deliberately the same if the commit "does the same thing". So:

git show <hash-of-either-H-or-H'> | git patch-id

will show that H and H' are "the same" commit, in a sense.
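The idea behind patch IDs can be shown with a deliberately simplified TypeScript imitation. It is not Git's actual algorithm: Git's patch ID is built from SHA-1 hashes of the normalized per-file diffs. Here `toyHash` is djb2, chosen only to keep the sketch self-contained, and the sample diffs are invented. The sketch drops hunk headers (which carry line numbers) and `index` lines (which carry blob hash IDs). It then removes whitespace and hashes what remains.

```typescript
// djb2: a toy stand-in for the SHA-1 hashing Git really uses.
function toyHash(s: string): string {
  let h = 5381;
  for (let i = 0; i < s.length; i++) {
    h = (h * 33 + s.charCodeAt(i)) % 4294967296; // stay within 32 bits
  }
  return h.toString(16);
}

// Simplified patch ID: ignore line numbers (hunk headers), blob IDs (index
// lines) and all whitespace, then hash the remaining diff text.
function patchId(diff: string): string {
  const kept = diff
    .split("\n")
    .filter((line) => line.indexOf("@@") !== 0 && line.indexOf("index ") !== 0)
    .map((line) => line.replace(/\s+/g, ""));
  return toyHash(kept.join("\n"));
}

// What H does, and what its copy H' does after landing at a different spot.
const diffH = [
  "diff --git a/notes.txt b/notes.txt",
  "index 1a2b3c4..5d6e7f8 100644",
  "--- a/notes.txt",
  "+++ b/notes.txt",
  "@@ -10,2 +10,3 @@",
  " unchanged context",
  "+new line from H",
].join("\n");
const diffHPrime = diffH
  .replace("1a2b3c4..5d6e7f8", "9a8b7c6..5e4f3a2")
  .replace("@@ -10,2 +10,3 @@", "@@ -14,2 +14,3 @@");
// A copy whose change was altered while resolving a merge conflict.
const diffHEdited = diffH.replace("+new line from H", "+new line from H, edited");

console.log(patchId(diffH) === patchId(diffHPrime));  // true
console.log(patchId(diffH) === patchId(diffHEdited)); // false
```

The second comparison is the situation described under "Where this goes wrong": once conflict resolution changes what a copy does, its patch ID no longer matches the original's.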

When you run git rebase, Git will actually compute the patch IDs of a bunch of commits. For those that are "the same commit", Git will knock those commits out of the list of commits-to-copy.

(By default, rebase also knocks all merge commits out of the list. You don't have any, in these examples, so we don't have to worry about these here.)

Hence if we now run:

git checkout branch3; git rebase branch2

Git will take this graph:

A--B--C--D  ...    G'-H'  <-- branch2
          \       /
           E--F--K   <-- branch1
               \
                G--H--I--J   <-- branch3 (HEAD)

and list commits A-B-C-D-E-F-G-H-I-J as the branch3 list, but then knock out A-B-C-D-E-F-K-G'-H' because that's the branch2 list. That leaves G-H-I-J as the starting point before doing the patch-ID part. In other words:

branch2..HEAD

is G-H-I-J.

But now, Git computes a patch ID for G, H, I, and J. It also computes patch IDs for K, G', and H'.⁴ The rebase code finds that G already has a patch-ID-equivalent commit, G', in the upstream, so G gets knocked out of the list. Then it finds that H has H' upstream too, so H gets knocked out of the list.

The final list of commits to copy at this point is I-J: just what you wanted. Git can now detach HEAD at commit H' and copy I-J, and then re-attach HEAD to the result:

                        I'-J'  <-- branch3 (HEAD)
                       /
A--B--C--D  ...    G'-H'  <-- branch2
          \       /
           E--F--K   <-- branch1
               \
                G--H--I--J   [abandoned]
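The refined listing can be modeled too. This TypeScript sketch is hypothetical, and its names are invented. Patch IDs are given as labels rather than computed (a real rebase derives them from each commit's diff), and the history is assumed linear. The function works out both sides of the symmetric difference upstream...HEAD. It then knocks out every left-side commit whose patch ID already appears on the right side.

```typescript
type ParentMap = Record<string, string[]>;

// Commits reachable from `tip`; for a linear history this is tip-first order.
function reachable(parents: ParentMap, tip: string): string[] {
  const seen: string[] = [];
  const stack = [tip];
  while (stack.length > 0) {
    const id = stack.pop() as string;
    if (seen.indexOf(id) >= 0) continue;
    seen.push(id);
    for (const p of parents[id]) stack.push(p);
  }
  return seen;
}

// Left side of upstream...HEAD: candidates to copy. Right side: upstream-only
// commits. Drop any candidate whose patch ID matches a right-side commit.
function listToCopy(parents: ParentMap, patchIds: Record<string, string>, head: string, upstream: string): string[] {
  const fromHead = reachable(parents, head);
  const fromUp = reachable(parents, upstream);
  const left = fromHead.filter((c) => fromUp.indexOf(c) < 0);
  const rightIds = fromUp.filter((c) => fromHead.indexOf(c) < 0).map((c) => patchIds[c]);
  return left.filter((c) => rightIds.indexOf(patchIds[c]) < 0).reverse(); // oldest first
}

const parents: ParentMap = {
  A: [], B: ["A"], C: ["B"], D: ["C"], E: ["D"], F: ["E"],
  K: ["F"], G: ["F"], H: ["G"], I: ["H"], J: ["I"],
  "G'": ["K"], "H'": ["G'"],
};

// Copies that "do the same thing" share a patch ID.
const clean: Record<string, string> = { K: "pK", G: "pG", "G'": "pG", H: "pH", "H'": "pH", I: "pI", J: "pJ" };
console.log(listToCopy(parents, clean, "J", "H'").join("-")); // I-J

// If resolving a conflict changed what H' does, H no longer matches and gets copied again.
const conflicted: Record<string, string> = { ...clean, "H'": "pH-edited" };
console.log(listToCopy(parents, conflicted, "J", "H'").join("-")); // H-I-J
```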

³More precisely, we'll usually get the same changes. We sometimes won't get the same changes, if we had a merge conflict during the cherry-pick.

⁴The reason for this particular list is that these are the commits produced by git rev-list branch2...HEAD. Note the three dots here: this is Git's syntax for a symmetric difference set operation. This symmetric difference consists of commits reachable from HEAD but not branch2, plus commits reachable from branch2 but not HEAD. One set becomes the "left side" commits and one set becomes the "right side" commits. The commits-to-copy are the left-side G-H-I-J, and all get patch-ID-ed; the commits in the upstream that also get patch-ID-ed are the right-side list.

Where this goes wrong

Footnote 3 (above) is the clue to where this goes wrong. If, during conflict resolution, you wind up changing some commit in some substantive way, the patch-ID computations no longer work to knock out some commits.

When you go to rebase branch3 this time, Git chooses to copy G and/or H again. Each new copy is nearly guaranteed to collide (as in merge-conflict) with the corresponding copy, G' or H', already present in the ongoing build of new replacement commits.

The correct action is to omit G and H in the copying process. Rebase would have done that for you, using the patch-ID trick, except that the patch-ID trick failed.

Using `--onto`

In your case, you want rebase to copy some commits but not all commits in the <upstream>..HEAD range while putting the copies at the right point. You have:

A--B--C--D  ...    G'-H'  <-- branch2
          \       /
           E--F--K   <-- branch1
               \
                G--H--I--J   <-- branch3 (HEAD)

and you'd like to tell rebase: Copy I and J but not H and therefore not G. Put the copies after H' at the tip of branch2.

One argument won't do the job, but two would. Suppose you could say:

git rebase --dont <hash-of-H> --onto branch2    # not the actual syntax

for instance? Fortunately, git rebase has this built in. The actual syntax is:

git rebase --onto branch2 <hash-of-H>

The --onto argument lets you specify the target of the copies, freeing up the upstream argument to mean what not to copy.

Rebase will still do all the same patch-ID work, but because the upstream argument rules out G-H from the start, leaving only I-J, it doesn't have a chance to get it wrong. The end result is just what you want.
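In a toy TypeScript model, `--onto` splits apart the two jobs that the single upstream argument otherwise does. The upstream decides what *not* to copy, and the `--onto` target decides where the copies go. The model uses hypothetical names, assumes a linear history, and uses primes to stand in for new hash IDs.

```typescript
type ParentMap = Record<string, string[]>;

function reachable(parents: ParentMap, tip: string): string[] {
  const seen: string[] = [];
  const stack = [tip];
  while (stack.length > 0) {
    const id = stack.pop() as string;
    if (seen.indexOf(id) >= 0) continue;
    seen.push(id);
    for (const p of parents[id]) stack.push(p);
  }
  return seen;
}

// Sketch of `git rebase --onto <newbase> <upstream>` run on `branch`:
// copy <upstream>..branch (oldest first) onto <newbase>, then move the branch.
function rebaseOnto(parents: ParentMap, branches: Record<string, string>, branch: string, newbase: string, upstream: string): string[] {
  const exclude = reachable(parents, upstream);
  const toCopy: string[] = [];
  let cur: string | undefined = branches[branch];
  while (cur !== undefined && exclude.indexOf(cur) < 0) {
    toCopy.unshift(cur);
    cur = parents[cur][0];
  }
  let head = newbase; // detached HEAD at the --onto target
  for (const id of toCopy) {
    parents[id + "'"] = [head];
    head = id + "'";
  }
  branches[branch] = head;
  return toCopy;
}

const parents: ParentMap = {
  A: [], B: ["A"], C: ["B"], D: ["C"], E: ["D"], F: ["E"],
  K: ["F"], G: ["F"], H: ["G"], I: ["H"], J: ["I"],
  "G'": ["K"], "H'": ["G'"],
};
const branches: Record<string, string> = { branch1: "K", branch2: "H'", branch3: "J" };

// git rebase --onto branch2 <hash-of-H>, with branch3 checked out
const copied = rebaseOnto(parents, branches, "branch3", branches.branch2, "H");
console.log(copied.join("-"), branches.branch3, parents["I'"][0]); // I-J J' H'
```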

Using the reflog, or other tricks, to find `H`

The annoying part here is finding H's hash ID. With these diagrams, I can blithely say <hash-of-H>, but in a real rebase, with real graphs and dozens of commits that all look alike, finding hash IDs is a pain in the butt. If only there were an easy way to get this right.

As it turns out, there is.

Whenever Git moves a branch name, the way git rebase does for instance, it leaves a trail of previous values. This trail goes into Git's reflogs. There is a reflog for each branch name, plus one for HEAD. The HEAD one is very active and not as useful here because it's too active, but the one for branch2 is perfect.

Remember how we drew:

A--B--C--D  ...    G'-H'  <-- branch2 (HEAD)
          \       /
           E--F--K   <-- branch1
               \
                G--H
                    \
                     I--J   <-- branch3

originally. I said I left it in for symmetry and another reason, and now it is time for the reason. We can use the name branch2@{1} to refer to the reflog entry for "where branch2 was one step / branch2-change ago". As long as "one step ago" was just before rebasing, that means "commit H". So:

git checkout branch3
git rebase --onto branch2 branch2@{1}

does the trick.

If you have done things in branch2 since your rebase—e.g., if you built and tested and committed—you might need a higher number than @{1}. Use git reflog branch2 to print out the actual reflog contents, to check.
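A toy TypeScript model of a branch reflog shows why the number after @ sometimes needs adjusting. The model is hypothetical and much simpler than Git's reflog, which also records timestamps and messages. Each move of the branch pushes a new entry, so `branch2@{1}` always means "the value before the most recent move". The names `Reflog`, `moveBranch` and `resolveAt` are invented for the sketch.

```typescript
// Toy reflog: for each branch, the values it has held, newest first, so
// entry 0 is the branch's current value (like branch2@{0}).
type Reflog = Record<string, string[]>;

// Record a branch move, as commit, rebase, reset and so on do in real Git.
function moveBranch(log: Reflog, branch: string, to: string): void {
  log[branch].unshift(to);
}

// Resolve "name" or "name@{n}" against the toy reflog.
function resolveAt(log: Reflog, spec: string): string {
  const m = /^(.+)@\{(\d+)\}$/.exec(spec);
  if (m === null) return log[spec][0];
  const entries = log[m[1]];
  const n = parseInt(m[2], 10);
  if (n >= entries.length) throw new Error(`${m[1]} has only ${entries.length} reflog entries`);
  return entries[n];
}

const reflog: Reflog = { branch2: ["H", "G", "F"] }; // created at F, then commits G and H
moveBranch(reflog, "branch2", "H'");                 // git rebase branch1, on branch2
console.log(resolveAt(reflog, "branch2@{1}"));       // H

moveBranch(reflog, "branch2", "L");                  // another commit on branch2 afterwards
console.log(resolveAt(reflog, "branch2@{1}"), resolveAt(reflog, "branch2@{2}")); // H' H
```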

Another alternative is to drop a branch or tag name pointing to commit H before you rebase branch2 at all. For instance, if you make a new name branch2-old or branch2.0 or whatever, you'll still have:

A--B--C--D  ...    G'-H'  <-- branch2
          \       /
           E--F--K   <-- branch1
               \
                G--H   <-- branch2-old
                    \
                     I--J   <-- branch3

(regardless of where HEAD is now). You can mark commit J as branch3-old before you start its rebase, too.

(The reflogs are convenient and normally work fine. Branch names are cheap, though.)

Consider also doing the one-fell-swoop rebase

Suppose you have this graph:

A--B--C--D   <-- master
          \
           E--F--U   <-- branch1
               \
                G--H   <-- branch2
                    \
                    ...
                      \
                       T   <-- branch9

where U is the new commit you'd like to have in all branchN ancestries. If you run:

git checkout branch9; git rebase branch1

you'll get copies of commits G-H-...--T, all in one operation. You can now take branch2, branch3, ..., up through branch8 and just move each one to point to the appropriate copied commit. Matching up the original commits with their copies is a job for a tool, but unfortunately, that tool does not exist. So if you go this way, it's kind of manual.

Also, be aware that this doesn't work for some cases:

A--B--C--D   <-- master
          \
           E--F--K   <-- branch1
               \
                G--H--L   <-- branch2
                    \
                     I--J   <-- branch3

Rebasing branch3 onto branch1 copies only G-H-I-J, not L. So you may still need the occasional git rebase --onto as well. (A proper tool would do all of this.)
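A tool could at least detect this case. The check below is hypothetical TypeScript in a toy model where commits map to parent IDs, and its names are invented for the sketch. It reports the branches whose history includes a copied commit but whose own tip was not copied. Moving labels to the copies cannot fix these branches, so they need their own `git rebase --onto`.

```typescript
type ParentMap = Record<string, string[]>;

function reachable(parents: ParentMap, tip: string): string[] {
  const seen: string[] = [];
  const stack = [tip];
  while (stack.length > 0) {
    const id = stack.pop() as string;
    if (seen.indexOf(id) >= 0) continue;
    seen.push(id);
    for (const p of parents[id]) stack.push(p);
  }
  return seen;
}

// Branches built on top of copied commits whose own tips were NOT copied.
function leftBehind(parents: ParentMap, branches: Record<string, string>, copied: string[]): string[] {
  return Object.keys(branches).filter((b) => {
    const tip = branches[b];
    if (copied.indexOf(tip) >= 0) return false; // can simply move to its copy
    return reachable(parents, tip).some((c) => copied.indexOf(c) >= 0);
  });
}

const parents: ParentMap = {
  A: [], B: ["A"], C: ["B"], D: ["C"], E: ["D"], F: ["E"],
  K: ["F"], G: ["F"], H: ["G"], L: ["H"], I: ["H"], J: ["I"],
};
const branches: Record<string, string> = { master: "D", branch1: "K", branch2: "L", branch3: "J" };

// Rebasing branch3 onto branch1 copies G-H-I-J but not L.
console.log(leftBehind(parents, branches, ["G", "H", "I", "J"]).join(",")); // branch2
```

For branch2, the follow-up would take the same form as in the `--onto` section: `git rebase --onto <copy-of-H> <hash-of-H> branch2`, where `<copy-of-H>` is a placeholder for the new commit that replaced H.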

Upvotes: 7