How to resolve commit history on remote repo after rebasing?

Question

I am fairly new to git and I started poking around with a friend on a project together a few weeks ago. Since then, I sat down and actually read the official docs and multiple articles, and have a much better understanding of how git works. I started looking at the commit history and realized that it was inconsistent, so I decided to fix it using interactive rebasing.

A (simplified example) before the rebase:

A--B--C--H--I  = local/main, remote/main
       \
        F--G   = local/feature, remote/feature

In this example, there was something wrong with commit B, so I did an interactive rebase, which resulted in the following:

A--J--K--L--M  = local/main
 \
  B--C--H--I   = remote/main
      \
       F--G    = local/feature, remote/feature

where J used to be B, K used to be C, etc...

I have read about how git push --force-with-lease can be used to push upstream and force the local/main branch to become the new remote/main, but would it safely keep the remote/feature branch?

Would changing the parent of commit F to commit I be a viable solution? What is the best approach to this situation?

EDIT: Removed arrows and changed wrongly named commits for clarity.

torek · Accepted Answer

Yes—but this isn't phrased correctly. In particular, you literally cannot change the parent of commit F. You can only make a new and improved, but different commit, that's like F but has a different parent.

This is the same thing you did when you improved B into J. You may want to do this in almost exactly the same way, or at least, using git rebase. There's an easier way though, using git rebase --onto.

Long

You've also drawn your commits sensibly. But Git isn't sensible: it works backwards. Here is how I would draw the original commits (keeping your lettering):

A--B--C--H--I   <-- main
       \
        F--G   <-- feature

The internal arrows point backwards, from I to H, then to C, and so on. Some text arrows ← sometimes show up OK on some computers, and B <-C works pretty well, but after making the point that Git works backwards, I like to just use dashes. At least that way I'm not implying that Git works forwards. :-)

The branch names themselves simply point to one commit, namely the last one. This is Git's definition of a branch name: whatever commit it points to, that's the last one on that branch. If some new name (fred perhaps) points to commit C—we can add this name whenever we like, in Git—then commit C is now the last commit on that new branch. We can then remove the name and C remains unchanged; we are just no longer claiming it to be the last commit on branch fred. Commit C remains on main and feature all along, as any commit is on any branch where we can start with the name, find the last commit, and from there, work backwards and arrive at the commit.
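
To watch a name come and go without the commit changing, here is a minimal sketch in a throwaway repository. The temporary directory, the demo identity, and the commit helper function are made up for illustration; fred is the hypothetical branch name from the paragraph above.

```shell
set -e
# Throwaway repository; the identity is set only so that commits can be made.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/main     # name the first branch "main"

# Hypothetical helper: one commit per letter, each adding its own file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B; commit C
c=$(git rev-parse HEAD)                   # hash ID of commit C
commit H; commit I

git branch fred "$c"                      # C is now the last commit on fred
with_fred=$(git branch --contains "$c")   # names from which C is reachable
git branch -q -d fred                     # remove the name again
without_fred=$(git branch --contains "$c")

echo "$with_fred"                         # lists fred and main
echo "$without_fred"                      # lists main only
git cat-file -t "$c"                      # still a commit: C is untouched
```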

(Hence H-I are exclusive to main, and F-G are exclusive to feature, at least right now. If we add a new merge commit, main could gain access to F-G that way, for instance:

A--B--C--H--I--M   <-- main
       \      /
        F----G   <-- feature

Commit M, being a merge, points backwards to both I and G. Let's not add M, though.)

What is the best approach to this situation?

Best is a very tricky word.

If other people are actively using these same commits—copied into their repositories—then it's probably best not to improve the old commits. But if these are sophisticated Git users and are aware that the old commits might be improved like this, then the rebase you already did is fine.

When we use rebase to make new-and-improved copies of commits, the new commits are entirely different commits—with random-looking hash IDs as usual—but it's a good idea for us, as humans, to remember that they are in fact new and improved copies. So rather than giving them entirely new letters, I like to use the prime symbol, H' for instance, to indicate that the new copy is a copy of original commit H. So I would draw your rebase like this:

  B'-C'-H'-I'  <-- main
 /
A--B--C--H--I   <-- origin/main    ["main" in a Git you call "origin"]
       \
        F--G   <-- feature, origin/feature
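
That the copy is a separate commit, and that the original is left alone, can be checked directly. The sketch below uses git commit --amend on a single commit rather than a full rebase, simply because it is the shortest way to produce a replacement copy; the repository, file name, and messages are invented for the example.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q .

echo one > file.txt
git add file.txt
git commit -q -m "original message"
old=$(git rev-parse HEAD)               # hash ID of the original commit

# "Improve" the commit: Git writes a new commit and moves the branch name.
git commit -q --amend -m "improved message"
new=$(git rev-parse HEAD)               # hash ID of the replacement

echo "old: $old"
echo "new: $new"
git log -1 --format=%s "$old"           # the original is still there, by hash ID
```

The two hash IDs differ, and the old one still resolves to the commit with the original message: nothing was edited in place.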

Let me pop back to this part first:

I have read about how git push --force-with-lease can be used to push upstream and force the local/main branch to become the new remote/main, but would it safely keep the remote/feature branch?

All git push operations follow the same overall strategy. Adding force or force-with-lease just modifies the last step a bit.

First, your Git dials up another Git, usually over the Internet (although if you have two Git repositories accessible directly on one computer, you can short-circuit this a bit). The other Git manages their repository, which has its own branch names. Your Git and their Git need not use the same names—and in complicated cases, you'll find that you have to change at least one name, sometimes—but for human sanity it's usually best to use the same name on both sides as much as possible.

What's shared are not the branch names, but rather the commits. They have some set of commits, identified by hash IDs. You have some set of commits, identified by hash IDs. Your Git and their Git have conversations that—while they use the branch names here and there to find the commits—revolve, at this point, around the hash IDs. Your Git, in this case, has some commits that they lack, that you want them to have. So your Git hands over those commits. They now have this:

  B'-C'-H'-I'  [no names yet]
 /
A--B--C--H--I   <-- main
       \
        F--G   <-- feature

Now we get to the point where, in git push, the names get properly involved. The push session is almost done, but your Git now asks or commands them to make changes to their branch names, such as: Please, if it's OK, make your main point to commit I' or Make your main point to commit I'!

The first of these is a regular push. They will say: But it's not OK! If I do that, I'll lose access to B-C-H-I through my main! Except they're not this verbose or precise: they just say that's not a fast-forward operation. The not-a-fast-forward is just a concise way of saying that they'd lose some commits.

The second one is a forceful command, which you get with plain --force. If they obey, they give up their commits in favor of your new and improved ones. The one problem with this is that, while you think their chain of commits ends at I, maybe they've added more commits since then, or otherwise changed their main around somehow.

The --force-with-lease option is a compromise: your Git says to theirs, I think your main identifies commit I. If I'm right, I command you to switch to I' instead! If I'm wrong, let me know. That way you can be sure that if they obey this command, you haven't made them lose anything useful. (They can still choose whether to obey the command at all, just as they can with plain --force.)
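
Here is a rough sketch of that lease check in action, assuming a local bare repository standing in for origin and two working repositories named mine and theirs (all paths, names, and the commit helper are invented for the example). With no explicit value given, --force-with-lease takes its expectation from the remote-tracking name origin/main, which has gone stale here.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
top=$(mktemp -d)
git init -q --bare "$top/origin.git"
git --git-dir="$top/origin.git" symbolic-ref HEAD refs/heads/main

# Hypothetical helper: one commit per name, each adding its own file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

# "mine" publishes A-B.
git init -q "$top/mine"
cd "$top/mine"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$top/origin.git"
commit A; commit B
git push -q origin main

# Meanwhile "theirs" adds commit X to origin's main.
git clone -q "$top/origin.git" "$top/theirs"
cd "$top/theirs"
commit X
git push -q origin main

# Back in "mine": rewrite B locally; origin/main here still remembers old B.
cd "$top/mine"
git commit -q --amend -m "B improved"
if git push -q --force-with-lease origin main 2>/dev/null; then
    result=accepted
else
    result=rejected      # the lease fails: origin's main is no longer at B
fi
echo "push with stale lease: $result"
```

The push is refused and commit X survives on origin. If you would rather not depend on the remote-tracking name, the expectation can be spelled out as --force-with-lease=main:<hash>.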

Anyway, let's say that you send an appropriate command and they obey. This last step, of adjusting their name main, has no effect on any commit. It can't: all commits are literally read-only. The branch names can have their stored hash IDs replaced, but the commits can never be changed.

The result of all this is, in their repository:

  B'-C'-H'-I'  <-- main
 /
A--B--C--H--I   [abandoned]
       \
        F--G   <-- feature

which is the same set of commits and branch names that you have in your repository.

If they do obey this, your Git will update your origin/main remote-tracking name, so that your repository remembers that their main now remembers commit I'. (This assumes you have Git 1.8.2 or later—truly ancient Git versions failed to update the remote-tracking name at this point, requiring a separate git fetch to do it. That was intentional, but was a misfeature, and eventually called a bug and then fixed.)

Now, before you do this at all, you might want to snag the hash ID of commit I and save it somewhere, or do your rebase --onto now, because here's where it gets a bit tricky.

Rebase in general works by copying commits

As you've already seen, using your git rebase -i method, what you end up with is copies of original commits. The git rebase command is, in effect, a copy some commits, then change one branch name around operation.

When using git rebase -i, Git puts up an instruction sheet containing pick commands. (If you get fancy with -r or --autosquash, it may contain other commands as well, but let's ignore this here.) Each pick command tells rebase to invoke git cherry-pick. Cherry-pick is the "copy a commit" command. So this instruction sheet is really just showing—and allowing you to change!—the set of copies that Git is about to make. These commits aren't made yet, so they're not set in stone. (The set-in-stone part happens when the commit's hash ID is computed, during the make-a-commit-from-data phase.)

Rebase's job, then, is this:

1. make a list of commit hash IDs to copy, in the right order;
2. do a detached HEAD checkout / switch, to get in place to make copies;
3. make the copies, one by one; and
4. move the branch name to point to the last copied commit.
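
The four steps above can be imitated by hand with ordinary commands. This is only an approximation for illustration, not how git rebase is implemented, and it uses a smaller graph than the one in the question; the repository, the demo identity, and the commit helper are assumptions.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: one commit per letter, each adding its own file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b feature              # feature starts at B
commit F; commit G
git checkout -q main
commit C                                # main moves on: A-B-C

# Step 1: list the commits to copy, oldest first (on feature, not on main).
todo=$(git rev-list --reverse main..feature)
# Step 2: detached-HEAD checkout of the place where the copies go.
git checkout -q --detach main
# Step 3: make the copies, one by one.
for id in $todo; do git cherry-pick "$id" >/dev/null; done
# Step 4: move the branch name to the last copy and re-attach HEAD.
git checkout -q -B feature

git log --format=%s feature             # newest first: G, F, C, B, A
```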

The branch name to move is the one you have checked out at the time you run git rebase. You can therefore run git checkout feature or git switch feature and then start the rebase.

But—here's the tricky part—which commits is Git going to copy, and where are the copies going? This is where the argument to git rebase comes in. If you run:

git rebase main

the copies will go after the commit to which main points, which is commit I'. That's not the right place.

Let's draw what we have here now, again:

  B'-C'-H'-I'  <-- main
 /
A--B--C--H--I   <-- origin/main    ["main" in a Git you call "origin"]
       \
        F--G   <-- feature, origin/feature

What we want is to copy F and G and place the copies right after C'.

The (single) argument to git rebase here is where the copies go. You need to put in the raw hash ID of existing commit C' here, or something that locates this. You can use git log to locate the correct hash ID, and use cut-and-paste with the mouse to put it in.

Or, you can use Git's relative commit trick. Commit C' is two steps back from the commit to which main points. So git rebase main~2 picks C' as the place where the copies go.

There's another problem though: which commits will Git copy? The answer here is that if you use this kind of rebase, Git picks the commits to copy using the one argument you give, e.g., main~2 or the raw hash ID of commit C'. Git finds the commits to copy by starting from wherever you are at the moment—your HEAD commit, not that I have been drawing in HEAD—and working backwards. Since you'll be on commit G, Git will work backwards from G. The complete list here is G, F, C, B, and A. Some of these will get knocked out, though.

The commits not to copy are those that Git reaches from commit C' (more generally, the target commit). So Git starts at C' and works backwards too: C', then B', then A. The C' and B' commits are not in the to-copy list, so these are not interesting, but A is: A gets knocked out of the to-copy list.

We have a problem, in other words: Git wants to copy all of B-C-F-G, not just F and G. There are two ways to deal with this:

You could use git rebase -i here. Git would put up an instruction sheet with four pick commands, with each of those commits' hash IDs. You can then delete two of them, leaving Git with instructions to copy only F and G (in that order).
Or, you can use git rebase --onto.

With the --onto argument, we tell Git, separately, that we want the copies to go after whatever we provide with --onto. That frees up the remaining argument—the copy-limiter—so that we can use something else here.

What we want is to have Git copy only those commits not reachable from origin/main. So:

git rebase --onto main~2 origin/main

will make Git list out, as "potentially to copy", everything it listed before (A-B-C-F-G, once they're reversed into "backwards-for-Git" = forwards order), but this time, the "knock these out" list goes I-H-C-B-A. That knocks out A, B, and C, leaving just F and G in the to-copy list.
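
Both candidate lists can be inspected before running any rebase, using the same two-dot ranges. The sketch rebuilds the graph from the drawings in a throwaway repository. The commit helper, the tags named after each letter, and the branch old-main (standing in for origin/main, since there is no remote here) are all assumptions for the example, and the reset plus cherry-pick sequence merely imitates the interactive rebase. Treat the output as candidates: git rebase can additionally skip a commit whose change is already present on the other side.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: commit a file named after the letter and tag the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; git tag "$1"; }

commit A; commit B; commit C
git checkout -q -b feature              # feature forks off at C
commit F; commit G
git checkout -q main
commit H; commit I
git branch old-main main                # stand-in for origin/main

# Imitate the interactive rebase: rebuild main as A-B'-C'-H'-I'.
git reset -q --hard A
git cherry-pick B >/dev/null
git commit -q --amend -m "B'"
git cherry-pick C H I >/dev/null

# Limiter main~2 (commit C'): B and C are still candidates.
git log --reverse --format=%s main~2..feature      # B, C, F, G
# Limiter old-main (the original I): only F and G remain.
git log --reverse --format=%s old-main..feature    # F, G
```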

Our rebase will now copy F and G, to new and improved commits F' and G', whose improvement is to have them come after C' and use C' as their starting snapshots. Once the copies are done, Git will yank the name feature over and re-attach HEAD (which I'll draw in, this time). The result is this:

       F'-G'  <-- feature (HEAD)
      /
  B'-C'-H'-I'  <-- main
 /
A--B--C--H--I   <-- origin/main
       \
        F--G   <-- origin/feature

We can now git push --force-with-lease both main and feature, in fact in one single git push:

git push --force-with-lease --atomic origin main feature

The --atomic here is a bit of a frill, and if your remote doesn't support it, has to be taken out. Either way your Git will have their Git check to make sure that their main and feature point to the commits our Git is remembering with its remote-tracking names. The --atomic tells the other Git: only update any of your names if you're going to update all the names. So if either lease check failed, their Git would refuse the entire push, rather than accepting one of the two.

(The --atomic feature was new in Git 2.4. There was a bug fixed in 2.24.1 where it did not always work when using smart http.)
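
The whole sequence can be tried end to end against a local bare repository standing in for origin. This is a sketch under several assumptions: the paths, the demo identity, the commit helper, and the letter tags are invented, the reset plus cherry-pick steps only imitate the interactive rebase, and the installed Git must be new enough to support --atomic.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
top=$(mktemp -d)
git init -q --bare "$top/origin.git"
git init -q "$top/work"
cd "$top/work"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$top/origin.git"

# Hypothetical helper: commit a file named after the letter and tag the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; git tag "$1"; }

commit A; commit B; commit C
git checkout -q -b feature              # feature forks off at C
commit F; commit G
git checkout -q main
commit H; commit I
git push -q origin main feature         # publish the original history

# Imitate the interactive rebase: rebuild main as A-B'-C'-H'-I'.
git reset -q --hard A
git cherry-pick B >/dev/null
git commit -q --amend -m "B'"
git cherry-pick C H I >/dev/null

# Copy F and G so that they come after C'.
git checkout -q feature
git rebase -q --onto main~2 origin/main

# Publish both names together, each guarded by its lease.
git push -q --force-with-lease --atomic origin main feature
git log --format=%s feature             # newest first: G, F, C, B', A
```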

Once they accept the updates, your own local Git will show:

        F'-G'  <-- feature (HEAD), origin/feature
       /
A--B'-C'-H'-I'  <-- main, origin/main

There's no reason to draw A on its own line any more, so I didn't. The abandoned original commits will live on in your own repository for at least 30 days from the point of abandonment, by default, but they may vanish quickly from the origin repository. In any case they're hard to find, unless you know the hash IDs: they seem to be gone.
