Setjmp

Reputation: 28392

Pull with rebase up to a specific commit

This is a variation on an older question with a rebase twist.

I want to do a git pull --rebase but only up to a specific commit. This is not pulling a specific commit; it is pulling up to a specific commit. The remote master looks like the following.

A<-B<-C<-D<-E<-F<-HEAD (Remote master HEAD)

Suppose my local feature branch HEAD points to G which points to D:

A<-B<-C<-D<-G<-HEAD (Current local feature branch HEAD).

I want to pull up to E with a rebase so that my branch ends up looking like:

A<-B<-C<-D<-E<-G<-HEAD (local feature branch end goal).

However, this is just a special case. I want to pick any eligible commit hash, not just the second to last one as in the example above.

Naturally, I would like the hash for commit E to match remote master at the end of the operation. I belabor that point because certain types of interactive rebase editing would cause that property to disappear.

What should I do?

Upvotes: 1

Views: 5146

Answers (3)

torek

Reputation: 489748

TL;DR

Calum Halpin's answer is correct: you must break your git pull into its two separate constituent commands: git fetch followed by git rebase. It probably deserves more explanation though.

Long

Let me redraw your graphs a bit to make them look (I think anyway) a bit clearer. I'm going to replace the internal backwards-pointing arrows with simpler connecting dashes, though, because I need to draw some "arrows" that point up-and-left or down-and-left, and this doesn't work very well here. (I have arrow-drawing characters on my system that don't always show up in other Web browsers on other systems.)

On the remote we have:

A--B--C--D--E--F   <-- master

That is, there is a chain of commits, with some big ugly hash IDs that we've replaced with letters for convenience, that ends at commit F. Their Git's name master contains the hash ID of commit F. (The name master may or may not be the current branch on the remote: it does not matter for our purposes, so there is no need for us to draw in the special name HEAD here.)

Meanwhile we have this locally:

A--B--C--D--G   <-- feature (HEAD)

i.e., we and they share commits A through D, including their linkage, and our Git's name feature contains the hash ID of commit G which points back to D.

I want to pull up to E with a rebase so that my branch ends up looking like:

A--B--C--D--E--G   <-- feature (HEAD)

However, this is just a special case. I want to pick any eligible commit hash, not just the second to last one as in the example above.

What you need to do is avoid the git pull command.

What git pull does is run two more-basic Git commands: first git fetch, then a second command of your choice. The second command is usually git merge but if you use --rebase, or various other configuration methods, you can have it run git rebase instead. What you cannot do is pass the right arguments to git rebase, where by "right arguments" I mean the ones that solve the problem in question.

Let's run git fetch first. This has our Git call up their Git. Their Git tells our Git about its various branch and tag and other such names, including the fact that their master identifies commit F. Our Git checks our repository and discovers that we don't have commit F, so our Git asks to get it. Their Git then offers commit E as well (the sender has to offer everything we need, and we'll need E to hold F) and our Git asks for that too. They offer D, but we already have that, so we tell them to stop there.

They now build a so-called thin pack that contains what we need to add commits E and F to our repository. This is the "counting objects" and "compressing objects" stuff you see when you run git fetch, or any command that runs git fetch. They send us the thin pack; our Git takes that thin pack and fixes it up so that it's usable, and now we have the commits they had, that we didn't, that we need to finish git fetch-ing.

If there's nothing else we need, our Git says 'kthanksbye to their Git and proceeds to update our remote-tracking names. We have, in our Git, names origin/master and so on: these are where our Git remembers what their Git said their hash IDs were, for their branch names, the last time we talked with them. As long as we have the commits that their branches remember, our Git can update our remote-tracking names accordingly, so it does. This leaves us with:

A--B--C--D--G   <-- feature (HEAD)
          \
           E--F   <-- origin/master

(There's that up-and-left pointing arrow, going from E to D, drawn without using arrow characters.)
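To watch this happen, here is a minimal sketch that builds a throwaway pair of repositories shaped like the diagrams and then runs git fetch. The directory names remote and local, the commit helper, and the one-file-per-commit layout are assumptions made purely for illustration.

```shell
set -eu
# Identity for the throwaway commits (illustrative values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: one commit that adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

cd "$(mktemp -d)"

# "Their" repository, with A..D on master.
git init -q remote
cd remote
git symbolic-ref HEAD refs/heads/master
for c in A B C D; do commit "$c"; done
cd ..

# "Our" repository: clone it, then add G on feature.
git clone -q remote local
cd local
git checkout -q -b feature
commit G
cd ..

# They add E and F after we cloned.
cd remote
commit E
commit F
cd ..

cd local
before=$(git rev-parse origin/master)   # our Git still remembers D
git fetch -q origin
after=$(git rev-parse origin/master)    # our Git now remembers F
echo "origin/master moved from $before to $after"

# feature itself has not moved; only the remote-tracking name did.
git log --oneline --graph feature origin/master
```

The graph printed at the end should show G and the E-F chain both hanging off D, as in the drawing above.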

We're now ready to run git rebase.

If we just ran git rebase like git pull does

If we ran git rebase origin/master (git pull actually passes git rebase the hash ID of commit F rather than the name, but this does the same thing), Git would:

  • list out commits reachable from our current HEAD: G, D, C, and so on;
  • list out commits reachable from origin/master: F, E, D, C, and so on;
  • remove from the first list anything in the second list;
  • remove any additional commits that should not be copied (see footnote 1);
  • put the list in the right order (reversed from Git's internal backwards order); and
  • begin copying those commits, one by one, as if by git cherry-pick (see footnote 2).
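The listing steps can be approximated by hand with git rev-list. The sketch below rebuilds the same throwaway repositories (the names remote and local and the commit helper are illustrative assumptions) and prints the commits a rebase onto origin/master would consider. It leaves out the patch-ID filtering, so treat it as an approximation rather than what rebase literally runs.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: one commit that adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

cd "$(mktemp -d)"
git init -q remote
cd remote
git symbolic-ref HEAD refs/heads/master
for c in A B C D; do commit "$c"; done
cd ..

git clone -q remote local
cd local
git checkout -q -b feature
commit G
cd ..

cd remote
commit E
commit F
cd ..

cd local
git fetch -q origin

# Reachable from HEAD, minus anything reachable from origin/master,
# with merges dropped and the order reversed (oldest first).
to_copy=$(git rev-list --reverse --no-merges origin/master..HEAD)
for h in $to_copy; do
    git log -1 --format='%h %s' "$h"
done
```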

Since this list consists only of commit G, that's the one commit we'll copy. But: Where does this copy go?

A regular git rebase places the copy, or if there are multiple commits to copy, copies plural, right after the commit named on the command line. Since git rebase as run by git pull names commit F—the commit to which our origin/master points, after git fetch updated our origin/master to match origin's master—we'll get:

A--B--C--D--G
          \
           E--F   <-- origin/master
               \
                G'

as the result. (I've removed some of the names because for this moment, the names aren't really interesting, and git rebase will be fiddling with them too; we just haven't drawn that part yet.)

Had there been more commits, e.g., G-H-I, we would end up with G'-H'-I' on the bottom row here. In all cases, the original commits, pre-copying, remain. At this point, git rebase finishes its work by moving the name to which our HEAD is attached to point to the final copied commit (see footnote 3):

A--B--C--D--G   [was `feature`, now abandoned]
          \
           E--F   <-- origin/master
               \
                G'   <-- feature (HEAD)

Footnote 1: Depending on the arguments to git rebase, this normally includes all merge commits, plus any commits for which git patch-id computes identical patch IDs. The patch-ID computation part is a little tricky to describe: it involves using git rev-list --left-right with the symmetric difference triple-dot operator. For many rebases, neither of these matters, which is why I merely have this footnote.

Footnote 2: Some types of git rebase literally run git cherry-pick. Others, including (in Git versions before 2.26) the default one and the one run by git pull, use a faster but cheesier mechanism involving git format-patch and git am that can miss rename operations. You can force rebase to use the slower but more-accurate cherry-pick method by adding -m, or doing an interactive rebase, or adding other options. There's rarely any real need for this though. Since Git 2.26 the merge-based method is the default.

Footnote 3: Technically, Git ran each of the copying operations with a detached HEAD, in which HEAD points directly to the copy once the copy is made. But of course git rebase starts by saving the fact of the attachment (the fact that HEAD was attached to feature) so that when the rebase finishes, Git knows to (1) move feature and (2) re-attach HEAD.


What you want to do instead: specify where the copies go

If you run git rebase yourself, you get to pick the argument that git rebase calls upstream. When git pull does it, it gives git rebase the hash ID to which your updated origin/master points. If you do it with:

git rebase origin/master

you give it the name origin/master, which it resolves to that same hash ID, and you get the result we saw.

But if you do run this manually, you can put in the hash ID, or anything else that names the commit you want. This tells git rebase where the copies go.

In this case, then, if you name commit E by any means (a raw hash ID, origin/master^, or origin/master~ will all work), your git rebase will copy G to a G' that comes after E:

A--B--C--D--G   [was `feature`, now abandoned]
          \
           E--F   <-- origin/master
            \
             G'   <-- feature (HEAD)

and you get the desired result.
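Here is a minimal end-to-end sketch of that fetch-then-rebase sequence on throwaway repositories. The names remote and local and the commit helper are illustrative assumptions. It also checks the property the question cares about: the hash of E on feature is the same as the hash of E on the remote's master.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: one commit that adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

cd "$(mktemp -d)"
git init -q remote
cd remote
git symbolic-ref HEAD refs/heads/master
for c in A B C D; do commit "$c"; done
cd ..

git clone -q remote local
cd local
git checkout -q -b feature
commit G
cd ..

cd remote
commit E
commit F
cd ..

cd local
git fetch -q origin               # step 1: get E and F, update origin/master
git rebase -q origin/master~1     # step 2: copy G so that it comes after E

e_remote=$(git -C ../remote rev-parse master~1)   # E as the remote knows it
e_local=$(git rev-parse feature~1)                # parent of G' on feature
echo "E on remote:  $e_remote"
echo "E on feature: $e_local"
git log --oneline --graph feature origin/master
```

Because E itself is never copied, only G is, the two hashes printed should be identical. Replacing origin/master~1 with any other commit reachable from origin/master picks a different stopping point.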

There is one more control knob available to you now too

When you run git rebase by hand, instead of having git pull do it for you, you get one more option. Look again at the bullet-point list of steps above, where git rebase figures out which commits to copy. If you run:

git rebase <upstream>

Git lists out commits as if by:

git log <upstream>..HEAD

(as shown in the git rebase documentation; add --fork-point as needed, and see footnote 4).

It then copies the listed commits, using upstream as the target for the copying. But what if you have, e.g.:

...--B--C--D--E--F   <-- branch (HEAD)
         \
          G--H--I   <-- origin/master

where commit D was an emergency hack fix you made so that you could write commits E and F, and meanwhile someone wrote the real fix as commits G, H, and/or I?

While git rebase tries to be smart about omitting commits that are already in the upstream—e.g., if commit G matches commit D, git rebase knows not to copy D—this does not work in all cases. In particular, it usually won't work to drop an emergency "fix" that actually just disabled or eliminated a feature, rather than really fixing the bug in the feature.
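The case that does work can be sketched as follows, assuming upstream commit G makes exactly the same change as our commit D. The repository names, the commit helper, and the file contents are illustrative assumptions. The two commits should then have the same patch ID, and a plain rebase should copy only E and F.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: one commit that adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

cd "$(mktemp -d)"
git init -q remote
cd remote
git symbolic-ref HEAD refs/heads/master
commit B
commit C
cd ..

git clone -q remote local
cd local
git checkout -q -b branch
for c in D E F; do commit "$c"; done
cd ..

# Upstream: G introduces the very same change as our D, then H and I follow.
cd remote
echo D > D.txt
git add D.txt
git commit -q -m G
commit H
commit I
cd ..

cd local
git fetch -q origin
d=$(git rev-parse branch~2)          # our D
g=$(git rev-parse origin/master~2)   # their G
pid_d=$(git show "$d" | git patch-id | cut -d' ' -f1)
pid_g=$(git show "$g" | git patch-id | cut -d' ' -f1)
echo "patch-id of D: $pid_d"
echo "patch-id of G: $pid_g"

git rebase -q origin/master
copied=$(git rev-list --count origin/master..branch)
echo "commits copied: $copied"
git log --format=%s origin/master..branch
```

Git may print a note about skipping a previously applied commit; that is the omission described above.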

You could use git rebase -i to handle this, but long before git rebase -i, there was git rebase --onto. With --onto, you get to split off the target-selection from the upstream-limiting argument.

That is, in this diagram, what we want as our result is to copy only commits E and F, leaving D—our emergency fix that isn't really right—behind. To tell Git this, we use git rebase --onto:

git rebase --onto origin/master <hash-of-D>

or:

git rebase --onto origin/master branch~2

Our upstream argument now names commit D. That's the commit not to copy (nor any of the earlier commits).

If we ran a git rebase like this but without the --onto argument, Git (a) would not copy D but then (b) would place the copies of E and F right after D. The result is what we don't want (draw it to see). But when we add --onto origin/master, that tells rebase to put the copies after commit I. The result is:

...--B--C--D--E--F   [abandoned]
         \
          G--H--I   <-- origin/master
                 \
                  E'-F'  <-- branch (HEAD)

Commits D-E-F are all dropped, in favor of the new and improved E'-F' commits. We don't have to manually drop D as our git rebase arguments did that for us. Redrawing the chain with the abandoned commits invisible gives us:

...--B--C--G--H--I   <-- origin/master
                  \
                   E'-F'  <-- branch (HEAD)

and if no one but us ever knew about commits E and F, we can just pretend that we only wrote the new copies, rather than the originals: no one (except us) will ever know (see footnote 5).
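A minimal sketch of the --onto case on throwaway repositories, where D is a local hack that changes something different from the upstream commits, so patch-ID matching cannot drop it. The names remote and local and the commit helper are illustrative assumptions.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: one commit that adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

cd "$(mktemp -d)"
git init -q remote
cd remote
git symbolic-ref HEAD refs/heads/master
commit B
commit C
cd ..

git clone -q remote local
cd local
git checkout -q -b branch
for c in D E F; do commit "$c"; done   # D is the emergency hack
cd ..

cd remote
for c in G H I; do commit "$c"; done   # the real fix lands upstream
cd ..

cd local
git fetch -q origin

# upstream = branch~2 (commit D): do not copy D or anything before it.
# --onto origin/master: place the copies after commit I.
git rebase -q --onto origin/master branch~2

copied=$(git rev-list --count origin/master..branch)
echo "commits on top of origin/master: $copied"
git log --format=%s origin/master..branch
```

After this runs, D.txt should be gone from the working tree, since only E and F were copied.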


Footnote 4: Try this yourself! You'll get a list of hash IDs, one per line. They come out in Git's preferred order though (backwards), which isn't suitable for git rebase, and they don't omit the commits that git rebase will omit. Still, rebase actually does use git rev-list internally; it's just that it adds a lot of options: --no-merges to drop merge commits, and --topo-order --reverse to get the right order. Last, there is a bit of magic to exclude commits that have the same patch IDs as commits on the upstream side, as noted in footnote 1. This involves using the three-dot syntax, <upstream>...HEAD, and adding --right-only --cherry-pick. When rebase was a shell script, this was easy to find; now that it has been rewritten in C code, it's much harder to figure out.

When the fork-point option is in effect, the <upstream> parameter here gets replaced by the result of git merge-base --fork-point, which uses your origin/master reflog to guess if some commits should be omitted. See, e.g., Revision selection in git using fork-point and What does `git rebase --fork-point master` mean?.

I'm still somewhat unconvinced that fork-point mode is the right default (it can be surprising sometimes) and am not yet sure whether the new Git 2.24 --keep-base option uses the fork-point type merge base, or the real merge base. But do note that if you rebase using anything that is not a name—e.g., if your rebase upstream argument is a hash ID—that disables fork-point mode, as the fork-point base is computed by scanning a reflog, and only names have reflogs.

Footnote 5: And we'll probably forget. Who can remember raw hash IDs?


Conclusions

  • Git is really all about commits. Branch names, when you use them—and when Git uses them—are just there to help you find the last commit in some branch. Commits are frozen forever, and (mostly) permanent (if you cannot find them, from branch or other names, they do eventually go away).

  • Branch names move. Branch names let us, and Git, find commits. They move in predictable ways, by adding commits to the branch. Some operations, such as git rebase or git reset, move them in big sudden ways and maybe in ways that are "less natural", not just advancing to incorporate more commits.

  • git rebase is about copying commits. We make new-and-improved copies, with different hash IDs, and make a branch name point to the last copied commit. The originals can't be changed, but if you move a branch name, anyone who didn't save the hash IDs of the originals won't be able to find the originals.

  • git fetch is about two things: getting new commits from another Git and updating remote-tracking names like origin/master based on what is in that other Git. If we did get new commits, all we have to remember them at this point is these remote-tracking names, so we generally need a second command.

  • git pull is meant to be convenient: it runs git fetch, then it runs a second command, usually git merge.

  • Sometimes, this is not convenient. I actually find that it's inconvenient more often than it is convenient. In that case, just don't use it.

Upvotes: 2

udit rawat

Reputation: 221

Try an interactive rebase:

git rebase -i e3f8704

Here e3f8704 stands for the hash of the commit you want to rebase onto.

Upvotes: -1

Calum Halpin

Reputation: 2105

Fetch the changes from the remote:

git fetch origin

Rebase onto the remote version of master, ignoring some number of commits:

git rebase origin/master~<n>

where <n> is the number of commits from the tip of master you want to ignore.

If you have the id of the commit you want to rebase onto, you can use that instead:

git rebase <commit-id>

Upvotes: 3
