Reputation: 6242

Difference between two git rebasing methods

I would like to bring updates from master into my local branch, which was branched from master at an earlier point in time (at change M2). Master's changes are denoted M, and my local branch's changes are denoted L.

The new branch was created from master's M2:

 M1->M2-->M3->M4
      \
       L1->L2

I think the result for my local branch should be:

M1->M2->M3->M4->L1->L2

This means recreating my local branch so that it contains all of master's changes first, with my local branch's changes on top of them, as described in:
https://www.atlassian.com/git/tutorials/merging-vs-rebasing (correct me if I'm wrong)

My question is whether either of the following methods fails to produce the history shown above, and if so, why?

git checkout master
git pull --rebase 
git checkout branch_to_update
git rebase master   (method mentioned in the Atlassian tutorial)

git checkout branch_to_update
git pull --rebase origin master

Upvotes: 2

Answers (3)

torek

Reputation: 490068

As in both other answers, the effect is generally pretty much the same. msanford has pointed out one definite and one potential difference but there are more. To see what and why, we should disassemble git pull into its constituents.

All the finicky details (warning: long)

With a few minor exceptions (such as running it in a completely empty repository), git pull means:

1. run git fetch with various options and arguments; then
2. run a second Git command, chosen before step 1 runs, also with various options and arguments.

The second command is usually git merge but you can tell Git to use git rebase. The options and arguments passed to the two commands depend on options passed to git pull and other configuration settings, plus the result or results of the fetch in step 1.

As a sort of general rule, though, arguments passed to git pull are passed to git fetch, so this means that your second command sequence—which passes origin master to git pull—passes origin master to git fetch as well. If you run git pull without these arguments, as in your first command sequence, Git extracts the remote (usually origin) and the upstream branch name (usually the same as the current branch name) from your configuration, specifically from the results of these two commands:¹

git config --get branch.$branch.remote
git config --get branch.$branch.merge

(where $branch is the current branch). If the current branch is master, this uses branch.master.remote as the remote. That's what we mean in terms of assuming there is only one remote. The merge name is probably master, but if not, that's another assumption we have to make, before we can claim that these do the same thing.
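To make that lookup concrete, here is a minimal sketch in a throwaway repository. The two settings are written by hand purely for illustration (normally git clone or git branch --set-upstream-to writes them), and the directory name demo is made up. Note that the stored merge value is the full ref name.

```shell
#!/bin/sh
# Sketch: find the remote and upstream ref that a bare `git pull` would use.
set -e
cd "$(mktemp -d)"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master   # make the branch name predictable

# Hypothetical settings, written by hand for this demo only.
git config branch.master.remote origin
git config branch.master.merge refs/heads/master

branch=$(git symbolic-ref --short HEAD)              # current branch name
remote=$(git config --get "branch.$branch.remote")   # which remote to fetch from
merge=$(git config --get "branch.$branch.merge")     # which ref on that remote
echo "git pull on '$branch' would fetch '$merge' from '$remote'"
```

If either lookup comes back empty in your own repository, the branch has no upstream configured and a bare git pull has nothing to default to.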

¹ If your Git is old enough, git pull is a shell script, and it literally runs various other Git commands. If it's newer, git pull has been converted to a C-language program and it has these built directly in.

Rebase copies commits, then switches to the new copies

What git rebase does gets complicated if we delve into all the details, but at a high level, its job is to copy commits. To see which commits it will copy, you should draw the commit graph, or use git log --graph to have Git draw it for you. (Some GUIs always draw it, and some web interfaces, *cough* GitHub *cough*, never let you view it!) With a graph drawing, it's easy (well, sometimes easy) to tell which commits get copied:

...--A--B--C--D   <-- master
         \
          E--F--G   <-- br

Rebasing your branch br on your master copies three commits, here E through G, placing the copies after commit D. This is similar to what you drew.

Suppose we add origin/ remote-tracking names, and show that your own master is currently pointing to commit B while origin/master is currently pointing to commit D, like this:

          C--D   <-- origin/master
         /
...--A--B   <-- master
         \
          E--F--G   <-- br

Now we can see that we must rebase br onto origin/master in order to have the copies go after commit D. Rebasing onto master will put the copies after B, which is where the originals are, so there's no need to copy after all. (Whether rebase actually copies, or just re-uses the originals, is one of the finicky details: it depends on the -f option, for instance.)
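If you would rather have Git list the candidates than read them off a drawing, the two-dot range master..br selects the commits reachable from br but not from master, which in the simple case are the ones a rebase onto master would copy. The sketch below builds the first graph (A through G) in a throwaway repository; the commit helper, the demo identity, and the file names are made up for the demo.

```shell
#!/bin/sh
# Sketch: list the commits that `git rebase master` would copy from br.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper: one new file per commit, so nothing ever conflicts.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git branch br                 # br forks from B
commit C; commit D            # master moves on to D
git checkout -q br
commit E; commit F; commit G  # three commits only on br

to_copy=$(git log --reverse --format=%s master..br)   # oldest first
count=$(git rev-list --count master..br)
echo "rebasing br onto master would copy $count commits:"
echo "$to_copy"
git log --graph --oneline --all
```

Swap master for origin/master in the range to answer the same question against the remote-tracking name instead.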

Once the copying is done, git rebase simply re-points the branch name to point to the final copied (or re-used) commit, which we can call G' here to note that it's a copy of G. The original commits are effectively abandoned, although reflog entries for HEAD and for the original branch, and the name ORIG_HEAD, temporarily retain them:

               E'-F'-G'  <-- br
              /
          C--D   <-- origin/master
         /
...--A--B   <-- master
         \
          E--F--G   [abandoned, but see ORIG_HEAD and reflogs]

The reflog entries keep the originals available for at least 30 more days by default. Eventually ORIG_HEAD moves elsewhere due to other operations, and the reflog entries expire, and the original commits get garbage-collected.
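Here is a small sketch of how the abandoned originals stay reachable right after a simple, conflict-free rebase. It rebuilds the A through G graph in a throwaway repository (the commit helper and identity are made up for the demo) and then compares the old tip with ORIG_HEAD and with the branch's own reflog.

```shell
#!/bin/sh
# Sketch: after a rebase, ORIG_HEAD and the branch reflog still name the old tip.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper: one new file per commit, so nothing ever conflicts.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git branch br
commit C; commit D
git checkout -q br
commit E; commit F; commit G

old_tip=$(git rev-parse br)            # original G
git rebase -q master                   # copies E-F-G after D
new_tip=$(git rev-parse br)            # G'
orig_head=$(git rev-parse ORIG_HEAD)   # left behind by the rebase
prev_br=$(git rev-parse 'br@{1}')      # previous value of br, from its reflog

echo "old tip:   $old_tip"
echo "new tip:   $new_tip"
echo "ORIG_HEAD: $orig_head"
git reflog show br
# To undo the rebase while ORIG_HEAD still points here:
#   git reset --hard ORIG_HEAD
```

Since ORIG_HEAD is overwritten by later operations, the reflog entry is the more durable of the two handles.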

Now we can look at your original command sequences

Let's suppose, for argument's sake, that we have the graph above (like yours but with one more commit on branch br, and we already ran git fetch to get origin/master updated). Then the Atlassian command sequence begins with these two commands:

git checkout master
git pull --rebase

This will attach our HEAD to our master, checking out commit B; then, assuming the upstream is origin/master, run git fetch origin master to update our origin/master, which in this case leaves origin/master pointing to D. If we had not run git fetch yet this would obtain commits C and D and point our origin/master to D.

Last, this will run git rebase <hash-of-commit-D>. The rebase operation uses the hash ID because it uses the traces that git fetch leaves in .git/FETCH_HEAD and, depending on exact Git version and more details we'll ignore here, also uses git merge-base --fork-point to find a commit hash so as to recover from upstream rebases. (This process sometimes goes wrong, depending on your own work-flow, and I am not sure I like the default behavior.)

Once this is all done we get to the last two commands:

git checkout br
git rebase master

The first attaches HEAD to the name br, checking out commit G. The rebase then copies the E-F-G commit sequence to come after the commit to which master now points. So, ignoring all the reflog entries, we get the graph:

                E'-F'-G'  <-- br (HEAD)
               /
...--A--B--C--D   <-- master, origin/master
         \
          E--F--G   [abandoned]

Compare this with your shorter sequence of commands:

git checkout br
git pull --rebase origin master

The checkout attaches HEAD to br. The pull runs git fetch origin master, which makes sure we have commits C-D (if we did not already fetch them) and updates origin/master (if our Git is at least 1.8.4), then runs git rebase <hash-of-D> which copies the E-F-G chain, giving:

               E'-F'-G'  <-- br
              /
          C--D   <-- origin/master
         /
...--A--B   <-- master
         \
          E--F--G   [abandoned]

So the key difference is that your own name, master, never gets updated to point to commit D.
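You can watch this happen in a pair of throwaway repositories. In the sketch below, the directory names upstream and work, the commit helper, and the identity are all made up; work is a clone, so its origin is upstream. The short sequence runs first, then the longer one runs on the same clone to show that it is the step that finally moves master.

```shell
#!/bin/sh
# Sketch: the short sequence rebases br but leaves the local master behind.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
# Hypothetical helper: one new file per commit, so nothing ever conflicts.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
commit A; commit B
cd ..
git clone -q upstream work          # work's origin is upstream
cd upstream
commit C; commit D                  # upstream master advances to D
cd ../work
git checkout -q -b br               # br forks from B
commit E; commit F; commit G

# Short sequence: br is checked out, pull with explicit arguments.
git pull -q --rebase origin master
master_after_short=$(git log -1 --format=%s master)   # local master
br_base_after_short=$(git log -1 --format=%s br~3)    # commit under E'
echo "after short sequence: master at $master_after_short, br sits on $br_base_after_short"

# Long sequence, run afterwards: this is what updates the local master.
git checkout -q master
git pull -q --rebase
git checkout -q br
git rebase -q master                # nothing left to copy at this point
master_after_long=$(git log -1 --format=%s master)
echo "after long sequence:  master at $master_after_long"
```

The final rebase has nothing to do because br already sits on top of D, which is the point: both sequences leave br in the same place.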

What I recommend instead

It's important to note (and know) that if you run git fetch yourself—this is my preferred method—this will tell your Git to call up the other Git at the remote's URL, and have the other Git list for your Git, all of its (origin's, we assume) branches. Your Git will then obtain all the commits they have that you don't and put them into your Git's repository, and update all of your remote-tracking names like origin/master and origin/develop and so on.

In other words, your remote-tracking names, which are your Git's way of remembering their branches, will all get updated. This is usually a good thing. It's only bad if they have a lot of branches and a lot of big commits and your network connection is slow; in that case, you might have to wait a long time to download everything.

When git pull runs git fetch, though, it runs it with a limiting option. For instance, if your git pull runs:

git fetch origin master

that tells your Git to call up the Git at the URL for origin and ask them to transfer only commits new to their master. If they have updates to their develop and production and feature/tall and so on, you don't get any of those—you only get new commits that are on their master. Your Git updates your origin/master to remember the new commits,² but leaves the rest of your remote-tracking names unchanged.

In your second command sequence, you run an explicit git pull origin master (with --rebase as well), so this limits your Git to updating your origin/master. In your first command sequence, you run git pull with no arguments—but git pull inserts origin and master, assuming those are the configured settings for your master branch, so this also limits your Git to updating only your origin/master.

I mention all this because I recommend not using git pull at all. Run git fetch yourself—you can let it default to fetching everything from origin—and then run whichever git rebase commands you want! After the fetch you have all the commits and all the appropriate origin/* names; you can then run:

git checkout <whatever-name>
git rebase origin/<whatever-other-name>

to copy whatever commits and/or adjust whichever of your own branch names you want to update. The one fetch lets you do any number of merge, reset, fast-forward pseudo-merge, or rebase operations. You can also look at what got fetched, before you decide what other Git commands to run!
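As a rough sketch of that fetch-first workflow, using the same kind of throwaway setup (upstream, work, the commit helper, and the identity are made-up names for the demo): fetch once, look at what arrived, then rebase onto the remote-tracking name.

```shell
#!/bin/sh
# Sketch: fetch first, inspect, then rebase onto origin/master.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
# Hypothetical helper: one new file per commit, so nothing ever conflicts.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
commit A; commit B
cd ..
git clone -q upstream work
cd upstream
commit C; commit D
cd ../work
git checkout -q -b br
commit E; commit F; commit G

git fetch -q origin                                   # updates every origin/* name
incoming=$(git rev-list --count br..origin/master)    # commits br lacks
echo "origin/master has $incoming commits that br lacks:"
git log --oneline br..origin/master                   # look before you leap

git rebase -q origin/master                           # copy E-F-G after D
master_now=$(git log -1 --format=%s master)
echo "br rebased; local master is still at $master_now"
```

The local master is untouched here as well, but with this workflow that is a deliberate choice rather than a side effect of which arguments git pull was given.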

²This assumes your Git is at least version 1.8.4. If not, this kind of git fetch fails to update even origin/master. You must run git fetch or git fetch origin to get your remote-tracking names updated!

Upvotes: 2

user7414776

Reputation:

git checkout branch_to_update
git rebase master

git checkout branch_to_update
git pull --rebase origin master

These produce the same result in different ways: git pull with the --rebase option simply runs a rebase after fetching.

Upvotes: 0

msanford

Reputation: 12247

Assuming there is only one remote repository, those two will have the same effect.

In the first case, you are updating a local copy of master, and then rebasing.

In the second case, you're rebasing directly from a remote repository.

Use the second option when you might not want to bother to update your local copy of the branch you're rebasing from.

For example, we have a main develop branch off which we make topic branches, e.g., feature/0001. While working, I keep feature/0001 checked out and simply git pull -r origin develop from time to time. In this case, having a local up-to-date copy of develop is irrelevant.

After my feature branch is merged, I check out and pull develop, and then create a new feature/0002 branch from that updated copy.

Additionally, note that it will actually create this as a result:

M1 -> M2 -> M3 -> M4 -> L1' -> L2'

What do I mean by L1'? Roughly, it will create a new commit -- with a new SHA identifier -- with the same content. So it's not the same commit per se.
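A quick way to see this for yourself, sketched in a throwaway repository with made-up names (the branch local, the commit helper, and the identity): record L1's hash, rebase, and compare. The hashes differ, while git patch-id, which fingerprints the change a commit introduces, comes out the same for L1 and L1'.

```shell
#!/bin/sh
# Sketch: a rebased commit gets a new hash but carries the same change.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper: one new file per commit, so nothing ever conflicts.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit M1; commit M2
git branch local              # local forks from M2
commit M3; commit M4
git checkout -q local
commit L1; commit L2

old_L1=$(git rev-parse local~1)       # original L1
git rebase -q master
new_L1=$(git rev-parse local~1)       # L1'

# Fingerprint of the change each commit introduces.
old_patch=$(git show "$old_L1" | git patch-id | cut -d' ' -f1)
new_patch=$(git show "$new_L1" | git patch-id | cut -d' ' -f1)

history=$(git log --reverse --format=%s local | paste -s -d ' ' -)
echo "history of local: $history"
echo "L1  = $old_L1"
echo "L1' = $new_L1"
```

The old L1 is still readable after the rebase because the reflog keeps it alive for a while, which is why git show can still print it.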

Upvotes: 3