Reputation: 594
Looking up the difference between git pull
and git fetch
, many sources say that git pull
is a superset of fetch, i.e. git pull
is fetch + merge.
However, I seem to remember many times where git pull
told me that everything was up to date, but fetch yielded new information.
Can someone explain this discrepancy between theory and reality?
Upvotes: 1
Views: 839
Reputation: 534903
However, I seem to remember many times where git pull told me that everything was up to date, but fetch yielded new information.
That's merely because of how Git reports what happened.
git fetch
updates all remote tracking branches and reports on that, so any new commits at the remote site will yield some sort of output.
But git pull
, though it also does a git fetch
, only reports what happened on the current local branch, which might well be nothing even if the fetch did bring lots of commits into the remote tracking branches. (Another good reason not to use pull
!)
Upvotes: 0
Reputation: 487755
Pull is indeed fetch plus merge.
Except when it's not.
When isn't it? When it's fetch plus rebase, or—very rarely—fetch plus checkout. But in all three cases, it's still:
git fetch
, followed by
a second Git command.
Where this gets complicated is not so much in the second command (though that second command does complicate things) but rather in the arguments passed from git pull
. Since git pull
is running two other Git commands, and Git commands' actions depend on their options and arguments, it matters what options and arguments git pull
passes to git fetch
and to that second command, whatever it may be.
In the early days of Git, there were no "remotes" like origin
, which meant there were no "remote-tracking names" either. You would run:
git fetch git://name-of-linus-torvalds-machine/repos/foo.git
to get stuff from Linus and then run git merge FETCH_HEAD
, or something along these lines. This was error prone (easy to have a typo in the URL) and annoying, so Git acquired a bunch of temporary methods to deal with this.
Note that with no remotes, all git fetch
could do was leave a bunch of information in .git/FETCH_HEAD
so that you could figure out which branches in Linus's repos had been updated and so on. And of course, git pull
wrapped these two commands into one, so that you didn't have to run two separate commands, and most people used git pull
. But something was clearly missing. So remotes were invented:
A remote gives us a short name, such as origin, that we could use instead of a URL. (This got rid of the need for all the weird hacks for naming remotes, but they're all still in there and still listed in the documentation. Look for Named file in $GIT_DIR.)
Remote-tracking names (origin/master and the like) take over a job that would in the past require using a local branch name.
But all these things are still supported and some of them are still described as "the way to do things" in some (ancient) documents, so you can still use the old crude methods. Perhaps some do.
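In any case, remote-tracking names now exist. However, between Git 1.7 and Git 2.0, there were some updates to them. Specifically, Git 1.8.4 fixed something eventually declared to be a bug: before that release, a fetch limited by refspec arguments, such as git fetch origin main, did not update the corresponding remote-tracking name. Some people are still using Git 1.7.x for some strange reason, so be aware that you could hit the old behavior.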
In Git 2.11, the old git pull
shell script was formally retired. While git pull
still effectively runs git fetch
followed by a second Git command, you can no longer point to the shell script and say: "See, here at this line, it runs git fetch
. Then it has these tests and then it eventually runs this other command..." The result is that it runs much faster on Windows, and is much harder to explain. 😀 It's also gained a feature or two since then, enough that at least a few hardcore "anti pull" people like me are now willing to actually use the thing. But that's another story.
git pull
The git pull
command has a lot of options. See its documentation for the complete list, then compare these options to those for git fetch
and for git rebase
and git merge
. Note that the pull documentation says that some options are passed to one or the other or to both, and that there's a fair bit of overlap in some options (e.g., all take -q
for quiet
and -v
for verbose
).
With or without these options, though, you can run:
git pull
or:
git pull origin
or:
git pull origin main
for example. If and when you do run any of these, all of these positional arguments are passed to git fetch
.
Note that you can even run:
git pull origin main feature
but you almost certainly should not. We'll cover why later.
Options, if you give them, are passed as described to one or both of the fetch and second-command steps.
The fetch
command is always passed one extra option, namely --update-head-ok
. Pull needs to pass this option, but also needs to be careful because careless use of this can get your current branch, index, and working tree out of sync. Do not use this option yourself unless you know exactly what you are doing.
For (at least, and maybe only) historical reasons, when passed some refspec arguments, such as main
in the git fetch origin main
case, git fetch
will only update the specified refspecs and associated remote-tracking names. Since git pull
passes all the refspec arguments you supplied on to git fetch
, but no extras of its own, git fetch
gets a refspec argument if and only if you passed refspec arguments to git pull
here.
(Fetch refspecs are slightly different from push refspecs: git push origin main
is equivalent to git push origin main:main
, but git fetch origin main
is equivalent to git fetch origin main:<discard>
with the side effect of also updating origin/main
. If you like, you can run git fetch origin main:main
, but this requires that you not be on that branch, except for the --update-head-ok
special case that git pull
arranges.)
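As a rough illustration of the limited fetch described above, the following sketch builds two throwaway repositories and runs git fetch origin main in the clone. The directory names (upstream, clone) and commit messages are invented for the demo, and it assumes Git 1.8.4 or later so that origin/main is updated as a side effect.

```shell
work=$(mktemp -d)   # scratch directory; every name below is made up for the demo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main          # name the unborn branch "main"
git commit -q --allow-empty -m "first commit"
git clone -q "$work/upstream" "$work/clone"

# Their main moves on after we cloned.
git commit -q --allow-empty -m "second commit"
upstream_tip=$(git rev-parse main)

cd "$work/clone"
git fetch -q origin main                       # limited fetch: just their main
fetch_head=$(git rev-parse FETCH_HEAD)         # what the fetch recorded
tracking=$(git rev-parse origin/main)          # remote-tracking name, updated as a side effect
local_main=$(git rev-parse main)               # our own branch, left alone by fetch
cat .git/FETCH_HEAD                            # one line: hash, then "branch 'main' of <path>"
```

Here FETCH_HEAD and origin/main both name the new upstream commit, while the local main stays where it was until some second command moves it.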
The second command that git pull
runs is:
git merge, by default, or
git rebase, if you've told Git to do that, or
git checkout, in the one special case.
Again, git pull
passes options and arguments to the second command, and here things get messy. When git pull
runs git merge
, it passes:
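a -m option with a precomputed merge message (unless you supply your own -m); plus
any options you gave that are meant for the merge step; plus
the hash ID(s) of the commit(s) to merge, as selected.
That last one is a puzzle: what does "as selected" really mean? Well, let's go back to the git pull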
syntax:
git pull
git pull origin
git pull origin main
We know that these words, if supplied (origin
and main
), are passed through to git fetch
. They specify the remote and, if there's a second word, the branch name as seen on that remote for the git fetch
operation.
If we don't supply a branch name as seen on the remote, git pull
requires that the current branch—the one we're on
, as in git status
will say on branch main
or whatever—have an upstream set. (See also Why do I need to do `--set-upstream` all the time?) An upstream is technically a pair: both a remote and a branch-name-as-seen-on-the-remote. These are normally presented to you in the more palatable remote-tracking name format, so that the upstream of your main
would typically be your origin/main
, i.e., main
as seen over on origin
.
Your git pull
command will fish the branch name out of the upstream, if needed. It does not pass this on to git fetch
, but it does use it later during this second git merge
command. At this point git pull
will use .git/FETCH_HEAD
—which git fetch
still writes, just like it did in primeval Git before Git 1.5 was released more widely—to fish out the commit hash ID associated with main
over on origin
. That's the hash ID that git pull
passes to git merge
.
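To make the "fish out the hash ID" step a little more concrete, here is a minimal sketch that inspects .git/FETCH_HEAD after an unrestricted fetch. It is not how git pull is implemented internally; it only reads the same file by hand. The repository layout and branch names (main, feature) are assumptions made up for the demo.

```shell
work=$(mktemp -d)   # scratch directory; every name below is made up for the demo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "first commit"
git branch feature                             # a second branch on the remote
git clone -q "$work/upstream" "$work/clone"
git commit -q --allow-empty -m "new on main"   # their main moves on after the clone

cd "$work/clone"                               # we are on main, upstream origin/main
git fetch -q                                   # no refspec: all their branches
cat .git/FETCH_HEAD                            # one line per fetched branch

# Lines without the "not-for-merge" marker are the candidates for the second command.
for_merge=$(grep -v "not-for-merge" .git/FETCH_HEAD | cut -f1)
not_for_merge=$(grep -c "not-for-merge" .git/FETCH_HEAD)
```

In this setup the only unmarked line is the one for the upstream of the current branch, so for_merge ends up holding the same hash as origin/main.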
In other words, if you're on your main
and its upstream is origin/main
and you run:
git pull
your Git will run:
git fetch --update-head-ok
followed by, if using git merge
:
git merge -m "Merge branch 'main' of <url>" <hash-ID>
where the URL and hash-ID are those from origin
and from .git/FETCH_HEAD
.
If you, yourself, run:
git fetch
git merge
you'll get the same effect, except that you won't have a -m
option and the merge message will be the default, which will be Merge remote-tracking branch 'origin/main'
. That is, the URL vanishes and the branch main of ...
part is phrased differently.
But if you run:
git pull origin main
your git pull
command will run:
git fetch --update-head-ok origin main
git merge -m <same message as before> <same hash ID as before>
That is, the extra origin main arguments get passed to git fetch
, which limits what gets fetched.
We can also now see why we should not run:
git pull origin main feature
This would run:
git fetch --update-head-ok origin main feature
(which itself is fine), but then it will run:
git merge -m <message> <hash#1> <hash#2>
That is, your git pull
will fish out, from .git/FETCH_HEAD
, two hash IDs: one corresponding to main
on origin
, and one corresponding to feature
on origin
. It then passes both hash IDs to one single git merge
command. This one git merge
command will do what Git calls an octopus merge.[1]
(Those new to Git often seem to expect that:
git pull origin br1 br2
should check out br1
locally, fetch-and-merge origin/br1
, then check out br2
locally, and fetch-and-merge origin/br2
, perhaps as a more efficient thing than this somewhat clumsy sequential description. That could make sense, and I believe I thought this myself at one point, but it's just not true.)
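For readers who have never seen one, an octopus merge can be produced locally without any remote at all. The sketch below uses branch names and file names invented for the demo, and passes branch names rather than hash IDs to a single git merge, which is the same kind of invocation that a two-branch pull ends up making.

```shell
work=$(mktemp -d)   # scratch directory; every name below is made up for the demo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "base"

# Two side branches, each touching its own file so that nothing conflicts.
for name in br1 br2; do
  git checkout -q -b "$name" main
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "work on $name"
done

git checkout -q main
git commit -q --allow-empty -m "main work"     # main diverges from both branches

# One merge command, two other tips: this is the octopus merge.
git merge -q -m "merge br1 and br2" br1 br2

# The first word printed is the commit itself; the rest are its parents.
parent_count=$(git rev-list --parents -n 1 HEAD | wc -w)
parent_count=$((parent_count - 1))
echo "parents: $parent_count"
```

The result is one commit with three parents on the current branch; neither br1 nor br2 is checked out or moved.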
If you tell Git to use git rebase
instead of git merge
—which you can now do in several ways, such as setting pull.rebase
to true
, in addition to providing --rebase
as an option to git pull
—Git will replace the git merge
command with a git rebase
command. This changes the set of options that can be passed through:
a rebase has no merge message and hence no message-setting -m, so you cannot give one;
a rebase has no --ff-only or --no-ff, so you cannot give these.
The git rebase
command has a mode called autostash where, if your status is not "clean" (as in git status
would not say working tree clean, nothing to commit
), git rebase
will run git stash push
before it starts the rebase, and git stash pop
at the end. I am not a fan of git stash
in general and unless you're pretty good at dealing with conflicts, I recommend not using this feature.
If autostash is disabled (which is the default), the rebase will refuse to start if the status is not "clean". With git merge
as the second command, the merge will generally refuse to start in the same situation (although I recall ancient Git versions behaving differently, with the same messy side effects as for git stash pop
in some conflict cases).
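As a small configuration sketch, these are the settings mentioned above, applied to a throwaway repository (the path is made up for the demo). Setting rebase.autoStash to false only spells out the default explicitly.

```shell
work=$(mktemp -d)   # scratch directory; the repository is empty and only holds config
git init -q "$work/repo"
cd "$work/repo"

git config pull.rebase true        # plain "git pull" here now means fetch + rebase
git config rebase.autoStash false  # the default, made explicit: refuse to start when not clean

pull_rebase=$(git config --get pull.rebase)
auto_stash=$(git config --get rebase.autoStash)
echo "pull.rebase=$pull_rebase rebase.autoStash=$auto_stash"
```

For a one-off, git pull --rebase has the same effect as the first setting without changing any configuration.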
The last case is one that's only seen rarely. You can have a Git repository in a special state, for which Git uses two different terms: an unborn branch or an orphan branch. This state exists in part because a new, totally-empty repository has no commits at all on it.
A branch name, in Git, must contain the hash ID of some valid, existing commit. But when you run git init
and it creates a new, totally-empty repository, there is no commit. With no commits, there can be no branches. And yet, git status
will say that you're on some branch, and that there are no commits yet and you should make the first one.
In this state—this orphan / unborn branch state—the next commit you make will be a root commit, which in a new empty repository is what you normally want: that's the first commit ever, and it starts history existing. Now you have a commit and you can build on it.
When you run git pull
while in this unborn-branch state, though, the git pull
operation may get a bunch of commits from the remote (from origin
for instance). The second command is supposed to combine those new commits that git pull
got, as directed by the remaining git pull
arguments, with the commits on the current branch. There are no commits on the current branch (which does not exist), but zero plus something is the something, right? So git pull
declares that the result of this pull-into-empty-repository is that you should check out the commit that's at the tip of the branch you git pull
-ed. That is:
git init
git remote add origin <url>
git pull origin main
should have your Git reach out to the given URL, find their main
, get commits from their Git, create your origin/main
, and then create your own main
that is an exact match for your origin/main
that your Git just created based on their main
.
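The sequence above can be tried against a local stand-in for the remote. In this sketch the "remote" is just another directory (named upstream here, an assumption for the demo), and the README file exists only so that there is something to check out.

```shell
work=$(mktemp -d)   # scratch directory; every name below is made up for the demo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the remote: one commit on main.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main
echo hello > README
git add README
git commit -q -m "first commit"

# A brand-new repository: main is unborn, there are no commits at all.
git init -q "$work/fresh"
cd "$work/fresh"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/upstream"
git pull -q origin main

mine=$(git rev-parse main)
theirs=$(git rev-parse origin/main)
echo "main=$mine origin/main=$theirs"
```

After the pull, the local main and origin/main name the same commit, and the working tree holds the files from that commit.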
The thing that does this last step is a branch-creating git checkout -b
or git switch -c
, so that's what git pull
will do here. (There was a bug, back in Git 1.5 or 1.6 or so, where if your working tree was non-empty, this git pull
command would wipe it out entirely. This bug bit me at least once and is at least some of the reason I learned to avoid git pull
. This bug has been long fixed, but I generally like to fetch, inspect, and merge-or-rebase, and I need to run git log
to do the inspecting, between the fetch and the second—or rather, third—command. So I still use git pull
only sparingly at best. But it now has pull.ff, which can be set to only, as a configuration item, and that covers my most common case, so I am slowly warming up to it.)
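The fetch, inspect, then merge-or-rebase routine might look roughly like this. It is a sketch under assumed names (upstream, clone), with git log standing in for whatever inspection you prefer; the final git config line shows the pull.ff setting for those who would rather let git pull do the fast-forward-only case.

```shell
work=$(mktemp -d)   # scratch directory; every name below is made up for the demo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "first commit"
git clone -q "$work/upstream" "$work/clone"
git commit -q --allow-empty -m "second commit"   # their main moves on

cd "$work/clone"
git fetch -q                                          # step 1: fetch only
git log --oneline 'HEAD..@{u}'                        # step 2: look at what arrived
incoming=$(git log --oneline 'HEAD..@{u}' | wc -l)
git merge -q --ff-only                                # step 3: fast-forward, or fail loudly
remaining=$(git log --oneline 'HEAD..@{u}' | wc -l)

git config pull.ff only   # optional: hold plain "git pull" to fast-forwards as well
```

With --ff-only the merge step refuses to do anything other than move the branch forward, so a diverged branch stops the process instead of creating a merge commit.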
[1] For more on octopus merges, see the git merge
documentation. Note that if the two hash IDs are identical, the effect of this octopus merge is largely the same as that of a regular merge, except that octopus merges cannot handle conflicts. At least, not yet: Junio Hamano was musing a bit on whether the new merge-ort
might be able to tackle this.
It's not clear to me that this is a good idea. In fact, it's somewhat clear to me that having octopus merge be weaker, and not able to handle merge conflicts, is a good thing.
However, I seem to remember many times where git pull told me that everything was up to date, but fetch yielded new information.
If you run git pull origin main
and get the up-to-date message, your current branch has origin/main
merged in and there's nothing to do here. But if you then run git fetch origin
(or just git fetch
), you'll fetch all their branch names, updating all your remote-tracking names.
If the upstream of the current branch is origin/main
, you can run:
git pull
instead of:
git pull origin main
and the git fetch
that git pull
runs won't be limited to fetching only their main
.
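The discrepancy from the question can be reproduced with a short experiment. This is a sketch with invented repository and branch names; it assumes a Git recent enough to print an "Already up to date" style message and passes --ff-only only to keep git pull from asking how to reconcile branches.

```shell
work=$(mktemp -d)   # scratch directory; every name below is made up for the demo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "first commit"
git clone -q "$work/upstream" "$work/clone"

# After the clone, upstream gains a branch; its main does not move.
git checkout -q -b feature
git commit -q --allow-empty -m "feature work"
git checkout -q main

cd "$work/clone"
pull_output=$(git pull --ff-only origin main 2>&1)    # limited fetch: only their main
if git rev-parse -q --verify refs/remotes/origin/feature >/dev/null
then after_pull=present; else after_pull=missing; fi

git fetch -q origin                                    # unrestricted fetch
if git rev-parse -q --verify refs/remotes/origin/feature >/dev/null
then after_fetch=present; else after_fetch=missing; fi

echo "$pull_output"
echo "origin/feature after pull: $after_pull, after fetch: $after_fetch"
```

The pull reports that the current branch is up to date, which is true, yet the later fetch still has something new to bring in because it is not restricted to main.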
Upvotes: 4