Reputation: 9432
I am wondering why the following leaves staged changes:
git reset --hard master~4 # reset on purpose, to set up the next command
# fetch from this repository... src: origin/master to destination: master
git fetch --update-head-ok . origin/master:master
git status # -> Shows various staged files?
The branch master
seems to be in sync with origin/master
.
But now I have various staged files on master.
Why does it behave like this? I thought that git fetch . origin/master:master
updates my local branch HEAD to the one in origin/master
. Obviously it does more, but what exactly?
Upvotes: 1
Views: 1234
Reputation: 488193
To properly understand why this leaves you with files "staged for commit", you need to understand, and hold in your head, all of the following ten things about Git:
It's the commits that matter.
All commits—in fact, all internal Git objects of any kind—are strictly read-only.
Branch names, and other names, merely help you (and Git) find the commits.
The way this works is that every commit has a unique number: a big, ugly, and random-looking hash ID that lets Git look up the commit object in a big database (a key-value store) of all Git objects, including the commit objects and the other supporting objects. A name—branch name, remote-tracking name, tag name, or any other name—holds one hash ID.
Commits themselves find earlier commits. Each commit holds some number of previous-commit hash IDs. Most commits just have one hash ID; we call that the parent of the commit. This is, for instance, how git log
works: we find the last commit using the branch name. The branch name's hash ID causes the name to "point to" the commit. The parent hash ID stored inside the commit causes the commit to point backwards to its parent. The parent in turn stores the hash ID of its own parent, which points back another step, and so on.
The thing that controls which branch name is the current branch name is the special name HEAD
. This is normally "attached to" a branch name. If you run git log
with no branch names or other starting points, Git uses HEAD
to find your current branch, and then uses the branch name to find the last commit.
The current branch name therefore determines the current commit.
Each commit holds a snapshot of every file. Because this is made up of internal Git objects (which are read-only, and in a format that can't be read by other programs), Git has to extract those files into a work area before you can use them or change them. This work area is called your working tree or work-tree. So there are, in effect, two copies of every file: the committed copy (read-only and Git-only) in the current commit, and the usable copy (read/write and an ordinary usable file).
Git does not make new commits from existing commits, nor from what's in your working tree. It has, instead, a third copy of every file. This copy is in the internal Git format, which is pre-de-duplicated, so if you haven't actually modified anything and git add
-ed it, this third "copy" really just shares the committed copy. (The commits themselves share these de-duplicated "copies" as well, which is quite safe since they're all strictly read-only.)
What git fetch
does.
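The points above about names, hash IDs, and parent links can be checked in a throwaway repository. This is only a sketch: the temporary directory, the "demo" identity, and the two empty commits G and H are assumptions made for illustration.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the unborn branch master, whatever the local default is
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m G
git commit -q --allow-empty -m H

current_branch=$(git symbolic-ref --short HEAD)   # HEAD is attached to a branch name
tip=$(git rev-parse master)                       # the branch name holds one hash ID
parent=$(git rev-parse 'master^')                 # the hash ID of the commit before the tip
git cat-file -p "$tip"                            # raw commit object; note its "parent" line
git log --format='%h %s' master                   # walks from H back to G via the parent link
```

The "parent" line printed by git cat-file is the same hash ID that master^ resolves to, which is the backwards-pointing link described above.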
With all of the above in mind, let's look at what git fetch
does now (and see why you need the --update-head-ok
flag as well). It may also help, especially if you're a visual learner, to draw a few graphs of how Git commits work, so we'll start with that.
We begin with the idea that we have some series of commits, each of which has its own big ugly hash ID. We don't want to deal with real hash IDs, so we'll use one uppercase letter instead, to stand in for hash IDs. The last commit in this chain has some hash ID that we'll call H
. We find this name using a branch name, to which the special name HEAD
is attached:
<-H <--branch (HEAD)
We indicate that the name branch
points to commit H
by drawing an arrow coming out of the branch name. But commit H
itself points to some earlier commit, so let's add it:
<-G <-H <--branch (HEAD)
Of course, commit G
points to an even-earlier commit:
... <-F <-G <-H <--branch (HEAD)
Now, the "arrows" coming out of commits (the hash IDs stored inside the commits) are as read-only, and as permanent, as everything else in the commit. Since we can't change them, and we know they point backwards, I'm going to draw them as connecting lines—partly out of laziness and partly because I don't have good arrow drawing in text, and I'm about to draw more than one branch name:
          I--J   <-- br1
         /
...--G--H   <-- main
         \
          K--L   <-- br2
We get this situation when we had a main branch with commits ending at commit H
. We then created a new branch name that also pointed to commit H
:
...--G--H <-- main, br1 (HEAD)
The current commit is still commit H
, and we move HEAD
to the new name br1
. Then we make a new commit, which we'll call I
; I
will point back to H
, because we made new commit I
with commit H
being the current commit at the time. Git therefore writes I
's hash ID into the name br1
, to which HEAD
is attached:
          I   <-- br1 (HEAD)
         /
...--G--H   <-- main
We then go on to make a new commit J
. Then we use git switch
or git checkout
to attach HEAD
to main
again. Git will:
- attach HEAD to main, and
- extract commit H to both your working tree and this third-copy-of-every-file that I mentioned.
This gives us:
          I--J   <-- br1
         /
...--G--H   <-- main (HEAD)
From here, we create another branch name like br2
, attach HEAD
to it (staying on commit H
this time), and make new commits, to get to our final setup.
Note how the third-copy-of-every-file will match whatever commit we have checked out. That's because Git carefully co-ordinates it, as we move our current commit around. The checkout or switch command does this coordination internally.
This third-copy-of-every-file has a name. Actually, it has three names, reflecting how it's used, or how poorly chosen the first name was, or something. 😀 These three names are the index, the staging area, and the cache. The last name is mostly seen, these days, in flags to some Git commands: git rm --cached
or git diff --cached
, for instance. Some of these commands allow --staged
(but git rm
, at least, does not, at least not as of Git 2.29).
I like to stick with the meaningless, and original, term, index, because of the multiple ways it gets used. Still, except for its expanded role during merge conflict resolution, a good way to think of the index / staging-area is that it acts as your proposed next commit. By using git checkout
or git switch
, you arrange for Git to update its own index whenever you change branch names:
          I--J   <-- br1
         /
...--G--H   <-- main
         \
          K--L   <-- br2 (HEAD)
Here, we're on commit L
, so the index presumably matches commit L
except for whatever you've updated via git add
. If all three copies of everything match—if the index's copy of each file matches the current commit's copy, and the work-tree's copy of each file matches the other two copies—we can switch from commit to commit, using git switch
or git checkout
. Git can safely clobber the entire index and work-tree contents, because they're safely stored in the commits, which are completely and totally read-only, and permanent—well, mostly permanent. They're hard to get rid of, but if you really work at it, you can sometimes get rid of some. (We won't worry about that here, and will just think of them as read-only and permanent.)
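To see the three copies side by side, something like the following can be run in a scratch directory. It is a sketch under assumptions: the file name file.txt, its contents, and the "demo" identity are invented for the example.

```shell
set -e
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
echo one > file.txt
git add file.txt
git commit -q -m L

committed=$(git rev-parse HEAD:file.txt)                   # blob hash of the committed copy
indexed=$(git ls-files -s file.txt | awk '{print $2}')     # blob hash of the index copy

echo three > file.txt                                      # change only the work-tree copy
after_edit=$(git ls-files -s file.txt | awk '{print $2}')  # the index copy has not moved yet
git add file.txt                                           # copy the work-tree version into the index
after_add=$(git ls-files -s file.txt | awk '{print $2}')
git diff --cached --name-only                              # lists file.txt: index now differs from HEAD
```

Right after the commit the index entry and the committed entry share one blob; only git add changes what the index holds, which is why the index works as the proposed next commit.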
You have used the name origin/master
in your question. This is a remote-tracking name: it is your Git's memory of some other Git's master
branch. The other Git here is the one you talk to using the name origin
:
git fetch origin
for instance. The short name origin
holds a URL, and using that URL, your Git calls up some other Git. That other Git has its own branch names, which need not have anything to do with your branch names. These branch names find commits in their repository.
If you have those same commits in your repository—and you often will—you can have your own Git set up some name(s) to remember those commits in your repository. You don't want to use a branch name because your branch names are yours, and it would be bad to just arbitrarily move some of your own branch names around. Your branch names are there to help you find your desired commits, not someone else's.
So, your Git takes their names—their master
, for instance—and changes them. The end result is this name that gets abbreviated as origin/master
.1 We can draw them in:
...E--F--G--H <-- master (HEAD), origin/master
The special feature of a branch name is that if you use git checkout
or git switch
, you can get "on the branch". That's how you get the name HEAD
attached to the name master
.
The special feature of a remote-tracking name is that it gets updated by some kinds of git fetch
. But Git won't let you get "on" a remote-tracking name. If you run git checkout origin/master
, Git puts you in what it calls detached HEAD mode. With the new git switch
, Git demands that you acknowledge this mode first: you have to run git switch --detach origin/master
to get into detached-HEAD mode. I'll leave detached-HEAD mode out of this answer, but ultimately it's pretty simple: we just have the special name HEAD
point directly to a commit, rather than attaching it to a branch name. The problem with this is that once we make any new commits, anything we do that moves HEAD
—including attaching it to a branch name to get out of the mode—makes it really hard to find the hash IDs of the new commits we made.
1 All of Git's names tend to get abbreviated. Your master
is actually short for refs/heads/master
; your origin/master
is short for refs/remotes/origin/master
. The various names right underneath the top level refs/
provide name spaces that make sure your own branch names never collide with any remote-tracking name, for instance.
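The full spellings can be printed with git rev-parse --symbolic-full-name. In this sketch the remote-tracking name is created by hand with git update-ref, purely as a stand-in for what git fetch origin would normally create; the repository and the "demo" identity are likewise assumptions.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m H
# Stand-in only: a real origin/master would be written by git fetch origin.
git update-ref refs/remotes/origin/master HEAD

full_branch=$(git rev-parse --symbolic-full-name master)
full_remote=$(git rev-parse --symbolic-full-name origin/master)
echo "$full_branch"    # refs/heads/master
echo "$full_remote"    # refs/remotes/origin/master
```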
git fetch
Suppose you and a friend or co-worker are working on some big project. There's some centralized copy of some Git repository, perhaps stored on GitHub or some other repository-hosting site (maybe a corporate or university host instead of GitHub). Whatever the case, you and your friend both wish to work with this repository.
What Git makes you do is make a clone of the centralized repository. You run:
git clone <url>
and you get your own copy of the repository. This copies all of its commits to your own repository, but—at first—none of its branches. The way it does this is to use git fetch
. The git clone
command is really just a convenience wrapper that runs up to six commands for you, with all but the first one being Git commands:
1. mkdir (or your OS's equivalent): git clone will (normally) make a new, empty directory in which to hold the clone. The remaining commands get run inside this currently-empty folder, though you'll have to navigate to it afterward.
2. git init: this makes a new, totally-empty repository. An empty repository has no commits and no branches. A branch name has to hold the hash ID of an existing commit, and there are no commits, so there cannot be any branch names.
3. git remote add: this sets up a remote, normally named origin, saving the URL you used.
4. git config, if and as needed based on command line options you gave to git clone.
5. git fetch origin (or whatever other name you chose by command-line options): this obtains commits from the other repository, and then creates or updates your remote-tracking names.
6. git checkout (or in Git 2.23 or later, git switch): this creates a new branch name for you, and attaches HEAD to that branch name.
The branch created in step 6 is the one you chose with your -b option to git clone. If you did not choose one with -b, your Git asks their Git which branch name they recommend, and uses that one. (There are some emergency fallbacks for the special case of cloning a totally-empty repository, since now you can't have a branch name, and they can't recommend one either, but we'll ignore these corner cases here.)
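The six steps can be imitated by hand against a local stand-in for the other Git. This is a rough sketch, not what git clone literally executes: the directory names theirs and mine, the "demo" identity, and the single commit H are assumptions, and step 4 is skipped because no extra options are in play.

```shell
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Stand-in for the repository being cloned: one branch, master, with one commit.
git init -q "$work/theirs"
git -C "$work/theirs" symbolic-ref HEAD refs/heads/master
git -C "$work/theirs" commit -q --allow-empty -m H

mkdir "$work/mine"                          # step 1
cd "$work/mine"
git init -q                                 # step 2: no commits, no branches
git remote add origin "$work/theirs"        # step 3: remember the "URL" (here a path)
git fetch -q origin                         # step 5: get commits, create origin/master
branches_after_fetch=$(git for-each-ref --format='%(refname)' refs/heads)   # still empty
git checkout -q -b master origin/master     # step 6: create master, attach HEAD to it
```

After step 5 there are commits and a remote-tracking name but still no branch names; only step 6 creates master.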
Let's say the repository you clone has eight commits, which we'll call A
through H
as before, and one branch name, master
. They therefore recommend that your Git create master
. Your Git creates your master
pointing to the same commit that their Git had with their name master
, that your Git is now calling origin/master
. So the end result is this:
...--E--F--G--H <-- master (HEAD), origin/master
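git fetch, and the underlying mechanism
Let's review what git fetch (step 5 of git clone) did:
- It obtained, from their Git, the commits that they had and that you lacked.
- It created (or updated) your remote-tracking name origin/master.
That, in general, is what git fetch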
is meant for: obtain new commits that they have that I don't, that I want, and, having done that, create or update some names.
The mechanism for this is that you run git fetch
and give it the name of a remote: it needs this to know what the rules are for the remote-tracking names. So you run git fetch origin
to make this happen (or just git fetch
, which ends up inferring origin
, though the process for this inference is mildly complicated). This gets us into refspecs.
The actual syntax for git fetch
, as described in the SYNOPSIS section of its documentation, is:
git fetch [<options>] [<repository> [<refspec>...]]
(technically this is just the first of four ways to run git fetch
: it's a very complex command). Here, we used no options, but specified one repository
(origin
) and used no refspec
arguments. This makes Git look up the default refspec from the remote name. A remote doesn't just remember a URL, it also remembers one or more refspecs. The default refspec for origin
is stored under the name remote.origin.fetch
:
$ git config --get-all remote.origin.fetch
+refs/heads/*:refs/remotes/origin/*
(In this case, there is only one output line, so git config --get-all
does the same thing that git config --get
would do, but when using single-branch clones you can use git remote
to make them two- or three- or whatever-number-branch clones, and then the --get-all
gets more than one line.)
This thing—this +refs/heads/*:refs/remotes/origin/*
—is what Git calls a refspec. Refspecs are defined very briefly in the gitglossary with more details in the fetch and push documentation, but the short way to describe them is that they have two parts separated by a colon :
, and optionally prefixed with a plus sign +
. The +
prefix means force (the same as --force
as a command line option, but applied only to refs being updated due to this one particular refspec).
The parts that go on either side of the colon are refs, which can be abbreviated in the usual ways. So we can use a branch name like master
and run:
git push origin master:master
(Note here that I've jumped to the git push
command. It is like git fetch
in that it takes these repository
and refspec
arguments, but its use of refspecs is slightly different.)
Our default fetch refspec for origin
is:
+refs/heads/*:refs/remotes/origin/*
The plus sign turns on the forcing option, so that our Git will update our origin/*
names no matter what. The refs/heads/*
on the left means match all of their branch names. The refs/remotes/origin/*
on the right side is why git fetch
creates or updates our origin/master
, rather than our master
.
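The default refspec can be inspected without contacting any server, since git remote add writes it into the configuration. In this sketch the URL is a placeholder that is never fetched from, and the splitting into force flag, source, and destination is done with plain shell parameter expansion just to make the parts visible.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git remote add origin https://example.com/placeholder.git   # hypothetical URL, never contacted

refspec=$(git config --get-all remote.origin.fetch)
echo "$refspec"                    # +refs/heads/*:refs/remotes/origin/*

force=no
case $refspec in +*) force=yes ;; esac   # leading plus sign means "force"
rest=${refspec#+}
src=${rest%%:*}                    # left of the colon: names matched on their side
dst=${rest#*:}                     # right of the colon: names written on our side
echo "force=$force src=$src dst=$dst"
```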
By using a refspec, you can change which names git fetch
creates-or-updates. You must be at least a little bit careful when doing so. When we have git fetch
update remote-tracking names, we're just updating our Git's memory of some other Git's branch names. If our Git's memory gets confused somehow (if we mess up the refspec somehow), well, we can just run git fetch
again: presumably their Git hasn't screwed up their branch names, so we just refresh our memory correctly and everything is fixed. But if we have git fetch
write on our memory of our own branch names, this could be bad: our branch names are how we find our commits!
Since git fetch
can write any ref, it can write branch names, or tag names, or remote-tracking names, or special-purpose names like the ones used for git bisect
or git stash
. That's a lot of power, so use it with care: if you run git fetch origin
you'll have a lot of safety mechanisms in place, but if you run git fetch origin refspec
you bypass them all, whether you want to or not.
Well, all but one. Before we get to that, let's look at HEAD
again, and then look at git reset
.
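As a small illustration that an explicit refspec lets git fetch write a ref outside the usual places, the sketch below fetches from the same repository into refs/demo/copy. The refs/demo/ namespace is made up for this example and has no meaning to Git; the repository and "demo" identity are assumptions as well.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m H

# "." means this repository; the destination is a made-up ref name.
git fetch -q . refs/heads/master:refs/demo/copy
copy=$(git rev-parse refs/demo/copy)
tip=$(git rev-parse master)
git for-each-ref --format='%(refname)'   # shows refs/demo/copy next to refs/heads/master
```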
HEAD
and git reset
As we saw before, HEAD
tells us our current branch name. Since git fetch
can write to any ref—including a branch name—it can, if we tell it to, create or update any branch name. That includes the one HEAD
is attached-to. But the current branch name determines the current commit:
...--E--F--G--H <-- master (HEAD), origin/master
This tells us that commit H
is the current commit.
Sometimes we might want to move our current branch to point to some other existing commit. For instance, suppose we make a new commit I
:
                I   <-- master (HEAD)
               /
...--E--F--G--H   <-- origin/master
Then we immediately decide that commit I
is total rubbish and want to get rid of it. To do that, we can use git reset
.
The reset command is insanely complicated.2 We'll ignore a lot of it and just concentrate on the variants that move the current branch name. We run:
git reset --hard <hash-ID-or-other-commit-specifier>
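and Git:
- makes the current branch name point to the chosen commit;
- makes its index match the chosen commit; and
- makes our work-tree match the chosen commit.
It's basically as if we had checked out some other commit, but in the process, dragged the branch name with us. So we can use: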
git reset --hard origin/master
or:
git reset --hard HEAD~1
or any other way of naming commit H
(perhaps using its actual hash ID, from git log
output). The end result of this is:
                I   ???
               /
...--E--F--G--H   <-- master (HEAD), origin/master
Commit I
still exists, but now it's very hard to find. There is no name for it any more.
Note how this git reset
swapped out the contents of Git's index and our work-tree. This way, everything is in sync: the current commit is H
again, the staging area matches commit H
, and our work-tree matches commit H
. We could use other kinds of git reset
commands and if we did, things would be different. We'll come back to this in a bit.
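The effect of git reset --hard on the branch name, the index, and the work-tree can be watched in a scratch repository. This is a sketch only; the file name, its contents, and the "demo" identity are invented for the example.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
echo "version H" > file.txt
git add file.txt
git commit -q -m H
echo "version I, which we regret" > file.txt
git commit -q -am I
H=$(git rev-parse HEAD~1)
I=$(git rev-parse HEAD)

git reset -q --hard HEAD~1        # drag master back to H; reset index and work-tree too
now=$(git rev-parse master)
content=$(cat file.txt)           # work-tree copy is H's version again
lost=$(git cat-file -t "$I")      # commit I still exists as an object
git status --short                # prints nothing: all three copies agree
```

Commit I is still in the object database, but no name finds it any more, so the only way back to it here is the hash ID saved in the variable I.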
2 In fact, it's so complicated that I think that, like the old git checkout
, it should be split into two commands: git checkout
became git switch
and git restore
. It's not clear to me what two names to use for a split-up git reset
, except that one of them is probably git restore
. 😀
git reset
is similar
You ran:
git reset --hard master~4
Let's assume that your current branch was also master
(you didn't say, but it's clearly implied by the rest of your question). Let's also assume that your master
was originally in sync with your own origin/master
, so that you started with:
...--D--E--F--G--H <-- master (HEAD), origin/master
Your git reset
did this:
...--D   <-- master (HEAD)
      \
       E--F--G--H   <-- origin/master
No commit has changed (no commit can change, ever) but you're now working with commit D
. Your index / staging-area and work-tree match commit D
. Commit D
is the current commit.
git fetch
is quite unusual
Next, you ran:
git fetch --update-head-ok . origin/master:master
Here, you used .
instead of the name of a remote. That's OK, because git fetch
allows more than just a remote name here. You could use a URL, or a path name; .
counts as a path name and means this repository. Your Git, in essence, calls itself up, and asks itself which commits it has, and what its branch names are.
Your Git has no new commits in it that your Git needs from the "other" Git (your Git has exactly those commits that it has, of course) so the obtain new commits step does nothing. Then, the refspec origin/master:master
applies: you have "them" look up "their" origin/master
—that's your own origin/master
, which identifies commit H
—and copy that to your branch name master
.
This is where that last special safety-check comes in. Normally, git fetch
will refuse to update the current branch name. That's because the current branch name determines the current commit. But the --update-head-ok
flag turns off the safety check, so your git fetch
goes ahead and updates the current branch name. Your name master
now points to commit H
.
What didn't happen is that Git did not update its index or your work-tree. These two were left alone. They still match commit D
. So while you now have:
...--D
      \
       E--F--G--H   <-- master (HEAD), origin/master
your index and work-tree match commit D
.
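The whole sequence from the question can be replayed in a throwaway repository. This is a sketch under assumptions: five commits D through H that each add one file, a hand-made refs/remotes/origin/master standing in for a real remote-tracking name, and a "demo" identity.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
for name in D E F G H; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name"
done
git update-ref refs/remotes/origin/master master        # stand-in for a fetched origin/master

git reset -q --hard master~4                            # master, index, work-tree: commit D
git fetch -q --update-head-ok . origin/master:master    # moves the name master, nothing else
tip=$(git log -1 --format=%s master)                    # subject of the commit master now names
staged=$(git diff --cached --name-only)                 # paths where the index differs from HEAD
if [ -e H.txt ]; then worktree_has_H=yes; else worktree_has_H=no; fi
git status --short                                      # E.txt through H.txt appear as staged changes

git reset -q --hard                                     # bring index and work-tree up to commit H
```

The staged entries are simply the difference between the index (still commit D) and the new current commit H. A plain git reset --hard afterwards, with no commit argument, re-syncs the index and work-tree to wherever master now points.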
git reset --soft
Had you run:
git reset --soft origin/master
your Git would have moved your current branch name, master
, to point to commit H
. The --soft
, however, tells git reset
:
- leave the index alone, and
- leave the work-tree alone,
so you'd be left in the same situation as before.
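For comparison, here is the same kind of experiment with git reset --soft. It is a sketch with assumed contents: two commits D and H, each adding one file, and a hand-made origin/master ref.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
echo d > D.txt; git add D.txt; git commit -q -m D
echo h > H.txt; git add H.txt; git commit -q -m H
git update-ref refs/remotes/origin/master master   # stand-in for a fetched origin/master

git reset -q --hard master~1                       # name, index, work-tree all at commit D
git reset -q --soft origin/master                  # only the name master moves, to commit H
tip=$(git log -1 --format=%s master)
staged=$(git diff --cached --name-only)            # H.txt: the index still matches D
```

The work-tree still has no H.txt either, so the state matches what the fetch with --update-head-ok produces.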
There is one slight difference between this git reset
and your git fetch
, but it has no effect at all in this particular case. Specifically, when git fetch
is updating a ref, it can enforce fast-forward rules. These rules apply to branch names and remote-tracking names. (Versions of Git predating 1.8.2 accidentally applied them to tag names too.) The fast-forward rule requires that the new hash ID stored in some name be a descendant commit of the hash ID stored in the name before the update.
The git reset
command never enforces a fast-forward rule. The git fetch
and git push
commands do, unless the update is forced (with --force
or a leading +
character in the refspec).
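The fast-forward rule is easiest to see when the update is not a fast-forward. In this sketch the branch names side and target, the empty commits A, B, and C, and the "demo" identity are all assumptions; target is deliberately not the current branch so that only the fast-forward check is in play.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m A
git branch side
git commit -q --allow-empty -m B          # master: A--B
git checkout -q side
git commit -q --allow-empty -m C          # side:   A--C
git checkout -q master
git branch target master                  # target names B, which is not an ancestor of C

# Unforced: moving target from B to C is not a fast-forward, so fetch refuses.
if git fetch -q . side:target 2>/dev/null; then plain=accepted; else plain=rejected; fi
after_plain=$(git log -1 --format=%s target)

git fetch -q . +side:target               # leading plus sign forces the update
after_forced=$(git log -1 --format=%s target)

git reset -q --soft side                  # reset moves master from B to C with no such check
after_reset=$(git log -1 --format=%s master)
```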
Upvotes: 1
Reputation: 1324318
The --update-head-ok
man page mentions:
By default
git fetch
refuses to update the head which corresponds to the current branch. This flag disables the check.
This is purely for the internal use for git pull to communicate with git fetch, and unless you are implementing your own Porcelain you are not supposed to use it.
So:
- you have reset the index to master~4;
- then you have reset master to origin/master (which is not master~4, but some other commit).
Git shows you what is in the index, but not in HEAD: those are the files already staged (because of the first reset), and not in HEAD (which refers to origin/master).
If your goal was to reset master to origin/master, do:
git fetch
git switch -C master origin/master
Upvotes: 1