Gabriel

Reputation: 9432

Why does `git fetch . origin/master:master` leave staged changes?

I am wondering why the following leaves staged changes:

git reset --hard master~4 # reset in preparation for the next command
# fetch from this repository... src: origin/master to destination: master
git fetch --update-head-ok . origin/master:master 
git status # -> Shows various staged files?

The branch master seems to be in sync with origin/master, but now I have various staged files on master. Why does Git behave like this? I thought that git fetch . origin/master:master updates my local branch HEAD to the one in origin/master. Obviously it does more, but what exactly?

Upvotes: 1

Views: 1234

Answers (2)

torek

Reputation: 488193

To properly understand why this leaves you with files "staged for commit", you need to understand, and hold in your head, all of the following ten things about Git:

  1. It's the commits that matter.

  2. All commits—in fact, all internal Git objects of any kind—are strictly read-only.

  3. Branch names, and other names, merely help you (and Git) find the commits.

  4. The way this works is that every commit has a unique number: a big, ugly, and random-looking hash ID that lets Git look up the commit object in a big database (a key-value store) of all Git objects, including the commit objects and the other supporting objects. A name—branch name, remote-tracking name, tag name, or any other name—holds one hash ID.

  5. Commits themselves find earlier commits. Each commit holds some number of previous-commit hash IDs. Most commits hold just one such hash ID; we call that the parent of the commit. This is, for instance, how git log works: we find the last commit using the branch name. The hash ID stored in the branch name causes the name to "point to" the commit. The parent hash ID stored in the commit causes the commit to point backwards to its parent. The parent in turn holds its own parent's hash ID, which points back another step, and so on.

  6. The thing that controls which branch name is the current branch name is the special name HEAD. This is normally "attached to" a branch name. If you run git log with no branch names or other starting points, Git uses HEAD to find your current branch, and then uses the branch name to find the last commit.

  7. The current branch name therefore determines the current commit.

  8. Each commit holds a snapshot of every file. Because this is made up of internal Git objects (which are read-only, and in a format that can't be read by other programs), Git has to extract those files into a work area before you can use them or change them. This work area is called your working tree or work-tree. So there are, in effect, two copies of every file: the committed copy (read-only and Git-only) in the current commit, and the usable copy (read/write and an ordinary usable file).

  9. Git does not make new commits from existing commits, nor from what's in your working tree. It has, instead, a third copy of every file. This copy is in the internal Git format, which is pre-de-duplicated, so if you haven't actually modified anything and git add-ed it, this third "copy" really just shares the committed copy. (The commits themselves share these de-duplicated "copies" as well, which is quite safe since they're all strictly read-only.)

  10. What git fetch does.
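
Points 3 through 7 can be checked directly in a throwaway repository. The following is a minimal sketch, not part of the original question: the scratch directory, the placeholder identity, the file name, and the commit messages are all assumptions made for illustration.

```shell
set -e
# Throwaway repository; path, identity, and file name are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # name the initial branch on any Git version
git config user.name "Example User"
git config user.email "user@example.invalid"

echo one > file.txt
git add file.txt
git commit -q -m "first"
echo two > file.txt
git commit -q -am "second"

# A branch name holds exactly one hash ID: that of the last commit.
tip=$(git rev-parse refs/heads/master)
# HEAD is attached to a branch name, and so finds the same commit.
current_branch=$(git symbolic-ref --short HEAD)
head_commit=$(git rev-parse HEAD)
# The commit object itself stores its parent's hash ID.
stored_parent=$(git cat-file -p "$tip" | sed -n 's/^parent //p')
first_commit=$(git rev-parse master~1)

echo "HEAD is attached to: $current_branch"
echo "master holds:        $tip"
echo "parent stored in it: $stored_parent"
```

The hash IDs printed will differ on every run, since they depend on timestamps, but the relationships between them stay the same.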

With all of the above in mind, let's look at what git fetch does now (and see why you need the --update-head-ok flag as well). It may also help, especially if you're a visual learner, to draw a few graphs of how Git commits work, so we'll start with that.

Chains of commits

We begin with the idea that we have some series of commits, each of which has its own big ugly hash ID. We don't want to deal with real hash IDs, so we'll use one uppercase letter instead, to stand in for each hash ID. The last commit in this chain has some hash ID that we'll call H. We find this commit using a branch name, to which the special name HEAD is attached:

            <-H   <--branch (HEAD)

We indicate that the name branch points to commit H by drawing an arrow coming out of the branch name. But commit H itself points to some earlier commit, so let's add it:

        <-G <-H   <--branch (HEAD)

Of course, commit G points to an even-earlier commit:

... <-F <-G <-H   <--branch (HEAD)

Now, the "arrows" coming out of commits (the hash IDs stored inside the commits) are as read-only, and as permanent, as everything else in the commit. Since we can't change them, and we know they point backwards, I'm going to draw them as connecting lines—partly out of laziness and partly because I don't have good arrow drawing in text, and I'm about to draw more than one branch name:

          I--J   <-- br1
         /
...--G--H   <-- main
         \
          K--L   <-- br2

We get this situation when we had a main branch with commits ending at commit H. We then created a new branch name that also pointed to commit H:

...--G--H   <-- main, br1 (HEAD)

The current commit is still commit H, and we move HEAD to the new name br1. Then we make a new commit, which we'll call I; I will point back to H, because we made new commit I with commit H being the current commit at the time. Git therefore writes I's hash ID into the name br1, to which HEAD is attached:

          I   <-- br1 (HEAD)
         /
...--G--H   <-- main

We then go on to make a new commit J. Then we use git switch or git checkout to attach HEAD to main again. Git will:

  • attach HEAD to main,
  • extract commit H to both your working tree and this third-copy-of-every-file that I mentioned.

This gives us:

          I--J   <-- br1
         /
...--G--H   <-- main (HEAD)

From here, we create another branch name like br2, attach HEAD to it (staying on commit H this time), and make new commits, to get to our final setup.
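
The graph above can be rebuilt in a scratch repository. This is only a sketch under assumptions: the helper new_commit is hypothetical, and each commit simply adds one file named after its letter.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.invalid"

# Hypothetical helper: one new file per commit, named after the commit's letter.
new_commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

new_commit G
new_commit H                 # ...--G--H   <-- main (HEAD)

git checkout -q -b br1       # new name br1 points to H; HEAD attaches to br1
new_commit I
new_commit J                 # br1 now ends at J; main still ends at H

git checkout -q main         # re-attach HEAD to main; index and work-tree become H
git checkout -q -b br2       # new name br2 points to H; HEAD attaches to br2
new_commit K
new_commit L

git log --graph --oneline --all
```

The final git log --graph output shows the same fork at H, with real abbreviated hash IDs where the drawing uses letters.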

The index / staging-area / cache

Note how the third-copy-of-every-file will match whatever commit we have checked out. That's because Git carefully co-ordinates it, as we move our current commit around. The checkout or switch command does this coordination internally.

This third-copy-of-every-file has a name. Actually, it has three names, reflecting how it's used, or how poorly chosen the first name was, or something. 😀 These three names are the index, the staging area, and the cache. The last name is mostly seen, these days, in flags to some Git commands: git rm --cached or git diff --cached, for instance. Some of these commands allow --staged (but git rm, at least, does not, at least not as of Git 2.29).

I like to stick with the meaningless, and original, term, index, because of the multiple ways it gets used. Still, except for its expanded role during merge conflict resolution, a good way to think of the index / staging-area is that it acts as your proposed next commit. By using git checkout or git switch, you arrange for Git to update its own index whenever you change branch names:

          I--J   <-- br1
         /
...--G--H   <-- main
         \
          K--L   <-- br2 (HEAD)

Here, we're on commit L, so the index presumably matches commit L except for whatever you've updated via git add. If all three copies of everything match—if the index's copy of each file matches the current commit's copy, and the work-tree's copy of each file matches the other two copies—we can switch from commit to commit, using git switch or git checkout. Git can safely clobber the entire index and work-tree contents, because they're safely stored in the commits, which are completely and totally read-only, and permanent—well, mostly permanent. They're hard to get rid of, but if you really work at it, you can sometimes get rid of some. (We won't worry about that here, and will just think of them as read-only and permanent.)
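
To see the three copies of one file side by side, here is a small sketch in a scratch repository. The file name and its contents are made up for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.invalid"

echo "committed version" > file.txt
git add file.txt
git commit -q -m "initial"

echo "staged version" > file.txt
git add file.txt                      # replaces the index copy
echo "work-tree version" > file.txt   # changes only the ordinary file

in_commit=$(git show HEAD:file.txt)   # read-only copy in the current commit
in_index=$(git show :file.txt)        # copy in the index (proposed next commit)
in_worktree=$(cat file.txt)           # usable copy in the work-tree

echo "commit:    $in_commit"
echo "index:     $in_index"
echo "work-tree: $in_worktree"
```

Here git diff --cached would compare the first two copies, and plain git diff would compare the last two.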

Remote-tracking names are just as good as branch names for finding commits

You have used the name origin/master in your question. This is a remote-tracking name: it is your Git's memory of some other Git's master branch. The other Git here is the one you talk to using the name origin:

git fetch origin

for instance. The short name origin holds a URL, and using that URL, your Git calls up some other Git. That other Git has its own branch names, which need not have anything to do with your branch names. These branch names find commits in their repository.

If you have those same commits in your repository—and you often will—you can have your own Git set up some name(s) to remember those commits in your repository. You don't want to use a branch name because your branch names are yours, and it would be bad to just arbitrarily move some of your own branch names around. Your branch names are there to help you find your desired commits, not someone else's.

So, your Git takes their names (their master, for instance) and changes them. The end result is this name that gets abbreviated as origin/master.[1] We can draw them in:

...E--F--G--H   <-- master (HEAD), origin/master

The special feature of a branch name is that if you use git checkout or git switch, you can get "on the branch". That's how you get the name HEAD attached to the name master.

The special feature of a remote-tracking name is that it gets updated by some kinds of git fetch. But Git won't let you get "on" a remote-tracking name. If you run git checkout origin/master, Git puts you in what it calls detached HEAD mode. With the new git switch, Git demands that you acknowledge this mode first: you have to run git switch --detach origin/master to get into detached-HEAD mode. I'll leave detached-HEAD mode out of this answer, but ultimately it's pretty simple: we just have the special name HEAD point directly to a commit, rather than attaching it to a branch name. The problem with this is that once we make any new commits, anything we do that moves HEAD—including attaching it to a branch name to get out of the mode—makes it really hard to find the hash IDs of the new commits we made.


[1] All of Git's names tend to get abbreviated. Your master is actually short for refs/heads/master; your origin/master is short for refs/remotes/origin/master. The various names right underneath the top level refs/ provide name spaces that make sure your own branch names never collide with any remote-tracking name, for instance.


The normal way remote-tracking names help, via git fetch

Suppose you and a friend or co-worker are working on some big project. There's some centralized copy of some Git repository, perhaps stored on GitHub or some other repository-hosting site (maybe a corporate or university host instead of GitHub). Whatever the case, you and your friend both wish to work with this repository.

What Git makes you do is make a clone of the centralized repository. You run:

git clone <url>

and you get your own copy of the repository. This copies all of its commits to your own repository, but—at first—none of its branches. The way it does this is to use git fetch. The git clone command is really just a convenience wrapper that runs up to six commands for you, with all but the first one being Git commands:

  1. mkdir (or your OS's equivalent): git clone will (normally) make a new, empty directory in which to hold the clone. The remaining commands get run inside this currently-empty folder, though you'll have to navigate to it afterward.
  2. git init: this makes a new, totally-empty repository. An empty repository has no commits and no branches. A branch name has to hold the hash ID of an existing commit, and there are no commits, so there cannot be any branch names.
  3. git remote add: this sets up a remote, normally named origin, saving the URL you used.
  4. git config, if and as needed based on command line options you gave to git clone.
  5. git fetch origin (or whatever other name you chose by command-line options): this obtains commits from the other repository, and then creates or updates your remote-tracking names.
  6. git checkout (or in Git 2.23 or later, git switch): this creates a new branch name for you, and attaches HEAD to that branch name.

The branch created in step 6 is the one you chose with your -b option to git clone. If you did not choose one with -b, your Git asks their Git which branch name they recommend, and uses that one. (There are some emergency fallbacks for the special case of cloning a totally-empty repository, since now you can't have a branch name, and they can't recommend one either, but we'll ignore these corner cases here.)
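
The six steps can be run by hand. The sketch below is an approximation rather than a trace of what git clone does internally: it assumes a small local repository standing in for the central one (the directory names upstream and manual are made up), and it skips step 4 because no extra configuration is needed here.

```shell
set -e
tmp=$(mktemp -d)

# A stand-in for the centralized repository, with a single commit on master.
git init -q "$tmp/upstream"
cd "$tmp/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.invalid"
echo hello > README
git add README
git commit -q -m "initial"

mkdir "$tmp/manual"                       # step 1: new, empty directory
cd "$tmp/manual"
git init -q .                             # step 2: empty repository, no branches
git remote add origin "$tmp/upstream"     # step 3: remember the URL as "origin"
git fetch -q origin                       # step 5: get commits, create origin/master

# After the fetch we have commits and a remote-tracking name, but still no branch.
branches_before=$(git for-each-ref --format='%(refname)' refs/heads)

git checkout -q -b master origin/master   # step 6: create master, attach HEAD to it

git rev-parse --symbolic-full-name master origin/master
```

The last command prints the full spellings of the two names, which live in separate name spaces under refs/.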

Let's say the repository you clone has eight commits, which we'll call A through H as before, and one branch name, master. They therefore recommend that your Git create master. Your Git creates your master pointing to the same commit that their Git had with their name master, that your Git is now calling origin/master. So the end result is this:

...--E--F--G--H   <-- master (HEAD), origin/master

A normal git fetch, and the underlying mechanism

Let's review what git fetch—step 5 of git clone—did:

  • It got, from their Git, any commits they had, that you didn't, that you would need;
  • It created (because it didn't exist yet) your origin/master.

That, in general, is what git fetch is meant for: obtain new commits that they have that I don't, that I want, and, having done that, create or update some names.

The mechanism for this is that you run git fetch and give it the name of a remote: it needs this to know what the rules are for the remote-tracking names. So you run git fetch origin to make this happen (or just git fetch, which ends up inferring origin, though the process for this inference is mildly complicated). This gets us into refspecs.

The actual syntax for git fetch, as described in the SYNOPSIS section of its documentation, is:

git fetch [<options>] [<repository> [<refspec>...]]

(technically this is just the first of four ways to run git fetch: it's a very complex command). Here, we used no options, but specified one repository (origin) and used no refspec arguments. This makes Git look up the default refspec from the remote name. A remote doesn't just remember a URL, it also remembers one or more refspecs. The default refspec for origin is stored under the name remote.origin.fetch:

$ git config --get-all remote.origin.fetch
+refs/heads/*:refs/remotes/origin/*

(In this case, there is only one output line, so git config --get-all does the same thing that git config --get would do, but when using single-branch clones you can use git remote to make them two- or three- or whatever-number-branch clones, and then the --get-all gets more than one line.)

refspecs and refs

This thing—this +refs/heads/*:refs/remotes/origin/*—is what Git calls a refspec. Refspecs are defined very briefly in the gitglossary with more details in the fetch and push documentation, but the short way to describe them is that they have two parts separated by a colon :, and optionally prefixed with a plus sign +. The + prefix means force (the same as --force as a command line option, but applied only to refs being updated due to this one particular refspec).

The parts that go on either side of the colon are refs, which can be abbreviated in the usual ways. So we can use a branch name like master and run:

git push origin master:master

(Note here that I've jumped to the git push command. It is like git fetch in that it takes these repository and refspec arguments, but its use of refspecs is slightly different.)

Our default fetch refspec for origin is:

+refs/heads/*:refs/remotes/origin/*

The plus sign turns on the forcing option, so that our Git will update our origin/* names no matter what. The refs/heads/* on the left means match all of their branch names. The refs/remotes/origin/* on the right side is why git fetch creates or updates our origin/master, rather than our master.
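
A quick way to watch the default refspec at work is to let the other repository gain a commit and then fetch. This is a sketch with assumed names: the directories upstream and clone and the file contents are illustrative.

```shell
set -e
tmp=$(mktemp -d)

git init -q "$tmp/upstream"
cd "$tmp/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.invalid"
echo one > file.txt
git add file.txt
git commit -q -m "first"

git clone -q "$tmp/upstream" "$tmp/clone"

# The other repository gains a commit that the clone does not have yet.
echo two > file.txt
git commit -q -am "second"
upstream_tip=$(git rev-parse master)

cd "$tmp/clone"
refspec=$(git config --get-all remote.origin.fetch)
master_before=$(git rev-parse master)
git fetch -q origin                       # uses the default refspec
master_after=$(git rev-parse master)
tracking_after=$(git rev-parse origin/master)

echo "refspec:       $refspec"
echo "master:        $master_after (unchanged)"
echo "origin/master: $tracking_after (updated)"
```

Only the remote-tracking name moves; the clone's own master stays where it was until something like git merge or git reset moves it.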

By using a refspec, you can change which names git fetch creates-or-updates. You must be at least a little bit careful when doing so. When we have git fetch update remote-tracking names, we're just updating our Git's memory of some other Git's branch names. If our Git's memory gets confused somehow (if we mess up the refspec somehow), well, we can just run git fetch again: presumably their Git hasn't screwed up their branch names, so we just refresh our memory correctly and everything is fixed. But if we have git fetch write to our own branch names, this could be bad: our branch names are how we find our commits!

Since git fetch can write any ref, it can write branch names, or tag names, or remote-tracking names, or special-purpose names like the ones used for git bisect or git stash. That's a lot of power, so use it with care: if you run git fetch origin you'll have a lot of safety mechanisms in place, but if you run git fetch origin refspec you bypass them all, whether you want to or not.

Well, all but one. Before we get to that, let's look at HEAD again, and then look at git reset.
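
As an illustration of git fetch writing a name outside the usual remote-tracking name space, the sketch below fetches their master into a ref of our own choosing. The ref name refs/scratch/their-master is hypothetical, chosen only so that it cannot collide with a branch or tag; the directory names are likewise assumptions.

```shell
set -e
tmp=$(mktemp -d)

git init -q "$tmp/upstream"
cd "$tmp/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.invalid"
echo one > file.txt
git add file.txt
git commit -q -m "first"
upstream_tip=$(git rev-parse master)

git clone -q "$tmp/upstream" "$tmp/clone"
cd "$tmp/clone"

# Explicit refspec: source is their master, destination is a ref we name ourselves.
git fetch -q origin master:refs/scratch/their-master

scratch=$(git rev-parse refs/scratch/their-master)
git for-each-ref --format='%(refname)'
```

The for-each-ref listing then includes refs/scratch/their-master next to the ordinary branch and remote-tracking names.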

HEAD and git reset

As we saw before, HEAD tells us our current branch name. Since git fetch can write to any ref—including a branch name—it can, if we tell it to, create or update any branch name. That includes the one HEAD is attached-to. But the current branch name determines the current commit:

...--E--F--G--H   <-- master (HEAD), origin/master

This tells us that commit H is the current commit.

Sometimes we might want to move our current branch to point to some other existing commit. For instance, suppose we make a new commit I:

                I   <-- master (HEAD)
               /
...--E--F--G--H   <-- origin/master

Then we immediately decide that commit I is total rubbish and want to get rid of it. To do that, we can use git reset.

The reset command is insanely complicated.[2] We'll ignore a lot of it and just concentrate on the variants that move the current branch name. We run:

git reset --hard <hash-ID-or-other-commit-specifier>

and Git:

  • makes the current branch name point to the chosen commit;
  • makes the index / staging-area match the chosen commit; and
  • makes our work-tree match the chosen commit.

It's basically as if we had checked out some other commit, but in the process, dragged the branch name with us. So we can use:

git reset --hard origin/master

or:

git reset --hard HEAD~1

or any other way of naming commit H (perhaps using its actual hash ID, from git log output). The end result of this is:

                I   ???
               /
...--E--F--G--H   <-- master (HEAD), origin/master

Commit I still exists, but now it's very hard to find. There is no name for it any more.

Note how this git reset swapped out the contents of Git's index and our work-tree. This way, everything is in sync: the current commit is H again, the staging area matches commit H, and our work-tree matches commit H. We could use other kinds of git reset commands and if we did, things would be different. We'll come back to this in a bit.
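
The effect of this git reset --hard can be sketched in a scratch repository. The helper new_commit and the one-file-per-commit layout are assumptions for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.invalid"

# Hypothetical helper: one new file per commit, named after the commit's letter.
new_commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

new_commit G
new_commit H
commit_h=$(git rev-parse HEAD)
new_commit I
commit_i=$(git rev-parse HEAD)

git reset -q --hard HEAD~1                # move master back to H, index and work-tree too

branch_now=$(git rev-parse master)
staged=$(git diff --cached --name-only)   # index vs. current commit
unstaged=$(git diff --name-only)          # work-tree vs. index
type_of_i=$(git cat-file -t "$commit_i")  # commit I is still in the object database

echo "master now at H: $branch_now"
echo "object $commit_i is still a $type_of_i"
```

Both diffs come back empty, which is the "everything is in sync" state, while commit I remains retrievable only by its raw hash ID.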


[2] In fact, it's so complicated that I think that, like the old git checkout, it should be split into two commands: git checkout became git switch and git restore. It's not clear to me what two names to use for a split-up git reset, except that one of them is probably git restore. 😀


Your particular git reset is similar

You ran:

git reset --hard master~4

Let's assume that your current branch was also master (you didn't say, but it's clearly implied by the rest of your question). Let's also assume that your master was originally in sync with your own origin/master, so that you started with:

...--D--E--F--G--H   <-- master (HEAD), origin/master

Your git reset did this:

...--D   <-- master (HEAD)
      \
       E--F--G--H   <-- origin/master

No commit has changed (no commit can change, ever) but you're now working with commit D. Your index / staging-area and work-tree match commit D. Commit D is the current commit.

Your git fetch is quite unusual

Next, you ran:

git fetch --update-head-ok . origin/master:master 

Here, you used . instead of the name of a remote. That's OK, because git fetch allows more than just a remote name here. You could use a URL, or a path name; . counts as a path name and means this repository. Your Git, in essence, calls itself up, and asks itself which commits it has, and what its branch names are.

Your Git has no new commits in it that your Git needs from the "other" Git (your Git has exactly those commits that it has, of course) so the obtain new commits step does nothing. Then, the refspec origin/master:master applies: you have "them" look up "their" origin/master—that's your own origin/master, which identifies commit H—and copy that to your branch name master.

This is where that last special safety-check comes in. Normally, git fetch will refuse to update the current branch name. That's because the current branch name determines the current commit. But the --update-head-ok flag turns off the safety check, so your git fetch goes ahead and updates the current branch name. Your name master now points to commit H.

What didn't happen is that Git did not update its index or your work-tree. These two were left alone. They still match commit D. So while you now have:

...--D
      \
       E--F--G--H   <-- master (HEAD), origin/master

your index and work-tree match commit D.
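
The whole sequence from the question can be reproduced in a scratch repository. This sketch rests on several assumptions: there is no real remote, so refs/remotes/origin/master is created directly with git update-ref to stand in for one; the helper new_commit is hypothetical; and each commit adds one file named after its letter.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.invalid"

# Hypothetical helper: one new file per commit, named after the commit's letter.
new_commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

for letter in D E F G H; do new_commit "$letter"; done
commit_h=$(git rev-parse master)
# Stand-in for a real remote: make origin/master point to H as well.
git update-ref refs/remotes/origin/master "$commit_h"

git reset -q --hard master~4              # master, index, and work-tree are now at D

# Without the flag, the safety check refuses to update the current branch.
if git fetch -q . origin/master:master 2>/dev/null; then
    refused=no
else
    refused=yes
fi

git fetch -q --update-head-ok . origin/master:master

head_now=$(git rev-parse HEAD)
staged_count=$(git diff --cached --name-only | wc -l | tr -d ' ')
unstaged=$(git diff --name-only)
git status --short
```

The index still describes commit D while HEAD now finds commit H, so the four files added in E through H show up as staged differences, and the work-tree agrees with the index.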

You can get this same effect with git reset --soft

Had you run:

git reset --soft origin/master

your Git would have moved your current branch name, master, to point to commit H. The --soft, however, tells git reset:

  • do not update your index, and
  • do not update your work-tree

so you'd be left in the same situation as before.
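
The same state can be reached with git reset --soft, as this sketch shows. As before, the setup is assumed: origin/master is created with git update-ref in place of a real remote, and new_commit is a hypothetical helper.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.invalid"

# Hypothetical helper: one new file per commit, named after the commit's letter.
new_commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

for letter in D E F G H; do new_commit "$letter"; done
commit_h=$(git rev-parse master)
git update-ref refs/remotes/origin/master "$commit_h"

git reset -q --hard master~4              # everything at D
git reset -q --soft origin/master         # move only the branch name to H

head_now=$(git rev-parse HEAD)
staged_count=$(git diff --cached --name-only | wc -l | tr -d ' ')
unstaged=$(git diff --name-only)
git status --short
```

The status output matches what the --update-head-ok fetch produces: four staged differences and a work-tree that agrees with the index.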

There is one slight difference between this git reset and your git fetch, but it has no effect at all in this particular case. Specifically, when git fetch is updating a ref, it can enforce fast-forward rules. These rules apply to branch names and remote-tracking names. (Versions of Git predating 1.8.2 accidentally applied them to tag names too.) The fast-forward rule requires that the new hash ID stored in some name be a descendant commit of the hash ID stored in the name before the update.

The git reset command never enforces a fast-forward rule. The git fetch and git push commands do, unless the update is forced (with --force or a leading + character in the refspec).
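
The fast-forward rule can be observed by asking git fetch to move a branch name backwards. In this sketch the branch names old and side, and the helper new_commit, are made up for illustration; side is deliberately not the current branch, so the current-branch safety check does not come into play.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.invalid"

# Hypothetical helper: one new file per commit, named after the commit's letter.
new_commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

for letter in D E F G H; do new_commit "$letter"; done
commit_h=$(git rev-parse master)
git branch old master~4                   # old points to D
git branch side master                    # side points to H; HEAD stays on master

# Moving side from H back to D is not a fast-forward, so a plain refspec is rejected.
if git fetch -q . old:side 2>/dev/null; then
    plain=accepted
else
    plain=rejected
fi
side_after_plain=$(git rev-parse side)

# A leading + forces the update.
git fetch -q . +old:side
side_after_forced=$(git rev-parse side)

echo "plain refspec:  $plain"
echo "forced refspec: side now at $side_after_forced"
```

In the question's own command the update from D to H is a fast-forward, which is why no + was needed there.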

Upvotes: 1

VonC

Reputation: 1324318

The --update-head-ok man page mentions:

By default git fetch refuses to update the head which corresponds to the current branch.

This flag disables the check.
This is purely for the internal use for git pull to communicate with git fetch, and unless you are implementing your own Porcelain you are not supposed to use it.

So:

  • first, you have reset the index to master~4;
  • then, you have reset master to origin/master (which is not master~4, but some other commit).

Git shows you what is in the index but not in HEAD: those are the files already staged (because of the first reset) and not in HEAD (which now refers to origin/master).

If your goal was to reset master to origin/master, do:

git fetch
git switch -C master origin/master

Upvotes: 1
