Reputation: 679

What does `git reset --hard origin` do?

If you want to reset your local branch to the remote branch, you can do

git fetch
git reset --hard origin/master

I've actually been using

git reset --hard origin

What does this do?

Upvotes: 2

Answers (1)

torek

Reputation: 487755

TL;DR

In some contexts, but not all contexts, origin means origin/HEAD. This is one of those contexts.

Long

While origin/master and origin are clearly different (one has /master on the end!), they're doing the same thing with your particular repository for this particular operation. They are not always interchangeable, though.

Background (long but important)

To understand this, let's look at how Git uses arguments. You might, for instance, run:

git commit -m "this is a bad commit message"

or:

git ls-remote origin

or:

git fetch github +refs/pull/123/head:refs/heads/pr123

The git program expects to be invoked via some sort of command line interpreter (CLI), which we call a shell for obscure reasons.¹ This CLI will break up arguments at white-space, so that fetch, github, and +refs/pull/123/head:refs/heads/pr123 are separate arguments to the last command; however, quotes can defeat this breaking-up, so that git commit -m "commit message" comes through with arguments commit, -m, and commit message. The quotes are gone by this point but the white-space (space character, in this case) remains.
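To watch the shell do this splitting, a tiny helper that prints what it receives is enough. This is only a sketch: show_args is a made-up name for illustration, not anything Git provides.

```shell
# show_args (hypothetical helper): print the argument count, then each
# argument in brackets, so the effect of quoting is visible.
show_args() {
    printf '%d' "$#"
    for arg in "$@"; do
        printf ' [%s]' "$arg"
    done
    printf '\n'
}

show_args commit -m "commit message"   # prints: 3 [commit] [-m] [commit message]
show_args commit -m commit message     # prints: 4 [commit] [-m] [commit] [message]
```

The quotes never reach the program; only the grouping they caused does.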

Git then uses its own peculiar mix of POSIX and GNU conventions to decide which of these various arguments are flag arguments and which are positional arguments. A flag argument like -m or --hard does not "count" in terms of positions; positional arguments do get counted. Moreover, some flags are for the git command itself: for instance, with

git -C path/to/repo rev-parse --symbolic-full-name HEAD

the git command takes the -C and path/to/repo for itself, and then passes --symbolic-full-name and HEAD to the rev-parse verb. Meanwhile, some options—like -C to git itself, and -m to git commit—need an additional argument, so they "eat" the next word, but others, like --symbolic-full-name to git rev-parse do not need any additional argument and do not eat the next word.²

So at this point we have:

git <optional flags> <verb> <optional flags> <optional arguments>

where the verb is one of the various Git commands, such as branch or checkout or commit or fetch or ls-remote or reset or whatever. The front-end git command defines which flags it takes for itself, and each verb separately defines which flags it takes. After the flags are consumed, the remaining arguments are positional arguments and are numbered.
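The split between front-end flags, verb, and the verb's own arguments can be sketched as a toy parser. This is an illustration under loud assumptions: split_git_args is a hypothetical function, it only knows that -C and -c eat the next word, and real Git has more front-end options and handles them differently.

```shell
# split_git_args (hypothetical, toy parser): separate the words that the
# front-end "git" would take from the verb and the verb's own arguments.
# Assumption: only -C and -c consume a following word.
split_git_args() {
    front=""
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -C|-c)
                [ "$#" -ge 2 ] || { echo "missing value for $1" >&2; return 2; }
                front="$front $1 $2"; shift 2 ;;
            -*) front="$front $1"; shift ;;
            *)  break ;;
        esac
    done
    verb=${1-}
    [ "$#" -gt 0 ] && shift
    printf 'git options:%s\n' "$front"
    printf 'verb: %s\n' "$verb"
    printf 'verb arguments: %s\n' "$*"
}

split_git_args -C path/to/repo rev-parse --symbolic-full-name HEAD
# prints:
#   git options: -C path/to/repo
#   verb: rev-parse
#   verb arguments: --symbolic-full-name HEAD
```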

So, for git fetch, the first numbered (positional) argument, origin or github in the examples above, specifies a remote. A remote is a short name³ under which Git can store various pieces of information, with the main required piece being the URL at which your Git can reach some other Git software.⁴ If https://github.com/git/git is a Git repository (and it is), and we use git fetch to call it up, we can have our Git software get, from their (GitHub's) Git software, any new-to-us commits that exist at that location, and put those in our own repository. We can also make a new, totally-empty repository and do this, and get a clone of the Git repository for Git. Either way, we need to reach out to that URL and obtain their commits.

When we do do all this, our Git will have their Git list out their branch and tag and other names—you can see those using git ls-remote, if you like—and we'll see stuff like this:

$ git ls-remote https://github.com/git/git | head -7
1ac7422e39b0043250b026f9988d0da24cb2cb58    HEAD
1ac7422e39b0043250b026f9988d0da24cb2cb58    refs/heads/main
d516b2db0af2221bd6b13e7347abdcb5830b2829    refs/heads/maint
1ac7422e39b0043250b026f9988d0da24cb2cb58    refs/heads/master
5071ed83ac8bbde44ad4946183a24c1d5f4f51e2    refs/heads/next
d119d638691f5865b3c2d5b4ee8659a69de206e8    refs/heads/seen
a081b42c76d3d342adf132d17c37a6374b4631bb    refs/heads/todo

Our Git's job is to examine each commit hash ID (seen on the left) to see if we have that commit yet, and if not, to obtain that commit and its ancestry—the history stored in the Git repository over on GitHub—up to the point where we do have their commits. That way we get all their commits, plus any of ours that we have never given to them.

Having saved all those commits (and their supporting objects) in our Git repository, our Git moves on to its next job, which includes remembering their branch names. For git fetch, the way to do that depends on the second positional argument:

git fetch origin

This has no second positional argument, so Git falls back to using the one stored under the name origin:

$ head -9 .git/config
[core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true
[remote "origin"]
        url = https://github.com/git/git
        fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]

Note the fetch = line here. This sets the default if we didn't give a second positional argument. If we do give a second positional argument (and/or any additional ones):

git fetch origin +refs/pull/123/head:refs/heads/pr123

our Git will ignore the fetch = setting and use the second (and additional) positional argument(s) instead. (This mode also limits what gets fetched: our Git may ignore some of their new commits, if we're not asking for everything, as we normally do.)

It's this default +refs/heads/*:refs/remotes/origin/* that causes our Git to take their branch names, like master or main, next, seen, and todo for instance, and turn those into our origin/master, origin/main, origin/next, and so on. The first * has our Git software match each refs/heads/ name, and the second * has our Git create or update a corresponding refs/remotes/origin/ name. These origin/* names are remote-tracking names.⁵ They explain why we can refer to origin/main or origin/master to mean the commit that the other Git repository had at the tip of their branch, the last time we ran git fetch to update everything.
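The renaming that this refspec performs can be imitated with plain shell string handling. This is a sketch only: map_ref is a hypothetical helper that handles one-star refspecs of this shape, and Git's real refspec matching covers more cases.

```shell
# map_ref REFSPEC NAME (hypothetical helper): apply a one-star refspec
# such as +refs/heads/*:refs/remotes/origin/* to a full ref name.
map_ref() {
    spec=${1#+}              # drop the optional leading "+" (the force flag)
    src=${spec%%:*}          # left of the colon: what to match on their side
    dst=${spec#*:}           # right of the colon: what to create on our side
    src_prefix=${src%\*}     # text before the star, e.g. refs/heads/
    dst_prefix=${dst%\*}     # text before the star, e.g. refs/remotes/origin/
    case "$2" in
        "$src_prefix"*) printf '%s%s\n' "$dst_prefix" "${2#"$src_prefix"}" ;;
        *) return 1 ;;       # name is outside the source pattern
    esac
}

map_ref '+refs/heads/*:refs/remotes/origin/*' refs/heads/master
# prints: refs/remotes/origin/master
map_ref '+refs/heads/*:refs/remotes/origin/*' refs/heads/next
# prints: refs/remotes/origin/next
```

A name such as refs/tags/v2.15.0 does not match the left side, so the helper returns failure and prints nothing.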

¹While researching this a bit, I found some incorrect history here:

The original Unix shell was written in the mid-1970s by Stephen R. Bourne while he was at the AT&T Bell Labs in New Jersey.

The Bourne shell supplanted several previous shells, including the Thompson and PWB ("Mashey") shells. Wikipedia has the correct history.

²The exact details here can get convoluted due to POSIX-vs-GNU differences in philosophies. The main thing to remember (rather than looking it up every time) is that there are short options, -m for instance, that consist of a single letter, and long options like --hard that consist of multiple letters. A long option gets a double hyphen and a short option gets a single hyphen. Multiple short options can be combined: if a command has -a, -b, and -c as options, -abc specifies all three. If one of them takes an argument, it has to come last in the group: with -c taking an argument, instead of cmd -a -b -c cval you can write cmd -abc cval (this is how git commit -am "message" works). Long options can't be combined this way.

In the case that some argument resembles an option, you can use -- as an argument to separate the argument from the options. For instance, if you have a file named -r, rm -r (or git rm -r) does not work because the -r here is assumed to be an option rather than the name of a file. But rm -- -r (or git rm -- -r) tells the command that the -r is not actually an option after all.

Fortunately, for file names you can add ./ in front: rm ./-r removes the file and the argument no longer starts with -. But this does not work for all cases: for instance, suppose you have a branch named readme and a file named readme. Now git checkout readme becomes ambiguous: did you mean to git checkout the branch readme, to switch to the given branch, or did you mean git checkout [implied HEAD here] the file named readme from the current commit? To disambiguate, run git checkout readme -- to force Git to see readme as a branch name, or git checkout -- readme to force Git to see readme as a file name. (Or use git switch and git restore, which are better designed and do not have this problem in the first place!)
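The rm half of this is easy to try safely in a scratch directory. The sketch below uses plain rm rather than git rm, and demo_dash_file is a hypothetical name used only for illustration.

```shell
# demo_dash_file (hypothetical): create a file literally named "-r" in a
# scratch directory and remove it both ways described above.
demo_dash_file() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        : > ./-r                 # create a file named "-r"
        rm -- -r                 # "--" ends option parsing, so -r is a file name
        [ ! -e ./-r ] && echo "removed with --"
        : > ./-r
        rm ./-r                  # "./-r" no longer starts with "-"
        [ ! -e ./-r ] && echo "removed with ./"
    )
    rm -rf "$dir"
}

demo_dash_file
# prints:
#   removed with --
#   removed with ./
```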

In the end, the gory details about what gets treated how, by which command, can only be found in each individual command's documentation. The general rules here mostly work but each command can have its own exceptions. There's no shame in having to go back to the documentation now and then.

³It should be short, but that's up to you. If you want to name your remote pneumonoultramicroscopicsilicovolcanoconiosis instead of origin, you can, but I wouldn't.

⁴Git URLs include normal URLs but are extended in various ways. See the git fetch documentation for details. But if we assume that the URL is, say, https://github.com/git/git, that's 26 characters you would have to type in every time, vs 6 for origin. For me, it's not so much the length as the ease of typos and the potential for disaster, though. (Consider what happens with existing web pages and typosquatting, and how that could be used with Git repositories.)

Sometimes the "URL" is just a path on your computer, and then it might be short and easy to get right without typos, in which case using a remote name isn't necessary. But using origin anyway gives you some advantages: specifically, it adds the ability for Git to use remote-tracking names. (The advent of added working trees reduces the need for any of this, though.)

⁵Git calls these remote-tracking branch names, but they are not actually branch names. You can see this because git switch refuses to use them:

$ git switch origin/master
fatal: a branch is expected, got remote branch 'origin/master'

This error message is poor (it has taken remote-tracking branch name and shortened it to remote branch, which is even worse, as names go, than remote-tracking branch name), but it means that origin/master is not a branch name. And that's true: all branch names begin with refs/heads/ and origin/master is short for refs/remotes/origin/master, which does not begin with refs/heads/. (Think of the long versions as the full name, and the shortened ones as the familiar name we use, rather like "Alexander" becomes "Sasha" or "Robert Maximillian Vanderscnhoofler-Valdez III" becomes "Bob". The full name is just too unwieldy; we only use it if we really have to.)

Nonetheless, regardless of what class a name has—branch name, tag name, remote-tracking name, or whatever—each of these Git names stores one hash ID. For branch and remote-tracking names, those are always commit hash IDs. So if we're interested in a specific commit, either kind of name works fine: our name master means our latest master commit and our name origin/master means their latest master commit, as recorded by our recent git fetch that reached out to their Git and saw their master hash ID and got that commit from them if we didn't have it yet.

Tag names can also store commit hash IDs, or they can store other kinds of hash IDs. If the other hash ID can be resolved to a commit hash ID, Git can use a tag name to specify a commit. This allows us to use git switch --detach with a remote-tracking name or a tag name:

git switch --detach v2.15.0

for instance puts us in detached HEAD mode on the Git version 2.15 commit.

Refs and symbolic refs

With all that in mind, let's review the idea of a ref and introduce the idea of a symbolic ref. A ref or reference is one of these Git names: branch names, tag names, remote-tracking names, and so on. You can have your Git spill out all your refs (save for a few notable exceptions, that is) using git for-each-ref. By default, this prints, for each ref, the following information:

<hash-ID> <type> <full name>

If I do this in my Git clone for git, I get, in part—I've snipped a lot because there are tons of tags—this:

ab1f2765f78e75ee51dface57e1071b3b7f42b09 commit refs/heads/master
8dbdf339cd2e757143d9f222f662edd8ef745ea8 commit refs/heads/stash-exp
ab1f2765f78e75ee51dface57e1071b3b7f42b09 commit refs/remotes/origin/HEAD
ab1f2765f78e75ee51dface57e1071b3b7f42b09 commit refs/remotes/origin/main
4c53a8c20f8984adb226293a3ffd7b88c3f4ac1a commit refs/remotes/origin/maint
ab1f2765f78e75ee51dface57e1071b3b7f42b09 commit refs/remotes/origin/master
be66c8963cc046090ab0eb1f750e71f594a2a4e4 commit refs/remotes/origin/next
59abd1b7c59e685d149dc492125ecb84ec7582ac commit refs/remotes/origin/seen
3e940371c48370f0e8f636802dcb206c4a41a5bd commit refs/remotes/origin/todo
d5aef6e4d58cfe1549adef5b436f3ace984e8c86 tag    refs/tags/gitgui-0.10.0
33682a5e98adfd8ba4ce0e21363c443bd273eb77 tag    refs/tags/gitgui-0.10.1

The refs/heads/ ones are branch names, and the refs/tags/ ones are tag names, and the refs/remotes/ ones are remote-tracking names, as always for Git: these are pre-defined namespaces. You can invent more: Git refs begin with refs/, then the next level is the namespace (refs/stash is a little weird as it doesn't have another slash after it), and then everything after the second slash is the name in that namespace. Just be careful not to use one of the names that Git itself already uses, or will use someday in the future.⁶
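Since the class of a ref is determined purely by its full-name prefix, a case statement can classify one. This is a sketch: ref_kind is a hypothetical helper and the labels it prints are mine, not Git's.

```shell
# ref_kind NAME (hypothetical helper): classify a full ref name by the
# namespace prefix it starts with.
ref_kind() {
    case "$1" in
        refs/heads/*)   echo "branch" ;;
        refs/tags/*)    echo "tag" ;;
        refs/remotes/*) echo "remote-tracking" ;;
        refs/*)         echo "other ref" ;;
        *)              echo "not under refs/" ;;
    esac
}

ref_kind refs/heads/master            # prints: branch
ref_kind refs/remotes/origin/master   # prints: remote-tracking
ref_kind refs/tags/gitgui-0.10.0      # prints: tag
ref_kind refs/stash                   # prints: other ref
```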

The ones git for-each-ref fails to list are the special names, like HEAD, ORIG_HEAD, CHERRY_PICK_HEAD, MERGE_HEAD, and so on. These don't start with refs/ and some parts of Git seem to believe, in effect, that they're not refs at all.⁷ Some parts of Git refer to them as "pseudo-refs", but in general they mostly hold one hash ID too. But there's one glaring exception to all the rules, and that's HEAD.

The special name HEAD, written in all uppercase like this, is stored in a Git repository as a file named .git/HEAD, at least as of Git versions up through 2.36 so far (and probably for a long time yet). That is, the main working tree file is .git/HEAD; there's a per-work-tree additional file for each added work-tree. But this one—.git/HEAD—is definitely always there: if it's missing, Git will refuse to believe that the repository is a repository after all. So it must exist.

In the old days of Git, before it ran on Windows, this special file was usually a symbolic link or symlink. A Linux/Unix symlink is a special file type—a mode 120000 file in Git-speak, because Git stole the inode format bits directly from Unix/Linux inodes here—where the OS opens and reads the file's contents and treats those contents as another file's name. This makes an attempt to read the file's contents actually read the other file's contents, and an attempt to write the file's contents causes the OS to write to the other file.

As such, if .git/HEAD was a symbolic link to refs/heads/master, any attempt to use the data from .git/HEAD would actually refer to the data in .git/refs/heads/master, and writing a new hash ID to .git/HEAD would actually overwrite the hash ID in .git/refs/heads/master. So in primeval Git this is how HEAD-as-current-branch was implemented. This trick fails on Windows though (and also if we ever have reftables), so in modern Git, .git/HEAD is an ordinary file at all times and either contains a line like:

ref: refs/heads/master

(which says that we're on branch master, as git status would say) or contains a raw hash ID:

$ git switch --detach v2.15.0
HEAD is now at cb5918aa0d Git 2.15
$ cat .git/HEAD
cb5918aa0d50f50e83787f65c2ddc3dcb10159fe

In this case we're in detached HEAD mode:

$ git status
HEAD detached at v2.15.0
...

(output snipped because this caused a lot of currently-generated files to become untracked files, and there's no point listing those here).
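The two possible shapes of this file can be told apart with a few lines of shell. This is a sketch that assumes HEAD is stored as a plain file, as described above; head_state is a hypothetical helper, and in real use git symbolic-ref and git rev-parse are the better tools.

```shell
# head_state FILE (hypothetical helper): report whether a HEAD-style file
# holds a "ref: <name>" line (attached) or a raw hash ID (detached).
head_state() {
    IFS= read -r line < "$1"
    case "$line" in
        'ref: '*) printf 'attached to %s\n' "${line#ref: }" ;;
        *)        printf 'detached at %s\n' "$line" ;;
    esac
}

demo=$(mktemp)
printf 'ref: refs/heads/master\n' > "$demo"
head_state "$demo"    # prints: attached to refs/heads/master
printf 'cb5918aa0d50f50e83787f65c2ddc3dcb10159fe\n' > "$demo"
head_state "$demo"    # prints: detached at cb5918aa0d50f50e83787f65c2ddc3dcb10159fe
rm -f "$demo"
```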

When your HEAD is attached, it's a symbolic reference. Git has the git symbolic-ref command as a low-level ("plumbing") command to read and write such refs, but in general only HEAD really works well as one.⁸ When HEAD is attached, git symbolic-ref HEAD prints the full name of the branch, and git rev-parse HEAD gets you the hash ID stored in the branch name that HEAD holds.⁹ When HEAD is detached, git symbolic-ref HEAD gets you an error, which tells you that you're in detached-HEAD mode, and git rev-parse HEAD gets you the raw hash ID that HEAD itself stores.

In summary: HEAD is normally a symbolic ref. Whatever branch name it contains, that's the branch you're "on". As long as we can find out what branch name HEAD holds, we know which branch some repository is using (in its main working tree anyway).
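In a script, that check might look like the following. It is a sketch: describe_head is a hypothetical wrapper name, though the git symbolic-ref -q --short and git rev-parse --verify -q invocations are ordinary Git options.

```shell
# describe_head (hypothetical wrapper): report whether HEAD in the current
# repository is attached to a branch or detached at a commit.
describe_head() {
    if branch=$(git symbolic-ref -q --short HEAD 2>/dev/null); then
        echo "attached: on branch $branch"
    elif hash=$(git rev-parse --verify -q HEAD 2>/dev/null); then
        echo "detached at $hash"
    else
        echo "no usable HEAD here"
    fi
}

describe_head
```

Run it inside any repository; after git switch --detach it should take the second branch of the if instead of the first.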

⁶This of course requires that you hop into your time machine (or TARDIS) and go to the future to see what names Git has taken up. If you can do that, you're probably not reading this. 😀

⁷The ongoing "reftable" support for C Git has made this quite clear: these "top level" names have to be treated specially. They also are per-worktree, except for FETCH_HEAD, which really isn't a ref at all by any measure, but still needs to be parse-able as a commit hash ID in some cases. It's kind of ugly.

⁸In the past, I played with making another one named INDIR and seeing what happened while using it, and it behaved badly. The worst case was that trying to delete INDIR silently deleted its target branch instead. This appears to have been fixed at some point, so things are better, but I still wouldn't trust it very far.

⁹Note that there's one last special case: HEAD may be a symbolic reference to a branch name that does not exist yet. In this state, Git says that you are on an "orphan branch" or an "unborn branch" or a "branch that is yet to be created": once again Git is not very consistent about how it describes this. But the things to know are:

in a new, empty repository, there are no commits, so there cannot be any branches, as a branch name is required to hold a commit hash ID;
but you still have to be "on" the initial branch;
so Git has this mode of being "on" an unborn branch; and
when in this mode, the next commit you make will create the branch.

The new commit you make that creates the branch will be a parentless, or root, commit: the beginning of history. You can use git switch --orphan to get back into this "new repository smell" state, without actually making a new repository, but it's rare-ish to have a good reason to bother.
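The unborn-branch state is detectable: HEAD is a symbolic ref, yet it does not resolve to any commit. A sketch, where is_unborn is a hypothetical helper and the scratch repository exists only for the demonstration:

```shell
# is_unborn REPO (hypothetical helper): succeed when HEAD in REPO is a
# symbolic ref whose target branch has no commit yet.
is_unborn() {
    git -C "$1" symbolic-ref -q HEAD >/dev/null 2>&1 &&
        ! git -C "$1" rev-parse --verify -q HEAD >/dev/null 2>&1
}

repo=$(mktemp -d)
git init -q "$repo"
is_unborn "$repo" && echo "unborn: the next commit will create the branch"
rm -rf "$repo"
```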

Now let's look at `git clone`

Suppose we clone a repository, such as the GitHub clone of the Git repository. We run:

git clone -c weird.name=value https://github.com/git/git

Our Git software, running the git clone command:

1. makes a new, empty directory named git, and runs the remaining operations in this directory;
2. runs git init in this directory, to create a new, totally-empty repository with no branches and no commits yet;
3. runs git remote add to add the name origin to store the URL https://github.com/git/git;
4. runs git config weird.name value (per our -c option); this step is omitted if we don't supply any -c options to clone;¹⁰
5. runs git fetch origin, which gets all their commits and creates our remote-tracking names;
6. runs git switch in such a way as to create a branch, using one of the remote-tracking names set up in step 5 to get the commit hash ID to use.

The branch created in step 6 is now checked out as the current branch, so that HEAD is a symbolic reference to that branch. But which branch name does our Git use in step 6?

Git's answer to this question is that you supply the name with your git clone command: -b maint, for instance, would mean create maint from origin/maint. But you can leave out the name entirely, as I did above:

$ git clone -c weird.name=value https://github.com/git/git
remote: Enumerating objects: 326951, done.
[snip]
Resolving deltas: 100% (244288/244288), done.
$ cd git && git rev-parse --symbolic-full-name HEAD
refs/heads/master

We can see here that our Git chose the name master. Where did this come from? The answer lies in the output from git ls-remote:

$ git ls-remote --symref origin | head -5
ref: refs/heads/master  HEAD
1ac7422e39b0043250b026f9988d0da24cb2cb58        HEAD
1ac7422e39b0043250b026f9988d0da24cb2cb58        refs/heads/main
d516b2db0af2221bd6b13e7347abdcb5830b2829        refs/heads/maint
1ac7422e39b0043250b026f9988d0da24cb2cb58        refs/heads/master

This time, I added the --symref option, which caused git ls-remote to add one line to the top of the output:

ref: refs/heads/master<TAB>HEAD

This --symref option tells the other Git: If your HEAD is attached, tell me what branch name it's attached to. Since their HEAD is attached to their master, that's what their Git tells my Git. The git clone process uses git fetch—technically, git clone is built from the same source as git fetch so it's just subroutine calls rather than entire process operations, at this point—in such a way that it gets the --symref effect and can see which branch name their HEAD means.

In some (very old) versions of Git, the server may not support --symref or the client might not have the option. In either case that simply means that the ref: refs/heads/master<TAB>HEAD line is missing, and in that case, our Git software will read the hash ID associated with HEAD, then read the hash ID associated with each branch name, and look for a branch whose hash ID matches. As you can see from the output above, though, this doesn't always work: HEAD is currently 1ac7422e39b0043250b026f9988d0da24cb2cb58, and so are both refs/heads/main and refs/heads/master. So in a case like this, our (client) Git won't know which branch they are recommending, and will just pick one to break ties.¹¹
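The guessing step can be imitated on ls-remote style output. This is a sketch of the idea only: branches_matching_head is a hypothetical helper, it assumes the HEAD line comes first (as in the listing above), and it deliberately leaves out how Git breaks a tie.

```shell
# branches_matching_head (hypothetical helper): read "<hash><TAB><name>"
# lines on stdin and print every branch whose hash equals the hash of HEAD.
# Assumption: the HEAD line appears before the branch lines.
branches_matching_head() {
    head_hash=""
    while read -r hash name; do
        case "$name" in
            HEAD) head_hash=$hash ;;
            refs/heads/*)
                if [ "$hash" = "$head_hash" ]; then
                    printf '%s\n' "${name#refs/heads/}"
                fi ;;
        esac
    done
    return 0
}

printf '%s\t%s\n' \
    1ac7422e39b0043250b026f9988d0da24cb2cb58 HEAD \
    1ac7422e39b0043250b026f9988d0da24cb2cb58 refs/heads/main \
    d516b2db0af2221bd6b13e7347abdcb5830b2829 refs/heads/maint \
    1ac7422e39b0043250b026f9988d0da24cb2cb58 refs/heads/master |
    branches_matching_head
# prints main and master, one per line: two candidates, hence the tie
```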

¹⁰Don't confuse this with git -c weird.name=value clone, which supplies the -c option to git, rather than to clone. (This is why these things are confusing. I included the -c option just to have something happen in step 4, and I got it wrong when I did my sample command!)

¹¹You can supply the -b option to fix the tie-breaking problem, or just use a more modern Git, of course. Note that you can also supply a tag name to -b when cloning; in this case, your own client will create a repository with no branch names and put you in detached HEAD mode with the appropriate tagged commit checked out. Or, you can add --no-checkout to your git clone to skip the checkout, again leaving you in a repository with no branches. But most people mostly don't do any of these, so their Git client software uses the --symref result to find the right branch name to create.

Putting these together gets us most of the way there

Let's take some notes now before we dive into the last parts:

git clone uses git ls-remote --symref (or the logical equivalent) to find out what their—the other Git's—HEAD is set to, i.e., which branch they have checked out.¹²
Your Git therefore knows what their Git's HEAD is, or was at least, at the time you ran git clone.
A symbolic ref lets some ref or pseudoref, such as HEAD, refer to (or point to) some other ref, such as refs/heads/main or whatever. The Git command git symbolic-ref dumps out the contents of such a ref (or creates or updates or deletes one depending on arguments and options).
So your Git can easily create a refs/remotes/origin/HEAD symref that points to whichever branch is (or was, at least) their HEAD—their current branch—at the time you ran git clone.

Your Git does exactly that, and that's why in the git for-each-ref output I showed above, there was a refs/remotes/origin/HEAD. That name is in fact a symbolic ref:

$ git symbolic-ref refs/remotes/origin/HEAD
refs/remotes/origin/master

So your Git can now, at any time from git clone forward, use origin/HEAD as a substitute code for "whatever branch they had as their HEAD". Using git rev-parse—which we have only shown by illustration—lets us see the symbolic name or the hash ID as well:¹³

$ git rev-parse --symbolic-full-name origin/HEAD
refs/remotes/origin/master
$ git rev-parse origin/HEAD
ab1f2765f78e75ee51dface57e1071b3b7f42b09

So the name origin/HEAD uses the same symbolic-ref resolution pattern as the name HEAD. But origin/HEAD links to refs/remotes/origin/master—a remote-tracking name, with a hash ID in it—while HEAD links to refs/heads/master—a branch name with a hash ID in it. The branch and remote-tracking names might or might not have the same hash ID, but they both do have some hash ID, because branch and remote-tracking names must store the hash ID of some existing, valid commit.¹⁴

¹²Technically, the other repository is probably a bare clone—but a bare clone still has a HEAD, and whatever symbolic name is stored in that bare clone's HEAD is the branch they end up recommending.

¹³You might wonder at this point what the difference is between git rev-parse --symbolic-full-name and git symbolic-ref. Try it with both HEAD and an actual branch name yourself, in some existing repository, and see! When used with a name that's not a symbolic ref, what happens with git symbolic-ref? What happens with git rev-parse --symbolic-full-name?

¹⁴The phrase "existing [and] valid" is redundant in Git, at least in its original design. The concept behind a partial clone deliberately inserts a tiny wedge between these, but I have run out of space to go into detail.

The final connection

I gave some of this away at the top, in the TL;DR section, but now it's time for the final reveal, as it were. When we use a branch name where Git wants a raw hash ID, Git runs git rev-parse on that branch name. That's the case for git reset --hard, so we're getting the effect of:

git rev-parse origin

which puts the name origin in the context of where a raw hash ID is needed. Whenever this occurs—which is pretty often; it even happens sometimes when a branch name is needed because after turning master or develop or whatever into refs/heads/master or refs/heads/develop, Git may also need the hash ID—Git uses a six-step process to convert the name to a hash ID. This six-step process is described in the gitrevisions documentation:

<refname>, e.g. master, heads/master, refs/heads/master
A symbolic ref name. E.g. master typically means the commit object referenced by refs/heads/master. If you happen to have both heads/master and tags/master, you can explicitly say heads/master to tell Git which one you mean. When ambiguous, a <refname> is disambiguated by taking the first match in the following rules:

1. If $GIT_DIR/<refname> exists, that is what you mean (this is usually useful only for HEAD, FETCH_HEAD, ORIG_HEAD, MERGE_HEAD and CHERRY_PICK_HEAD);

2. otherwise, refs/<refname> if it exists;

3. otherwise, refs/tags/<refname> if it exists;

4. otherwise, refs/heads/<refname> if it exists;

5. otherwise, refs/remotes/<refname> if it exists;

6. otherwise, refs/remotes/<refname>/HEAD if it exists.

Now, most of these steps simply add something in front of the name: master gets tried as the pseudo-ref .git/master (step 1), then as refs/master (step 2), and so on until we hit step 4: refs/heads/master. But step 6 is different. Step 6, if we reach it, takes the name you give—bleeding-gums-murphy for instance—and sticks something on both ends, trying it as refs/remotes/bleeding-gums-murphy/HEAD.
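The six candidates, in order, can be generated mechanically. The sketch below imitates only the lookup order from the quoted documentation: ref_candidates and first_match are hypothetical helpers, and the list of "existing" refs is passed in by hand rather than read from a repository.

```shell
# ref_candidates NAME (hypothetical): print the six names tried, in order.
# The first one is relative to $GIT_DIR; the rest are full ref names.
ref_candidates() {
    printf '%s\n' \
        "$1" \
        "refs/$1" \
        "refs/tags/$1" \
        "refs/heads/$1" \
        "refs/remotes/$1" \
        "refs/remotes/$1/HEAD"
}

# first_match NAME EXISTING... (hypothetical): print the first candidate
# for NAME that appears among the EXISTING names; fail if none does.
first_match() {
    name=$1
    shift
    for cand in $(ref_candidates "$name"); do
        for ref in "$@"; do
            if [ "$cand" = "$ref" ]; then
                printf '%s\n' "$cand"
                return 0
            fi
        done
    done
    return 1
}

first_match master refs/heads/master refs/remotes/origin/HEAD refs/remotes/origin/master
# prints: refs/heads/master
first_match origin refs/heads/master refs/remotes/origin/HEAD refs/remotes/origin/master
# prints: refs/remotes/origin/HEAD
```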

Now suppose the name you give is origin. Git isn't treating this as a remote, it's just shoving it through the six steps. Git tries .git/origin and that doesn't exist, so it moves on to step 2, refs/origin. That doesn't work either, so it moves to steps 3 and 4 (try it as tag name, then branch name) and those don't work, so it moves to step 5, which tries it as refs/remotes/origin. That almost works, except that no ref has exactly that name: the remote-tracking refs only start with refs/remotes/origin/. "Almost" does not count here, so it moves on to step 6: try refs/remotes/origin/HEAD. That one does work, because there is a refs/remotes/origin/HEAD.
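You can check this on a throwaway pair of repositories without touching the network. This is a sketch: demo_origin_shorthand is a hypothetical function, the upstream and work directory names are arbitrary, and the demo identity passed with -c exists only so the scratch commit can be made.

```shell
# demo_origin_shorthand (hypothetical): clone a scratch repository and show
# that "origin" and "origin/HEAD" resolve to the same commit in the clone.
demo_origin_shorthand() {
    tmp=$(mktemp -d) || return 1
    git init -q "$tmp/upstream" &&
    git -C "$tmp/upstream" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "first commit" &&
    git clone -q "$tmp/upstream" "$tmp/work" &&
    {
        echo "origin/HEAD -> $(git -C "$tmp/work" symbolic-ref refs/remotes/origin/HEAD)"
        echo "origin:      $(git -C "$tmp/work" rev-parse origin)"
        echo "origin/HEAD: $(git -C "$tmp/work" rev-parse origin/HEAD)"
    }
    status=$?
    rm -rf "$tmp"
    return "$status"
}

demo_origin_shorthand
```

The last two lines of output should show the same hash ID; the first shows which remote-tracking name origin/HEAD points to, which depends on the initial branch name your Git uses.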

This refs/remotes/origin/HEAD name was created back when you ran git clone. It just sits there afterwards: a subsequent git fetch does not update it, even if the Git over at origin has a new default branch setting. But you can run git remote set-head to change what refs/remotes/origin/HEAD holds.

To see what you can do with git remote set-head, read the git remote documentation.
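As a starting point, here is a sketch that repoints origin/HEAD in a scratch clone. demo_set_head is a hypothetical function, and the branch name other and the directory names are made up for the demonstration.

```shell
# demo_set_head (hypothetical): make a scratch upstream with an extra branch,
# clone it, then repoint the clone's origin/HEAD at that branch.
demo_set_head() {
    tmp=$(mktemp -d) || return 1
    git init -q "$tmp/upstream" &&
    git -C "$tmp/upstream" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "first commit" &&
    git -C "$tmp/upstream" branch other &&
    git clone -q "$tmp/upstream" "$tmp/work" &&
    git -C "$tmp/work" remote set-head origin other >/dev/null &&
    git -C "$tmp/work" symbolic-ref refs/remotes/origin/HEAD
    status=$?
    rm -rf "$tmp"
    return "$status"
}

demo_set_head    # prints: refs/remotes/origin/other
```

In a real clone, git remote set-head origin --auto asks the remote what its HEAD is now and updates origin/HEAD to match, and git remote set-head origin --delete removes the symbolic ref, after which a bare origin no longer resolves through step 6.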

Upvotes: 1