Remote tracking branches confusion

Question

I've been playing around with http://git-school.github.io/visualizing-git and I'm not really sure how this works - can I delete remote tracking branches? Say I have local branches master and origin/master and a remote repository with master branch that corresponds to the local origin/master.

Can I delete origin/master ? If I can and I do so, how do I set up a new remote tracking branch for it again? Would just fetching origin automatically create it again? If someone pushes some new branch onto the remote repository say feature, will fetch always download this automatically and create a remote tracking branch origin/feature in my local repository? Does fetch always download "everything on remote repo that you're missing"?

Lastly, I know you can set what remote tracking branch a local branch tracks, say git branch -u origin/feature (assuming I have feature checked out) will associate feature with origin/feature, both local branches. In this case we call origin/feature the upstream branch. But can I change which remote branch origin/feature is associated with, and is this association also called "upstream" ?

I'm mostly just curious and I haven't been really able to recreate the remote tracking branch on the site I linked, after I tried deleting it. But maybe it's as simple as "fetch always creates a new remote tracking branch if it doesn't exist in your local repository".

torek · Accepted Answer

... can I delete remote tracking branches?

Yes, but there is little point (not quite no point, just "little"). The command-line command to do this is, e.g.:

git branch -r -d origin/master

(though you might have to force the delete in some cases).

Let's be overly explicit here and define remote-tracking branch name (or as I prefer to call it, remote-tracking name) carefully first, just in case. A remote-tracking name is a name that exists in your repository, but whose full name—as shown by, e.g., git rev-parse --symbolic-full-name or git for-each-ref—starts with refs/remotes/ (and goes on to include the name of a remote such as origin and another slash). These routinely get abbreviated to things like origin/master (though for some reason git branch -a abbreviates them as remotes/origin/master instead—compare to git branch -r which uses the shorter form).
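To see the full spelling for yourself, here is a small sketch against a throwaway pair of repositories. The temporary directory layout and the Demo identity are assumptions made up for the illustration, not anything Git requires:

```shell
set -e
work=$(mktemp -d)                                  # scratch area (hypothetical layout)
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
git -C "$work/remote" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
git clone -q "$work/remote" "$work/local"

# Expand the abbreviated name to its full ref name.
full=$(git -C "$work/local" rev-parse --symbolic-full-name origin/master)
echo "$full"                                       # refs/remotes/origin/master

# List the remote-tracking names by their full spelling.
git -C "$work/local" for-each-ref --format='%(refname)' refs/remotes
```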

If I ... do [delete origin/master], how do I set up a new remote tracking branch for it again? Would just fetching origin automatically create it again?

In general, yes.
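A sketch of the delete-then-refetch round trip, assuming a normal fetch setting as created by git clone (the scratch repositories and the Demo identity are invented for the example):

```shell
set -e
work=$(mktemp -d)
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
git -C "$work/remote" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
git clone -q "$work/remote" "$work/local"
cd "$work/local"

git branch -r -d origin/master                     # delete the remote-tracking name

# Record whether the name still resolves after the delete.
if git rev-parse -q --verify refs/remotes/origin/master >/dev/null; then
    gone=no
else
    gone=yes
fi

git fetch -q origin                                # default refspec re-creates it
restored=$(git rev-parse --symbolic-full-name origin/master)
echo "gone=$gone restored=$restored"
```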

If someone pushes some new branch onto the remote repository say feature, will fetch always download this automatically and create a remote tracking branch origin/feature in my local repository?

In general, yes.
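The same thing can be tried for a branch that appears on the remote after you cloned. This is only a sketch: the branch is created directly in the scratch "remote" repository rather than pushed, and all paths and the Demo identity are assumptions:

```shell
set -e
work=$(mktemp -d)
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
git -C "$work/remote" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
git clone -q "$work/remote" "$work/local"

git -C "$work/remote" branch feature               # a new branch shows up over there
git -C "$work/local" fetch -q origin               # default refspec picks it up

tracking=$(git -C "$work/local" for-each-ref \
    --format='%(refname:short)' refs/remotes/origin/feature)
echo "$tracking"                                   # origin/feature
```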

Does fetch always download "everything on remote repo that you're missing"?

Mostly.

Lastly, I know you can set what remote tracking branch a local branch tracks, say git branch -u origin/feature (assuming I have feature checked out) will associate feature with origin/feature, both local branches.

Yes, though be careful about the phrase "local branches": origin/feature is a name specific to your repository but Git tends to call it a remote-tracking branch name, which leads to casual (mis?)use of the word branch, which is why I like to just drop the word branch entirely here now and call it a remote-tracking name. That way instead of being tempted to call it a "local branch", you'll be tempted to call it a "local name", which I think is much clearer.

In this case we call origin/feature the upstream branch.

I prefer to just call it the upstream (again, trying to avoid beating the poor word branch to death :-) ).
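To make the upstream concrete, this sketch sets one and reads it back. The repositories are throwaway ones and the names (feature, Demo) are assumptions for the example:

```shell
set -e
work=$(mktemp -d)
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
git -C "$work/remote" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
git -C "$work/remote" branch feature
git clone -q "$work/remote" "$work/local"
cd "$work/local"

git branch --no-track feature origin/feature       # local branch, no upstream yet
git checkout -q feature
git branch -u origin/feature                       # set the upstream of the current branch

up=$(git rev-parse --abbrev-ref 'feature@{upstream}')
remote=$(git config --get branch.feature.remote)
merge=$(git config --get branch.feature.merge)
echo "$up / $remote / $merge"
```

The upstream is stored as two configuration entries on the local branch, which is where the remote name and the branch name on that remote come from.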

But can I change which remote branch origin/feature is associated with ...

No, or not really: a remote-tracking name has no association, at least not of this kind.

and is this association also called "upstream"?

Without the association, we don't need a term for it. :-)

How this works under the hood

When you have your Git call up another Git, you supply a set of refspecs—optionally on the command line, but you always supply them. This is true even if you use a raw URL. The general form of a command-line git fetch command is:

git fetch [<options>] [<repository> [<refspec>...]]

That is, there are optional options arguments (like --tags or --no-tags, and/or --prune, and so on), an optional repository argument, and optional refspec arguments. To supply a refspec argument, you must first supply the repository argument, as these are positional arguments: the first non-option argument is the repository, and subsequent non-options are refspecs. So:

git fetch origin

supplies a repository argument and no refspec arguments, and:

git fetch origin '+refs/heads/*:refs/remotes/origin/*'

supplies one repository and then one refspec.
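A sketch of the difference a command-line refspec makes: only the refs it matches are fetched. The branch names topic and other, the scratch paths and the Demo identity are all invented for the example:

```shell
set -e
work=$(mktemp -d)
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
git -C "$work/remote" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
git clone -q "$work/remote" "$work/local"
git -C "$work/remote" branch topic
git -C "$work/remote" branch other
cd "$work/local"

# One repository argument, then one refspec argument.
git fetch -q origin '+refs/heads/topic:refs/remotes/origin/topic'

topic=$(git rev-parse -q --verify refs/remotes/origin/topic || echo missing)
other=$(git rev-parse -q --verify refs/remotes/origin/other || echo missing)
echo "topic=$topic other=$other"
```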

The repository argument can be a URL, but these days, generally should be a remote. A remote is simply a short name like origin. Git stores at least two items under this short name:

remote.<remote>.url contains the URL that Git can use for fetch and push operations;
remote.<remote>.pushurl (if set) contains an alternate URL Git should use instead for git push; and
remote.<remote>.fetch contains the refspec arguments that Git should use, if you don't provide any on the command line.

All these settings are stored in your configuration, as shown by git config -l for instance.¹ The act of creating a remote—via git remote add, for instance—will automatically create both the URL and fetch settings for that new remote. When you run git clone, Git creates a remote, as if by git remote add, so that too sets the two usual settings. The default name of this automatically-created remote is origin, so that's why a Git repository usually has an origin: most Git repositories tend to be created via cloning. Even those that aren't tend to have a git remote add origin run in them at some point.
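You can watch git remote add create the two usual settings in an otherwise empty repository. The URL below is a placeholder that is never contacted:

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

git remote add origin https://example.com/hypothetical.git   # placeholder URL

url=$(git config --get remote.origin.url)
fetch=$(git config --get remote.origin.fetch)
echo "$url"
echo "$fetch"                                      # +refs/heads/*:refs/remotes/origin/*
```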

Note that if you don't supply a remote or URL repository argument, git fetch will construct one: it will take the current branch's remote setting (git config --get branch.<branch>.remote, where <branch> is the current branch) to find the remote to use, or, as a last-ditch fallback, just use the hardcoded string origin. So the default git fetch action is to find the correct remote name, or use origin, then use the remote.<remote>.url and remote.<remote>.fetch settings from there.
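A rough imitation of that lookup in shell. This only mirrors the description above and is not how Git implements it internally; the scratch repositories and the Demo identity are assumptions:

```shell
set -e
work=$(mktemp -d)
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
git -C "$work/remote" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
git clone -q "$work/remote" "$work/local"
cd "$work/local"

branch=$(git symbolic-ref --short HEAD)            # the current branch
# Use the branch's remote setting if there is one, else fall back to "origin".
remote=$(git config --get "branch.$branch.remote" || echo origin)
spec=$(git config --get "remote.$remote.fetch")
echo "$branch -> $remote -> $spec"
```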

One way or another, then, you've run git fetch or git fetch origin and supplied one or more refspecs. It is the refspecs that determine what remote-tracking names will be created. The default refspec for the remote named origin is:

+refs/heads/*:refs/remotes/origin/*

We can disassemble this refspec into its component parts:

an optional leading plus sign +, followed by
a source and a destination separated by a colon : character

where either source or destination (but not both, at least not sensibly) can be omitted. Both the source and destination parts can use an asterisk * in a way that's similar to, though not precisely the same as, a shell glob.²

The leading plus sign, if present, sets the force flag for this refspec. This force flag is the same flag you get with --force, except that the --force option sets it for the duration of the entire git fetch operation, while the plus sign sets it only for the duration of this one particular refspec.

So, the default origin refspec:

sets the force flag;
asks for sources that match refs/heads/*; and
uses a destination of refs/remotes/origin/*.

This destination is precisely the set of remote-tracking names for the remote origin, and that's where origin/master comes from. The process is a bit convoluted, though.
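The disassembly can be sketched with plain shell parameter expansion. This is a toy parser that assumes the refspec contains a colon and at most one trailing asterisk on each side; Git's own matching handles more cases:

```shell
refspec='+refs/heads/*:refs/remotes/origin/*'

force=no
case $refspec in
    +*) force=yes; refspec=${refspec#+} ;;         # strip the leading plus sign
esac
src=${refspec%%:*}                                 # everything before the first colon
dst=${refspec#*:}                                  # everything after the first colon

# Map one of "their" names through the pattern.
ref=refs/heads/master
src_prefix=${src%\*}                               # refs/heads/
dst_prefix=${dst%\*}                               # refs/remotes/origin/
name=${ref#"$src_prefix"}                          # the part the asterisk matched
mapped=$dst_prefix$name

echo "force=$force src=$src dst=$dst"
echo "$ref -> $mapped"
```

Their refs/heads/master comes out as refs/remotes/origin/master, which is the name you normally see abbreviated to origin/master.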

At the start of the conversation your Git has with the other Git (the one at the URL), their Git lists out all their branch and tag names, plus any other refs/* type names: all their refs or references.³ This list comes with hash IDs, because each ref always stores one hash ID. It's slightly augmented for tag refs (refs/tags/*). To see exactly what their Git spills out at this point, you can run git ls-remote origin, which does this first fetch step: call up the other Git and have it list out its refs. Then instead of fetching, git ls-remote just prints the list of refs.
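Here is what that listing looks like against a scratch repository (paths and the Demo identity are assumptions for the example). Each output line is a hash ID, a tab, and a ref name:

```shell
set -e
work=$(mktemp -d)
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
git -C "$work/remote" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
git clone -q "$work/remote" "$work/local"

listing=$(git -C "$work/local" ls-remote origin)
echo "$listing"

# The hash they advertise for their master is the hash stored in their repository.
advertised=$(git -C "$work/local" ls-remote origin refs/heads/master | cut -f1)
actual=$(git -C "$work/remote" rev-parse refs/heads/master)
```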

Now that your git fetch has its paws on their refs, your Git goes on to apply the refspecs. Which of their refs match your refspecs? Those are the ones that your Git will inspect more closely.⁴

At this point, your Git inspects the hash IDs they gave you. Hash IDs are the universal currency of Git exchanges, because every Git in the universe agrees that any one particular hash ID is going to apply to that one particular object.⁵ Either you have the object already, in which case you have that hash ID in your own repository too, or you don't, in which case you don't. If you don't have the hash ID and do want some commit here, your Git tells their Git that, yes, it wants that hash ID. Assuming this is a commit, or annotated tag object—most of these are; see footnote 5—their Git will offer its parent(s) or, for a tag, its tag-target, and your Git can again say whether it wants the object, or not.

This process (the exchange of hash IDs, and "want" vs "already have" kind of responses) makes up the second phase of git fetch. Eventually your Git has told their Git which objects (commits and any necessary files to go with them) they should package up and send. Now your git fetch, and their end, go into a third phase: building what Git calls a thin pack. This is where you see "counting objects" and "compressing objects" and so on (if you do see them at all; this stuff is run off a timer and some of it is suppressed in some cases).

Finally, they send you this thin pack. Your Git takes the thin pack and "fattens" it into a regular pack, or otherwise incorporates the objects into your own repository. You now have all the objects you need from them, along with all of the hash IDs that correspond to all of the names your Git got from their Git. So if their refs/heads/master—their master branch—names commit a1234567..., and you didn't have a1234567... before, well, now you do. You also have the parent commits, and their parents, all the way back to the dawn of time, if needed.⁶ Typically, though, their a1234567..., if new, is only new for a few commits in length, after which the parent chain leads back into something you got from them yesterday, or whenever—so instead of fetching thousands of commits, you just fetch one, or three, or a dozen, or whatever.

In any case, the conversation with their Git is now done. Your Git has all the objects (commits and associated files) that your Git needs, along with the list of their branch names. Your Git now creates or updates your remote-tracking names via the refspec you supplied, either on the command line, or implicitly via your configuration.
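The end result can be checked directly: after a fetch, your origin/master holds the hash ID their master holds, and the commit object is in your repository. A sketch with scratch repositories (paths, the commit helper and the Demo identity are made up for the example):

```shell
set -e
work=$(mktemp -d)
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
# Hypothetical helper: add an empty commit to the scratch "remote".
commit() {
    git -C "$work/remote" -c user.name=Demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}
commit "first"
git clone -q "$work/remote" "$work/local"

commit "second"                                    # they move ahead
old=$(git -C "$work/local" rev-parse origin/master)
git -C "$work/local" fetch -q origin
new=$(git -C "$work/local" rev-parse origin/master)
theirs=$(git -C "$work/remote" rev-parse refs/heads/master)

git -C "$work/local" cat-file -e "$theirs"         # succeeds: the object is now local
echo "old=$old new=$new"
```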

¹Normally these should be in the --local level, although Git itself doesn't care where they come from: it's just weird to set these in your system or global config. The URL and (if specified) push-URL are "last setting overrides" style configuration entries, but the fetch lines are cumulative settings. That is, if you've set these rather nonsense settings:

git config remote.origin.fetch foo:bar
git config --add remote.origin.fetch baz:quux

then git fetch origin acts like git fetch origin foo:bar baz:quux. So adding a remote.origin.fetch setting to your --global configuration would add to the standard setting, and this is potentially useful, but also potentially hazardous: you'll need to think hard about doing it.
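The cumulative behavior is easy to see with git config --get-all, here in a scratch repository using the same nonsense values:

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

git config remote.origin.fetch foo:bar
git config --add remote.origin.fetch baz:quux

# Both values are kept, in the order they were added.
all=$(git config --local --get-all remote.origin.fetch)
echo "$all"
```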

²The degree of similarity depends on your Git vintage, as some restrictions were lifted in early 2.x versions.

³More precisely, their Git lists references that have not been marked hidden. Normally there are no hidden refs anyway, though.

⁴The process is modified a bit for tags: neither --tags nor --no-tags is the default, and the actual default is kind of weird and surprising, but it is why the tag information that the other Git hands over is augmented in the first place. I won't go into details here though.

⁵You mostly interact with hash IDs when talking about commit objects. These are just one of four internal object types, but they're the most important here, and branch names, such as refs/heads/master, are constrained: they must contain only commit hash IDs, not tree or blob or annotated-tag object hash IDs. However, internally, git fetch has ways of dealing with tree and blob hash IDs as well, to avoid re-sending file content that you already have, for instance. The details are well out of the scope of this answer.

⁶All of this gets modified, if desired, via what Git calls a shallow clone. In a shallow clone, some specific commits are omitted, which allows omitting all the history that comes before those commits. Shallow clones have some restrictions. The details again depend on exact Git vintage, though most of the strongest restrictions were lifted by Git version 2.0.
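For the curious, a minimal sketch of a shallow clone. It uses a file:// URL because a plain local path would ignore --depth; the scratch paths, the commit helper and the Demo identity are assumptions:

```shell
set -e
work=$(mktemp -d)
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
# Hypothetical helper: add an empty commit to the scratch "remote".
commit() {
    git -C "$work/remote" -c user.name=Demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}
commit "one"; commit "two"; commit "three"

git clone -q --depth 1 "file://$work/remote" "$work/shallow"

theirs=$(git -C "$work/remote" rev-list --count HEAD)     # full history
mine=$(git -C "$work/shallow" rev-list --count HEAD)      # history cut off at depth 1
echo "theirs=$theirs mine=$mine"
```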

This is where all the caveats and "in general" items above come from

Using the standard remote.origin.fetch, then, this is where your Git creates or updates your origin/master based on what their Git said about their master. If you have a standard fetch setting, your Git will take all of their branches and create-or-update all of your remote-tracking names, using this one-to-one correspondence: their master becomes your origin/master; their feature becomes your origin/feature.

The mapping is determined by the refspecs, though. So you can create a single-branch clone, and in this single-branch clone you'll have:

remote.origin.fetch=+refs/heads/master:refs/remotes/origin/master

for instance. Now your Git only matches their refs/heads/master (plus some cases of tags, but see footnote 4). So you only get your origin/master created-or-updated.
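A sketch of what a single-branch clone ends up with, using scratch repositories (the dev branch, paths and Demo identity are assumptions for the example):

```shell
set -e
work=$(mktemp -d)
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
git -C "$work/remote" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
git -C "$work/remote" branch dev

git clone -q --single-branch --branch master "$work/remote" "$work/single"

spec=$(git -C "$work/single" config --get remote.origin.fetch)
dev=$(git -C "$work/single" rev-parse -q --verify refs/remotes/origin/dev || echo missing)
echo "$spec"
echo "origin/dev: $dev"
```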

To de-single-branch-ize this clone, you can simply change the default refspec. Or, to fetch two branches, but still just those two, you can add a second remote.origin.fetch line:

remote.origin.fetch=+refs/heads/dev:refs/remotes/origin/dev
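Adding that second line and fetching again can be sketched as follows, starting from a fresh single-branch clone of a scratch repository (paths, the dev branch and the Demo identity are assumptions):

```shell
set -e
work=$(mktemp -d)
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
git -C "$work/remote" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
git -C "$work/remote" branch dev
git clone -q --single-branch --branch master "$work/remote" "$work/single"
cd "$work/single"

# Add a second fetch line; "git remote set-branches --add origin dev" is an alternative.
git config --add remote.origin.fetch '+refs/heads/dev:refs/remotes/origin/dev'
git fetch -q origin

mine=$(git rev-parse -q --verify refs/remotes/origin/dev || echo missing)
theirs=$(git -C "$work/remote" rev-parse refs/heads/dev)
count=$(git config --get-all remote.origin.fetch | wc -l)
echo "origin/dev: $mine"
```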

Now, a remote-tracking name has no upstream setting: the upstream setting of a (regular, local) branch is in its branch.<branch>.remote and branch.<branch>.merge settings, and there's nothing equivalent for remote-tracking names. It is nonetheless possible to set up a wildly convoluted set of refspecs. It's not a good idea, though.

Note how we mentioned above the concept of doing a one-to-one mapping from their Git's names (refs/heads/*) to your remote-tracking names (refs/remotes/origin/*). If you do this:

remote.origin.fetch=+refs/heads/master:refs/remotes/origin/master
remote.origin.fetch=+refs/heads/master:refs/remotes/origin/master2

you would get two remote-tracking names from one source. Or, with:

remote.origin.fetch=+refs/heads/master:refs/remotes/origin/master
remote.origin.fetch=+refs/heads/dev:refs/remotes/origin/master

you would get one remote-tracking name from two sources.

This is bad, because it means the mapping is not reversible. If we want to go from origin/master to "the branch name they use over on origin", is that master or dev? Or, if we want to go from master-on-origin to our remote-tracking equivalent, is that master or master2?

In some ambiguous cases, Git will just give up and do nothing. Moreover, you can use --prune, or set the option fetch.prune to true, and in this case, after handling:

+refs/heads/*:refs/remotes/origin/*

your Git will comb through any refs/remotes/origin/* names that you have that weren't created-or-updated-or-at-least-refreshed by this git fetch operation, and remove them. This doesn't work right without a bijection: the algorithm is basically "do an injection, then remove untouched names if the injection was surjective".

Without --prune, your Git just leaves these "stale" remote-tracking names behind. That's why there's little, but not no, point to removing remote-tracking names. If you don't use -p or --prune or set fetch.prune to true, you may accumulate these stale branches. Using git branch -r -d will allow you to delete them. If you delete some by mistake, a subsequent git fetch will restore them, assuming a normal fetch setting.

I just do a git config --global fetch.prune true to set it as the default, though.
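To see the stale-name behavior and what pruning does about it, here is a sketch with scratch repositories. The branch name old, the paths and the Demo identity are assumptions, and --no-prune is passed explicitly in case fetch.prune is already set globally:

```shell
set -e
work=$(mktemp -d)
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
git -C "$work/remote" -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
git -C "$work/remote" branch old
git clone -q "$work/remote" "$work/local"          # creates origin/old

git -C "$work/remote" branch -q -D old             # the branch goes away over there

git -C "$work/local" fetch -q --no-prune origin    # stale name is left behind
stale=$(git -C "$work/local" rev-parse -q --verify refs/remotes/origin/old || echo missing)

git -C "$work/local" fetch -q --prune origin       # stale name is removed
pruned=$(git -C "$work/local" rev-parse -q --verify refs/remotes/origin/old || echo missing)
echo "after plain fetch: $stale"
echo "after --prune:     $pruned"
```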
