Bob5421

Reputation: 9073

Need to understand branch tracking

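I have written other questions about tracking, but I think I explained poorly what I do not understand. Here is a quick example to show my problem:

  • I have created a gitlab project
  • I have cloned this project on my computer with git clone command
  • Then, I have created a new branch in gitlab web interface (with the '+' button): my_server_branch
  • I have put a single file into this branch (also with gitlab web interface)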

So, at this step, my computer doesn't know anything about my_server_branch.

I have read I should create a "tracking branch" on my computer with:

git branch --track my_server_branch origin/my_server_branch

I have not run this command. I simply typed (on master branch, on my computer):

git pull

And here is what I see in the console:

 * [new branch]      my_server_branch -> origin/my_server_branch

So Git detects that there is a new branch when I run a pull. So, again, my question is: what is the advantage of tracking the branch if Git detects it and does the work anyway?

There is certainly an advantage to tracking because, if not, this command would not exist... but I do not see this advantage.

And if I type the git branch --track command, here is what I get:

error: the requested upstream branch 'origin/my_server_branch' does not exist

Upvotes: 1

Views: 71

Answers (3)

torek

Reputation: 487805

There are a bunch of concepts here that you need to keep separate in your head; and then there is a terminology issue as well.

These are the concepts:

  • branch names like master;
  • remote-tracking names—often called remote-tracking branches—like origin/master;
  • the idea of an upstream; and
  • commits, which are identified by their hash IDs; hold files plus some metadata; and, via that metadata, form chains.

The first two—branch names and remote-tracking names—are pretty closely related, and, along with tag names like v2.1, are all grouped into one concept that Git calls a reference.
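To see that these names really are all references, here is a minimal sketch. It assumes a throwaway repository in a temporary directory standing in for "their" Git (the paths, the demo identity, and the variable names work and refs are made up for illustration), and it lists every reference the clone ends up with:

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# A stand-in for "their" repository: one commit on master, plus a tag.
git init -q "$work/server"
git -C "$work/server" symbolic-ref HEAD refs/heads/master
git -C "$work/server" commit -q --allow-empty -m "A"
git -C "$work/server" tag v2.1

# "Your" repository, cloned from it.
git clone -q "$work/server" "$work/mine"

# Every name is a reference; the prefix tells you which kind it is:
#   refs/heads/...    branch names
#   refs/remotes/...  remote-tracking names
#   refs/tags/...     tag names
refs=$(git -C "$work/mine" for-each-ref --format='%(refname)')
echo "$refs"
```

The short forms master, origin/master, and v2.1 are just these full names with the prefix dropped.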

The terminology problem is that sometimes, some branches—some names like master—are said to be tracking. Other names like origin/master are called remote-tracking branches, which looks like it's the same thing. It's not! That's why I call the latter remote-tracking names, to avoid the word branch, and why I recommend that instead of the verb tracking, you think about branch names like master as either having an upstream or not having an upstream. This sidesteps the tricky word track (which has yet another meaning when applied to files in your work-tree).
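One way to make "having an upstream" concrete: the upstream of a branch is stored as two configuration entries, and it can be removed and set again without touching any commit. The sketch below assumes a throwaway clone in a temporary directory; the helper upstream_of and the variable names are hypothetical, chosen only for this illustration:

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/server"
git -C "$work/server" symbolic-ref HEAD refs/heads/master
git -C "$work/server" commit -q --allow-empty -m "A"
git clone -q "$work/server" "$work/mine"
mine="$work/mine"

# Hypothetical helper: print the upstream of a branch (empty if it has none).
upstream_of() {
    git -C "$mine" for-each-ref --format='%(upstream:short)' "refs/heads/$1"
}

before=$(upstream_of master)                 # set up by git clone

# The upstream is recorded as these two settings:
git -C "$mine" config branch.master.remote   # which remote
git -C "$mine" config branch.master.merge    # which branch name on that remote

git -C "$mine" branch --unset-upstream master
without=$(upstream_of master)                # master now has no upstream

git -C "$mine" branch -q --set-upstream-to=origin/master master
after=$(upstream_of master)                  # and now it has one again

echo "before: [$before]  without: [$without]  after: [$after]"
```

The branch name master and the remote-tracking name origin/master both exist the whole time; only the link between them changes.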

Let's move on here to your actions:

Here is a quick example in order to show my problem:

  • I have created a gitlab project

  • I have cloned this project on my computer with git clone command

At this point, you have two separate repositories. There's one on the GitLab server computer, and one on your own computer. We'll call the GitLab one "theirs" and the one on your computer "yours", even though in a sense, they are both yours. Your repository is very similar to theirs, but not exactly the same: it's a clone, and it has a way to identify it as the copy instead of the original.

We'll come back to your repository, on your computer, in a bit. The next few steps all happen in their repository though.

  • Then, I have created a new branch in gitlab web interface (with the '+' button): my_server_branch

OK, so at this point, their Git repository has a branch that yours does not know about.

  • I have put a single file into this branch (also with gitlab web interface)

Technically, you can't put a file into a repository like this. What you did is add a new commit, with the new commit containing the file.

This matters because the way branch names work is to remember the hash ID of the last commit that goes in the branch. When you add a new commit, the hash ID stored in the branch name changes to remember the new commit. The new commit remembers the previous last-commit.

If we draw these out, using single uppercase letters to stand in for the actual commit hash IDs, we get a picture like this one for a simple three-commit repository with just a master branch:

A <-B <-C   <-- master

Here, the name master remembers the actual hash ID of commit C. That commit itself remembers the actual hash ID of commit B, which remembers the hash ID of commit A. So Git only needs to have the name master remember commit C's ID: the rest, it finds by looking at the commits themselves.

(We say that master points to C, and C points to B and B points to A. Since A is the very first commit ever made in the repository, it points nowhere: that's what tells us, and Git, that we can stop and rest. There's no previous history to examine. The commits are the history, and history goes C then B then A.)
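The chain can be inspected directly. This is a small sketch, assuming a scratch repository in a temporary directory with three empty commits whose subjects are A, B, and C (the variable names are made up for the example):

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
for msg in A B C; do
    git -C "$repo" commit -q --allow-empty -m "$msg"
done

tip=$(git -C "$repo" rev-parse master)                       # hash ID stored in the name master
tip_parent=$(git -C "$repo" log -1 --format=%P master)       # parent hash ID stored inside commit C
second=$(git -C "$repo" rev-parse 'master~1')                # commit B, reached by following C
root_parent=$(git -C "$repo" log -1 --format=%P 'master~2')  # commit A has no parent: empty
subjects=$(git -C "$repo" log --format=%s master)            # walk backwards from the name

# One line per commit: its own abbreviated hash, its parent's, and its subject.
git -C "$repo" log --format='%h  parent: %p  subject: %s' master
```

The name supplies only the starting point; everything after that comes from the parent hash recorded in each commit.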

To add a new branch to a repository, we typically pick some existing commit inside the repository and check it out, so that it's the current commit, then add a new name that points to that same commit:

A--B--C   <-- master, my_server_branch (HEAD)

Git needs to know which name to update when we make new commits, so Git attaches the special name HEAD (in all uppercase like this) to a branch name. If we're using the local computer (rather than the web interface), we would then create a file, use git add to add it, and run git commit to make the new commit. If we're using the web interface, GitLab pretty much does the same thing in their repository, it's just hidden behind their web interface. They end up with:

A--B--C   <-- master
       \
        D   <-- my_server_branch (HEAD)

although they may do it in such a way that they leave HEAD attached to master anyway. That's how GitHub would do it, for instance, without moving HEAD. In any case, since it's their HEAD, not yours, it's not all that important right now.
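Done on a local computer, the steps above might look like the following sketch. It assumes a scratch repository in a temporary directory; the file name file.txt and the variable names are invented for the illustration, and GitLab's web interface may well do this differently internally:

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
for msg in A B C; do
    git -C "$repo" commit -q --allow-empty -m "$msg"
done

# A new name pointing at the current commit (C), with HEAD attached to it.
git -C "$repo" checkout -q -b my_server_branch
before=$(git -C "$repo" rev-parse my_server_branch)   # same hash ID as master for now

# Commit D: create a file, add it, commit it.
echo "hello" > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" commit -q -m "D"

master_tip=$(git -C "$repo" log -1 --format=%s master)
branch_tip=$(git -C "$repo" log -1 --format=%s my_server_branch)
branch_parent=$(git -C "$repo" log -1 --format=%s 'my_server_branch~1')
head_name=$(git -C "$repo" symbolic-ref --short HEAD)

git -C "$repo" log --oneline --graph --decorate --all
```

Only the name that HEAD is attached to moved: master still points to C, while my_server_branch now points to D, whose parent is C.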

Clones get their own branches

Now it's time to look back at your own repository. When you ran:

git clone <url>

you had your computer make a new, empty Git repository, with no commits, no branches, basically nothing but the empty repository shell. Then you had your Git on your computer fill in that shell by grabbing all the commits from their Git. So if they had three simple commits:

A--B--C   <-- master

your Git got those three commits:

A--B--C

(the internal, backwards-pointing arrows are too annoying to draw, but they're still there: C points to B and B points to A).

The hash IDs all match: every Git in the universe will agree that the contents of commit C determine its hash ID, so C has the same hash ID everywhere. So your Git and their Git can tell which Git has which commits, just by looking at these hash IDs. But your Git still doesn't have any branches yet.

Your Git asks their Git what all their branch and tag names are, and they say: My master holds the hash ID for commit C. So your Git now creates, not master, but rather origin/master, pointing to commit C:

A--B--C   <-- origin/master

There are no commits and no branches left, so your Git is done copying. Your Git now does the last step of git clone, which is to run:

git checkout master

You can make your Git use some other name, and if you don't, your Git asks their Git which name to use; but master is the usual, common case: your Git tries to check out your master.

You don't have a master. And yet, this checkout succeeds anyway. The reason it succeeds is that their Git had a master and your Git copied that to your origin/master. So your Git, instead of just failing the checkout, says to itself: Hm, there's no master, but there's origin/master ... that looks a lot like master, I bet you meant that I should make master using origin/master. So your Git does that:

git checkout -b master --track origin/master

which creates your own master, and sets its upstream to origin/master. So now you have this:

A--B--C   <-- master (HEAD), origin/master

Your master branch exists now and has origin/master as its upstream.
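The end state of a clone can be checked with a few read-only commands. This sketch assumes a local directory standing in for the GitLab repository (paths and variable names are made up for the example):

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/server"
git -C "$work/server" symbolic-ref HEAD refs/heads/master
for msg in A B C; do
    git -C "$work/server" commit -q --allow-empty -m "$msg"
done

git clone -q "$work/server" "$work/mine"
mine="$work/mine"

# Remote-tracking names copied from their branch names.
remote_names=$(git -C "$mine" for-each-ref --format='%(refname)' refs/remotes)
# Your own branch names: just the one that the final checkout step created.
local_names=$(git -C "$mine" for-each-ref --format='%(refname:short)' refs/heads)
# The upstream that the checkout step recorded for master.
upstream=$(git -C "$mine" for-each-ref --format='%(upstream:short)' refs/heads/master)

# Shows master, the commit it points to, and [origin/master] as its upstream.
git -C "$mine" branch -vv
```

Both names point to commit C, but they are separate references: master is yours to move, and origin/master is your Git's memory of their master.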

Sidebar: if you're confused, rest assured, this is confusing!

The confusing way to put this is that your master branch (1) is now tracking (2) your remote-tracking branch (3, 4, 5) origin/master from remote (6) origin. Here, at points (1) and (5), both words or phrases use the word branch, but both mean something different. At (2) and (4) we have the word tracking, both meaning something different. At (3) and (6) we have the word remote, both meaning something different. You can see why I don't like these words, and prefer to call this the branch name master, with the upstream origin/master, with origin/master being a remote-tracking name associated with the remote origin. I still have to use the word remote twice, but "remote-tracking" is at least hyphenated.

One right and good way to get my_server_branch locally

Having made my_server_branch in their Git, and added commit D there, what you can do now is run the command:

git fetch

in your own Git on your computer. (You can use git fetch origin if you want to be explicit.) This has your Git call up their Git and ask it, again, for its list of branch names. This time they say: I have master, at commit C. I have my_server_branch, at commit D. Your Git says: Ah, I already have commit C, so no problem there. Give me commit D though. They do that, and now the conversation between your Git and their Git is done. Now your Git updates your origin/master to point to C—that's no change at all—and creates your origin/my_server_branch, pointing to new commit D. So now you have:

A--B--C   <-- master (HEAD), origin/master
       \
        D   <-- origin/my_server_branch

You can now run git checkout my_server_branch. As before, you don't have a my_server_branch at the moment, but instead of just failing, your Git will say: Aha, I don't have my_server_branch. But I do have origin/my_server_branch. I'll create my_server_branch, pointing to commit D. I'll set the upstream of my_server_branch to be origin/my_server_branch. Then I'll do the checkout you asked for. The result is:

A--B--C   <-- master, origin/master
       \
        D   <-- my_server_branch (HEAD), origin/my_server_branch
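The whole sequence can be replayed locally. The sketch below assumes a directory on disk playing the role of the GitLab repository, with commit D made there after the clone; the helper has_ref and the variable names are hypothetical:

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/server"
git -C "$work/server" symbolic-ref HEAD refs/heads/master
for msg in A B C; do
    git -C "$work/server" commit -q --allow-empty -m "$msg"
done
git clone -q "$work/server" "$work/mine"
mine="$work/mine"

# After the clone, "they" create my_server_branch and add commit D to it.
git -C "$work/server" checkout -q -b my_server_branch
git -C "$work/server" commit -q --allow-empty -m "D"
git -C "$work/server" checkout -q master

# Hypothetical helper: does this full reference name exist in your repository?
has_ref() {
    git -C "$mine" rev-parse -q --verify "$1" >/dev/null && echo yes || echo no
}

remote_before=$(has_ref refs/remotes/origin/my_server_branch)   # not yet known
git -C "$mine" fetch -q origin
remote_after=$(has_ref refs/remotes/origin/my_server_branch)    # created by fetch
local_after_fetch=$(has_ref refs/heads/my_server_branch)        # fetch made no branch

# No local my_server_branch exists, so Git creates it from origin/my_server_branch.
git -C "$mine" checkout -q my_server_branch
local_after_checkout=$(has_ref refs/heads/my_server_branch)
upstream=$(git -C "$mine" for-each-ref --format='%(upstream:short)' refs/heads/my_server_branch)
tip=$(git -C "$mine" log -1 --format=%s)

echo "upstream of my_server_branch: $upstream (tip commit: $tip)"
```

Note that git fetch only creates the remote-tracking name; the branch name, and its upstream setting, appear at the checkout step.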

You almost never need to use git checkout --track

The only time you do need git checkout --track is when git checkout won't do the right thing for you. This happens in two cases:

  • Suppose you have more than one remote, e.g., if you have origin plus a second remote fred for getting stuff from Fred's repository. Suppose further that you have your own origin/hello copied from the branch hello on origin, and Fred has Fred's hello which is now copied to your fred/hello. If you try to git checkout hello, your Git will find two candidates—fred/hello and origin/hello—and not know which one to use. So now you can run instead:

    git checkout --track fred/hello
    

    if you really meant to use Fred's, or:

    git checkout --track origin/hello
    

    if you really meant to use origin's.

  • Or suppose that, for some strange reason, you have, say, origin/my_server_branch, but in your repository, you want to call this bob_server_branch. Using git checkout my_server_branch gets you my_server_branch; and of course, using git checkout bob_server_branch tries to find origin/bob_server_branch. So here, you need the long form, which names the new branch with -b:

    git checkout -b bob_server_branch --track origin/my_server_branch
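Both cases can be tried out in a scratch setup. This is a sketch under assumptions: two local directories stand in for origin and for Fred's repository, each with a branch named hello, and all paths and variable names are invented for the example:

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# "origin": master plus two more branch names.
git init -q "$work/origin"
git -C "$work/origin" symbolic-ref HEAD refs/heads/master
git -C "$work/origin" commit -q --allow-empty -m "A"
git -C "$work/origin" branch hello
git -C "$work/origin" branch my_server_branch

# "fred": a separate repository with its own hello.
git init -q "$work/fred"
git -C "$work/fred" symbolic-ref HEAD refs/heads/hello
git -C "$work/fred" commit -q --allow-empty -m "Fred's work"

git clone -q "$work/origin" "$work/mine"
mine="$work/mine"
git -C "$mine" remote add fred "$work/fred"
git -C "$mine" fetch -q fred

# Case 1: origin/hello and fred/hello both exist, so the short form cannot guess.
if git -C "$mine" checkout -q hello 2>/dev/null; then
    guess=worked
else
    guess=ambiguous
fi
git -C "$mine" checkout -q --track fred/hello        # creates hello from Fred's
hello_upstream=$(git -C "$mine" for-each-ref --format='%(upstream:short)' refs/heads/hello)

# Case 2: a local name that differs from the name on the remote.
git -C "$mine" checkout -q -b bob_server_branch --track origin/my_server_branch
bob_upstream=$(git -C "$mine" for-each-ref --format='%(upstream:short)' refs/heads/bob_server_branch)

echo "hello -> $hello_upstream, bob_server_branch -> $bob_upstream"
```

In the first case --track takes the new branch name from the part after the remote's name; in the second, -b supplies the name explicitly.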
    

About git pull

The git pull command is shorthand for:

  • run git fetch; then, provided that succeeds
  • run a second Git command, usually git merge.

Since git fetch will (when run with the right options, anyway) create and/or update your origin/* remote-tracking names from origin's branches, it's the first half of git pull that made origin/my_server_branch for you.

The second command—the git merge, or, if you tell it to use git rebase instead, the git rebase—takes commits brought in by the first command and uses those to merge, or to rebase.
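To illustrate the two halves, the sketch below updates two clones of the same scratch repository, one with git pull and one with the two separate commands, and compares the results. It assumes the simple fast-forward case (hence --ff-only on both sides); the directory and variable names are made up:

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/server"
git -C "$work/server" symbolic-ref HEAD refs/heads/master
git -C "$work/server" commit -q --allow-empty -m "A"

git clone -q "$work/server" "$work/a"
git clone -q "$work/server" "$work/b"

# A new commit appears on their master after both clones were made.
git -C "$work/server" commit -q --allow-empty -m "B"

# Clone a: one command.
git -C "$work/a" pull -q --ff-only

# Clone b: the same thing as two steps.
git -C "$work/b" fetch -q origin
git -C "$work/b" merge -q --ff-only 'master@{upstream}'

tip_a=$(git -C "$work/a" rev-parse master)
tip_b=$(git -C "$work/b" rev-parse master)
tip_server=$(git -C "$work/server" rev-parse master)
echo "a: $tip_a"
echo "b: $tip_b"
```

Both clones end up with master at the same commit as the server's master.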

I don't like the git pull command, for a number of reasons, some of which are purely historical (git pull used to occasionally destroy your local work in a few rare but not unheard-of cases, and I had that happen to me at least once). The most practical objection is pretty simple though: until you see what git fetch fetched, how do you know whether you want to run git merge, git rebase, or something else entirely? So I prefer to avoid git pull: I run git fetch first, then maybe run a git merge or git rebase, or maybe do something else entirely. What to do depends on what I saw from git fetch (and also what I'm doing with this particular repository, of course).
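A fetch-first workflow might look like the following sketch. It assumes a scratch repository where both sides have added commits since the clone; the variable names and the wording of the plan messages are only illustrative:

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/server"
git -C "$work/server" symbolic-ref HEAD refs/heads/master
git -C "$work/server" commit -q --allow-empty -m "A"
git clone -q "$work/server" "$work/mine"
mine="$work/mine"

# They add two commits; you add one of your own.
git -C "$work/server" commit -q --allow-empty -m "B"
git -C "$work/server" commit -q --allow-empty -m "C"
git -C "$mine" commit -q --allow-empty -m "X (local)"

# Step 1: fetch only. Nothing in master changes yet.
git -C "$mine" fetch -q origin

# Step 2: look at what arrived before deciding what to do.
ahead=$(git -C "$mine" rev-list --count 'origin/master..master')    # commits only you have
behind=$(git -C "$mine" rev-list --count 'master..origin/master')   # commits only they have
incoming=$(git -C "$mine" log --format=%s 'master..origin/master')

if [ "$behind" -eq 0 ]; then
    plan="nothing to integrate"
elif [ "$ahead" -eq 0 ]; then
    plan="fast-forward: git merge --ff-only origin/master"
else
    plan="diverged: choose git merge or git rebase"
fi

echo "ahead $ahead, behind $behind"
echo "incoming commits:"
echo "$incoming"
echo "plan: $plan"
```

Here the histories have diverged, which is exactly the situation where choosing between merge and rebase deserves a moment of thought.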

There are a few occasional exceptions, especially with repositories that I use read-only (I just want their latest commit plus its history, so git pull is probably fine, as long as they behave well) or where I control both ends, e.g., the origin repository is really mine over on GitHub, and I know what I put in there. But even for the latter case, I tend to avoid git pull, because sometimes I forget what I've put into which repository. Using git fetch lets me check first.

Upvotes: 2

Vladyslav Zavalykhatko

Reputation: 17354

git pull runs git fetch and git merge. Just nice to know

With the first part of this command (git fetch) you fetched all the branches from the remote. As you can read here, this command fetches all the refs.

By default, when you check out a new branch that exists on the remote, Git creates a local branch with the remote branch as its upstream. Basically, the effect of the command you are asking about is applied for you implicitly when you check out the fetched branch.

Upvotes: 1

ray

Reputation: 27245

I'm not 100% sure I understand what you're asking, but the advantage to tracking a remote branch is that you can push to and pull from it.

Check out the branch locally: git checkout my_server_branch

Make changes and commit.

Now git push will push your changes to the my_server_branch branch on the remote origin.
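Here is a minimal sketch of those steps, assuming a local directory standing in for the remote and Git's default push.default setting (simple); the paths, commit messages, and variable names are made up for the example:

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/server"
git -C "$work/server" symbolic-ref HEAD refs/heads/master
git -C "$work/server" commit -q --allow-empty -m "A"
git -C "$work/server" branch my_server_branch

git clone -q "$work/server" "$work/mine"
mine="$work/mine"

# Check out the branch locally; this also records its upstream.
git -C "$mine" checkout -q my_server_branch

# Make a change and commit.
git -C "$mine" commit -q --allow-empty -m "local change"

# No arguments needed: the upstream says where this branch goes.
git -C "$mine" push -q

server_tip=$(git -C "$work/server" log -1 --format=%s my_server_branch)
local_tip=$(git -C "$mine" rev-parse my_server_branch)
tracking_tip=$(git -C "$mine" rev-parse origin/my_server_branch)
echo "server my_server_branch is now at: $server_tip"
```

After the push, your remote-tracking name origin/my_server_branch is updated as well, so it matches your local branch again.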

Upvotes: 1
