xpt

Reputation: 22984

How to create git Remote-Tracking Branch

They said that it is as simple as:

You can tell Git to track the newly created remote branch simply by using the -u flag with "git push".

But it never worked for me.

How to create the git Remote-Tracking Branch, with which

Git can now inform you about "unpushed" and "unpulled" commits.

Here is mine:

$ git status 
On branch newfeature/v4-json
nothing to commit, working tree clean

vs what I'm expecting, quoting from above article:

$ git status
# On branch dev
# Your branch and 'origin/dev' have diverged,
# and have 1 and 2 different commits each, respectively.
#
nothing to commit (working directory clean)

I.e., info about the "unpushed" and "unpulled" commits.
I.e., I want to see the same as:

$ git status
On branch master
Your branch is ahead of 'origin/master' by 3 commits.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean

Yet from my actual output above, you can see that I can no longer see how many commits I've made so far, even though I've made several.

This is what I did:

$ git push -u origin newfeature/v4-json
Counting objects: 12, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (11/11), done.
Writing objects: 100% (12/12), 1.87 KiB | 958.00 KiB/s, done.
Total 12 (delta 9), reused 0 (delta 0)
remote: Resolving deltas: 100% (9/9), completed with 9 local objects.
remote: 
remote: Create a pull request for 'newfeature/v4-json' on GitHub by visiting:
remote:      https://github.com/.../pull/new/newfeature/v4-json
remote: 
To github.com:xxx/yyy.git
 * [new branch]      newfeature/v4-json -> newfeature/v4-json
Branch 'newfeature/v4-json' set up to track remote branch 'newfeature/v4-json' from 'origin' by rebasing.

But Git did not actually set up any such remote-tracking branch 'newfeature/v4-json' from 'origin':

A) git remote show origin does not show a remote tracking branch for my newfeature at all:

$ git remote show origin
* remote origin
  Fetch URL: git@github.com:go-easygen/easygen.git
  Push  URL: git@github.com:go-easygen/easygen.git
  HEAD branch: master
  Remote branch:
    master tracked
  Local branches configured for 'git pull':
    master             rebases onto remote master
    newfeature/v4-json rebases onto remote newfeature/v4-json
  Local refs configured for 'git push':
    master             pushes to master             (up to date)
    newfeature/v4-json pushes to newfeature/v4-json (up to date)

while the following is what I want to see, according to http://www.gitguys.com/topics/adding-and-removing-remote-branches

$ git remote show origin
* remote origin
  Fetch URL: /tmp/.../git/rp0
  Push  URL: /tmp/.../git/rp0
  HEAD branch: master
  Remote branches:
    master     tracked
    newfeature tracked
  Local branches configured for 'git pull':
    master     rebases onto remote master
    newfeature rebases onto remote newfeature
  Local refs configured for 'git push':
    master     pushes to master     (up to date)
    newfeature pushes to newfeature (up to date)

Note that the Remote branches: section lists newfeature tracked in addition to master tracked. According to the article above, this newfeature tracked entry is the remote-tracking branch.

B) neither is git branch -a:

$ git branch -a
  master
* newfeature/v4-json
  remotes/origin/HEAD -> origin/master
  remotes/origin/master

There is only one remotes/origin/master remote tracking name there, while I'm expecting more. E.g. (irrelevant but just to show the case with more remote tracking names),

$ git branch -a
* master
  remotes/origin/HEAD
  remotes/origin/master
  remotes/origin/v1.0-stable
  remotes/origin/experimental

C) nor is git branch -vv:

$ git branch -vv
  master             75369c3 [origin/master] - [*] allow ...
* newfeature/v4-json 8c98d9c - [*] update ...

while I'm expecting to see,

$ git branch -vv
  master             75369c3 [origin/master] - [*] allow ...
* newfeature/v4-json 8c98d9c [origin/newfeature/v4-json] - [*] update ...

Moreover,

git pull is not updating my local branch from remote either:

$ git pull
From github.com:xxx/yyy
 * branch            newfeature/v4-json -> FETCH_HEAD
Already up to date.
Current branch newfeature/v4-json is up to date.

$ git pull
From github.com:xxx/yyy
 * branch            newfeature/v4-json -> FETCH_HEAD
Already up to date.
Current branch newfeature/v4-json is up to date.

$ git pull
From github.com:xxx/yyy
 * branch            newfeature/v4-json -> FETCH_HEAD
Already up to date.
Current branch newfeature/v4-json is up to date.

I.e., no matter how many times I pull, I'm not getting the same output as,

$ git pull
Already up to date.
Current branch master is up to date.

None of the above is normal. I've created remote-tracking branches with MS VS many times before, and the results were exactly what I'm expecting, not the above. However, I don't like black-magic tricks, so I want to know how I can do the same with plain git.

So what is the correct way to create a git Remote-Tracking Branch?

Upvotes: 1

Views: 2923

Answers (1)

torek

Reputation: 487725

Edit to address updated (git branch -a and git branch -vv) output: yes, something is missing. It's not entirely clear what went wrong, but I have a guess. This part of the git push -u output:

 * [new branch]      newfeature/v4-json -> newfeature/v4-json
Branch 'newfeature/v4-json' set up to track remote branch 'newfeature/v4-json' from 'origin' by rebasing.

shows your Git setting your origin/newfeature/v4-json (split into two parts) as your upstream for newfeature/v4-json. But your git branch -a and git branch -vv output show that origin/newfeature/v4-json is not there.

I can reproduce a key element of this behavior by making a single-branch clone. Using git clone --depth=number or git clone --single-branch will produce such a clone. The side effect of this is that your Git will never create any remote-tracking names for any branch other than the one branch you told Git that you were concerned with. If this is the problem, the fix is to convert the clone to a normal (multi-branch) clone. (If you used --depth to create the single-branch aspect, it may also be wise to unshallow the clone.)

To see if your clone of origin is set to be single-branch:

$ git config --get-all remote.origin.fetch

In a normal clone, this will print:

+refs/heads/*:refs/remotes/origin/*

In a single-branch clone with the branch master chosen, this will print:

+refs/heads/master:refs/remotes/origin/master

which tells your Git: create a remote-tracking name for master only, whereas the normal setting tells it to create remote-tracking names for *, i.e., all branches.

To un-do the single-branch-ness of a clone of origin:

$ git config remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'

(or edit .git/config directly, e.g., git config --edit, which is my preferred method). See also How do I "undo" a --single-branch clone?
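Here is a minimal sketch of the whole scenario in a scratch directory, assuming a throwaway local repository stands in for origin (the directory names and the demo identity are made up for illustration). It makes a single-branch clone, pushes a new branch with -u, checks for the remote-tracking name, then widens the refspec and fetches:

```shell
#!/bin/sh
set -e
# Hypothetical identity so that commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# A throwaway "origin" with one commit on master.
git init -q "$tmp/origin"
git -C "$tmp/origin" symbolic-ref HEAD refs/heads/master
git -C "$tmp/origin" commit -q --allow-empty -m "initial"

# Single-branch clone: the fetch refspec names master only.
git clone -q --single-branch --branch master "$tmp/origin" "$tmp/work"
cd "$tmp/work"
git config --get-all remote.origin.fetch

# Create and push a new branch, as in the question.
git checkout -q -b newfeature/v4-json
git commit -q --allow-empty -m "work"
git push -q -u origin newfeature/v4-json

ref=refs/remotes/origin/newfeature/v4-json
if git show-ref --verify --quiet "$ref"; then before=present; else before=missing; fi

# Widen the refspec to all branches, then fetch to create the missing name.
git config remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'
git fetch -q origin
if git show-ref --verify --quiet "$ref"; then after=present; else after=missing; fi

echo "before fix: $before, after fix: $after"
```

Note that changing the refspec alone does not create the name; it appears on the next git fetch (or git push) that touches that branch.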

To convert a shallow clone to a full (non-shallow) clone, just run:

$ git fetch --unshallow

Note that this operation is independent of single-branch-ness, despite the way git clone ties them together by default (you can override this at git clone time with git clone --depth=number --no-single-branch). There is no command-line test for shallow-ness in versions of Git before 2.15; in 2.15 or later, use:

git rev-parse --is-shallow-repository

but before then you have to test for the existence of the file .git/shallow:

if [ -f "$(git rev-parse --git-dir)/shallow" ]; then
    echo true
else
    echo false
fi

simulates git rev-parse --is-shallow-repository.
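To see that shallow-ness and single-branch-ness really are separate settings, here is a small sketch using a scratch origin (hypothetical paths and identity again). It uses the file test so that it does not depend on the Git version, and a file:// URL because --depth is ignored for plain local paths:

```shell
#!/bin/sh
set -e
# Hypothetical identity so that commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Origin needs at least two commits for a depth-1 clone to be truncated.
git init -q "$tmp/origin"
git -C "$tmp/origin" symbolic-ref HEAD refs/heads/master
git -C "$tmp/origin" commit -q --allow-empty -m "first"
git -C "$tmp/origin" commit -q --allow-empty -m "second"

git clone -q --depth=1 "file://$tmp/origin" "$tmp/shallow"
cd "$tmp/shallow"

is_shallow() {
    # A shallow clone has a "shallow" file in its Git directory.
    if [ -f "$(git rev-parse --git-dir)/shallow" ]; then echo true; else echo false; fi
}

shallow_before=$(is_shallow)
git fetch -q --unshallow
shallow_after=$(is_shallow)

# Unshallowing does not touch the refspec: the clone is still single-branch.
refspec=$(git config --get-all remote.origin.fetch)

echo "shallow before: $shallow_before, after: $shallow_after"
echo "fetch refspec:  $refspec"
```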

As an aside, there is a problem with the output you want to see. You say that you'd like to see newfeature as a branch on the remote, but that cannot happen: the name newfeature/v4-json needs to exist, and Git treats the slash-separated parts of a ref name like path components, so newfeature cannot be both a branch name and the prefix of newfeature/v4-json.
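The name conflict is easy to try out in a scratch repository (hypothetical path and identity); this sketch only records whether the second git branch command is refused:

```shell
#!/bin/sh
set -e
# Hypothetical identity so that commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "initial"

# The longer name exists first...
git branch newfeature/v4-json

# ...so the shorter name, which would have to be both a ref and a prefix, is rejected.
if git branch newfeature 2>/dev/null; then conflict=no; else conflict=yes; fi
echo "newfeature conflicts with newfeature/v4-json: $conflict"
```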

(Original answer below line.)


$ git push -u origin newfeature/v4-json

This worked exactly as you asked it to. Everything is just fine in the rest of the output that you showed. So it's not clear what you think is wrong; nothing is actually wrong. I'll address the other message you showed:

# Your branch and 'origin/dev' have diverged,
# and have 1 and 2 different commits each, respectively.

below.

What does all this mean? (Long)

It may help to review how Git works and some of Git's rather peculiar terminology. In particular, the phrase you're using—remote-tracking branch—is, in my opinion, a bad term, actively misleading. It is a Git term, so we should understand what people mean when they use it, but it's a bad term, which means that people misuse it, and if you're confused by someone's usage, it may be worth stepping back and considering these things again.

First, let's note that Git is really all about commits. Commits are Git's raison d'être; without commits, we wouldn't use Git at all. So let's look at what a commit is.

Each commit contains files, but it's not just a set of files. It's a snapshot, of all of your files as of the time you took the snapshot,¹ but it also has some metadata: information about the stored data. The most obvious is the stuff you see in git log output: your name and email address, and the computer's idea of what day and time it was when you made the commit, along with the reason you saved for making the commit, i.e., your log message. These are all meant for you (or someone else) to use in the future: someday, perhaps tomorrow, perhaps months or years from now, you may look back at this commit you just made, and ask yourself: why the heck did I do that? The answer should be in your log message.

Because commits store files (as snapshots, frozen in time, immutable, and living forever, or at least as long as the commit itself lives), they're great for archival. Any time in the future, you can go back into the past and see exactly what you saved back then. You can't change it: it's in the past, fixed, frozen in time. Not even Git can change it, as we'll see in a moment.

In order to find a commit, Git needs a name. These names are not branch names! Or, more accurately, you can start out using a branch name, but that's not the name that Git needs. The true name of any commit is instead its hash ID. The hash ID of each commit seems random, but in fact, it's a cryptographic checksum of the entire contents of the commit, exquisitely sensitive to every single bit of data in that commit: all of the frozen snapshot, and also your name and the time-stamp and your log message. That's why you, or anyone, can't change a commit: changing anything changes the hash ID, and what you then have is a new and different commit. Nobody knows what the hash ID will be for a new commit until it's made. At that time, it gets a unique ID. No one will ever use that ID for any other commit! And no one can change anything in the commit: Git will know if you try because the ID won't match up any more.²
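You can check the checksum claim yourself. This sketch, in a scratch repository with a made-up identity and log message, recomputes the hash of a commit from its raw contents and then shows that altering the log message yields a different ID:

```shell
#!/bin/sh
set -e
# Hypothetical identity so that commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

git init -q "$tmp/repo"
cd "$tmp/repo"
git commit -q --allow-empty -m "explain why I did this"

# The name Git recorded for the commit.
recorded=$(git rev-parse HEAD)

# Hash the raw commit contents again; nothing is written to the repository.
recomputed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)

# Same contents with a different log message: a different hash ID.
altered=$(git cat-file commit HEAD \
    | sed 's/why I did this/something else/' \
    | git hash-object -t commit --stdin)

echo "recorded:   $recorded"
echo "recomputed: $recomputed"
echo "altered:    $altered"
```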

There are one or two last key pieces to this particular jigsaw puzzle. The first is that within each new commit, Git stores the hash ID (the true name) of the previous commit, as part of that metadata. That is, Git does not just save your name and the time and so on, but also saves the raw hash ID of the commit you used to make this new commit. Git calls this saved hash ID the parent of the commit. What this means is that each commit points to its parent commit, in a backwards-looking chain.

For instance, suppose we have just two commits A and B in a repository. A is the very first commit so it deliberately has no parent—it's a special case. But B was made from A, so B points back to A:

A <-B

If you extract commit B, do some work, and make a new commit C, the new commit automatically points back to B:

A <-B <-C

What this means is that Git only needs to know the apparently-random hash ID of the last commit. In this case that's commit C. If its actual hash ID is cba9876... or whatever, Git can use that to find the contents of C. Those contents include the actual hash ID of commit B. Git can then use that to find B, whose contents include the actual hash ID of commit A. Git can use that to find A, and A has no parent, so now, finally, Git can stop working backwards.

This process of working backwards from a branch tip commit like C, identified by a branch name, is crucial in Git. It's how history exists. The history in a Git repository is the commits, as connected by these backwards-pointing arrows. You start from the end and walk, one commit at a time, through history, to see where you can reach by following the parent arrows.
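The backwards walk can be watched directly. A minimal sketch, assuming a scratch repository whose three commits carry the log messages A, B and C to match the drawing:

```shell
#!/bin/sh
set -e
# Hypothetical identity so that commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
for msg in A B C; do
    git commit -q --allow-empty -m "$msg"
done

# git log starts at the tip named by master and follows parent links.
subjects=$(git log --format=%s master | tr '\n' ' ')
echo "walk order: $subjects"

# master~2 is A; asking for its parent fails because A is the root commit.
if git rev-parse --verify -q 'master~2^' >/dev/null; then
    root_has_parent=yes
else
    root_has_parent=no
fi
echo "root commit has a parent: $root_has_parent"
```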

This is where the last jigsaw-puzzle piece enters the picture, when branch names and other such names show up. Let's take a pause and finish off the footnotes here, then dive into branch names and graph-drawing.


¹ Git actually makes the snapshot from the index, but we won't get into these details here, other than to say that what gets snapshotted (frozen in time, forever, for that commit) is whatever is in the index at the time, which is at least potentially different from what you can see in your work-tree where you do your work.

² Git actually does check this, whenever it seems convenient or appropriate. That automatically detects accidental corruption of a Git repository, as occurs when (e.g.) you try to store one in Dropbox: Dropbox sometimes goes around modifying files behind your (and Git's) back, and Git catches it. Unfortunately, there's rarely a good way to repair a corrupted repository. Instead, Git tends to rely on the idea that Git repositories get replicated all over the place. You probably have a good copy somewhere else, so you just throw this one out entirely.


Branch names find commit hash IDs

Any existing repository—well, any one other than a totally empty, fresh, new repository with no commits in it yet—has some set of commits. These commits form the backwards-looking chains we just saw, such as:

A <-B <-C

We—and Git—need some way to record the hash ID of the last commit in this chain.

The way Git achieves this is with what Git calls references or refs. There are many forms of refs, but the Big Three are:

  • Branch names, like master.
  • Remote-tracking names, like origin/master. (Git calls these remote-tracking branch names or remote-tracking branches, which I think is a bad name; I've switched to using remote-tracking names, which I think is harder to get wrong.)
  • Tag names, like v1.3.

They are actually all implemented by the same underlying techniques, but we'll just treat them as separate forms of name here. Branch names have a special property; all the other names lack this property.

What goes in one of these names is quite simple: it's just the actual raw hash ID of a Git object, typically a commit.³ So a branch name like master points to the last commit in the branch, which is commit C in this drawing:

A--B--C   <-- master

Note that the arrows that connect commits to each other come out of the child and point back to the (immutable) parent, giving us this backwards traversal method. We don't have to bother to draw them in. The arrows coming out of branch names, however, change.

When we add a new commit to master, Git automatically updates the name master to hold the new commit's hash ID. So if we create a new commit now, the new commit D will point back to C:

A--B--C   <-- master
       \
        D

but Git will immediately adjust master to point not to C but to D:

A--B--C--D   <-- master

Since D points back to C, we can still find all the commits: we start at the end, and work backwards as usual. C is now the second commit in this process instead of the first.
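As a quick check of this in a scratch repository (made-up path and identity), record what master names before and after a commit:

```shell
#!/bin/sh
set -e
# Hypothetical identity so that commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "C"

old_tip=$(git rev-parse master)      # master names C
git commit -q --allow-empty -m "D"
new_tip=$(git rev-parse master)      # master now names D
parent_of_new=$(git rev-parse 'master^')

echo "old tip:       $old_tip"
echo "new tip:       $new_tip"
echo "parent of new: $parent_of_new"
```

The name moved, and the new commit's parent is the old tip, so nothing was lost.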


³ Branch names must hold commit object hash IDs, while tag names are more flexible. We don't need to care about this here. Because remote-tracking names' values are copied from branch names, remote-tracking names also hold only commit hash IDs.


Branch names are private to each repository, but repositories talk to each other

Git is a distributed version control system. This means that each Git repository is a sort of self-contained island, with everything it needs local to that repository. If there are multiple branches with many commits, they are all in that one repository:

A--B--C--D--G--H   <-- master
          \
           E--F   <-- dev

To make Git really useful, we regularly use Git to exchange work with other Git users. To achieve that, we exchange commits. Their hash IDs are universal across all Gits everywhere, because of that cryptographic checksum trick. Given a snapshot and metadata, every Git everywhere will compute the same hash ID. So if my repository has commits A through H like this—remember that these single uppercase letters are standing in for unique, big ugly hash IDs—and I connect to your repository and you have commit H, your repository must also have the same commit as mine.

If you don't have commit H, I have a commit that you don't. If you have some commit I or J, you have a commit that I don't. Either way, our Gits can just exchange hash IDs to see who has what. Whoever is sending commits will send them, whoever is receiving commits will receive them, and the sender will give the receiver any new commits needed.

Let's say you are taking new commits from me. I have new commits I and J, and my new commit J has a name that remembers its hash ID. In my repository, I have this:

A--B--C--D--G--H   <-- master
          \
           E
            \
             I--J   <-- dev

For whatever reason, I don't have commit F that you have on dev. Instead, I have my I-J commits on my dev, after (shared) commit E.

This is where remote-tracking names come in

Your Git takes my commits I and J. My commit I has parent E. So your repository now has this:

A--B--C--D--G--H   <-- master
          \
           E--F   <-- dev
            \
             I--J   <-- ???

What name will your Git repository use to remember my commit J? It had better not use dev: if your Git makes your dev point to commit J, how will you ever find commit F again? Remember, it has an apparently-random hash ID. You'll never be able to guess it.

So, what your Git does is use remote-tracking names to remember my branches. Your Git does this:

A--B--C--D--G--H   <-- master, origin/master
          \
           E--F   <-- dev
            \
             I--J   <-- origin/dev

(assuming my master points to commit H).
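To try this out, here is a sketch with two scratch repositories standing in for "mine" (origin) and "yours" (work); the paths and identity are invented for the example. After the fetch, your dev is untouched while origin/dev remembers my dev:

```shell
#!/bin/sh
set -e
# Hypothetical identity so that commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# "My" repository: master and dev both start at commit E.
git init -q "$tmp/origin"
git -C "$tmp/origin" symbolic-ref HEAD refs/heads/master
git -C "$tmp/origin" commit -q --allow-empty -m "E"
git -C "$tmp/origin" branch dev

# "Your" repository: clone it, then add F on your own dev.
git clone -q "$tmp/origin" "$tmp/work"
cd "$tmp/work"
git checkout -q -b dev origin/dev
git commit -q --allow-empty -m "F"

# Meanwhile I add I and J on my dev.
git -C "$tmp/origin" checkout -q dev
git -C "$tmp/origin" commit -q --allow-empty -m "I"
git -C "$tmp/origin" commit -q --allow-empty -m "J"

# Your fetch brings I and J over and updates origin/dev, not dev.
git fetch -q origin
yours=$(git rev-parse dev)
remembered=$(git rev-parse origin/dev)
mine=$(git -C "$tmp/origin" rev-parse dev)

echo "your dev:        $(git log -1 --format=%s dev)"
echo "your origin/dev: $(git log -1 --format=%s origin/dev)"
```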

The names origin/master and origin/dev in your repository are (your) remote-tracking names, remembering my master and my dev.⁴ Moreover, suppose you now query your Git, asking it to compare the set of commits reachable from dev vs those from origin/dev, in the ordinary walk-backwards method that Git uses.

Starting from dev, the commits you will visit are F, then E, then D, and so on back to A. Starting from origin/dev, the commits you will visit are J, then I, then E, then D, and so on back to A. Which commits are unique to which walk? How many commits do you reach from dev that you can't reach from origin/dev, and vice versa?

Count those out, and then compare to what your Git told you:

# Your branch and 'origin/dev' have diverged,
# and have 1 and 2 different commits each, respectively.
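Git can do this counting for you with git rev-list. In the sketch below, which uses a single scratch repository with a made-up identity, git update-ref stands in for the fetch that would normally create origin/dev, and a helper branch I have called theirs holds the I-J commits:

```shell
#!/bin/sh
set -e
# Hypothetical identity so that commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "E"
git branch dev

# The other side's work: I and J on top of E.
git checkout -q -b theirs
git commit -q --allow-empty -m "I"
git commit -q --allow-empty -m "J"

# Your own work: F on top of E.
git checkout -q dev
git commit -q --allow-empty -m "F"

# Stand-in for a fetch: record their tip under a remote-tracking name.
git update-ref refs/remotes/origin/dev theirs

# Left count: commits only on dev. Right count: commits only on origin/dev.
set -- $(git rev-list --left-right --count dev...origin/dev)
ahead=$1
behind=$2
echo "dev has $ahead commit(s) of its own, origin/dev has $behind"
```

These two numbers are the "1 and 2 different commits each" that git status reports when the branch has an upstream set.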

There's actually another piece missing from our jigsaw puzzle here which we'll just describe lightly in the last section when we talk about git push below.


⁴ Git sometimes calls this tracking rather than remembering, but this is another place Git badly overuses a word. I've used it in the phrase remote-tracking, but at least here it's hyphenated and uses the word as an adjective modifying remote.


git push is different from git fetch

The process above, where your Git created remote-tracking names from the branch names found on the Git at origin, is specific to git fetch. It happens when you have your Git call up the Git at origin and bring their commits to you.

You can, of course, have your Git call up their Git at origin and send commits. That's the git push operation, and it's pretty similar. Your Git tells their Git about the commits that you have, that they don't. Let's draw some. We'll start with this:

A--B--C--D--G--H   <-- master, origin/master
          \
           E--F   <-- dev
            \
             I--J   <-- origin/dev

Now we'll run git checkout master and git checkout -b newfeature/v4-json, or the simpler:

git checkout -b newfeature/v4-json master

We now have:

A--B--C--D--G--H   <-- master, origin/master, newfeature/v4-json (HEAD)
          \
           E--F   <-- dev
            \
             I--J   <-- origin/dev

We've attached the special name HEAD to newfeature/v4-json to remember which branch name gets updated as we add new commits.

Now we'll create one new commit. It could be more than one, or even none, but let's just create one for illustration. The new commit gets some big ugly hash ID, but we'll just call it K here:

                 K   <-- newfeature/v4-json (HEAD)
                /
A--B--C--D--G--H   <-- master, origin/master
          \
           E--F   <-- dev
            \
             I--J   <-- origin/dev

Now we will have your Git call up the Git at origin, using:

git push -u origin newfeature/v4-json

Your Git dials up their Git and announces that you have commits K and H.⁵ They don't have K but they do have H, so they have your Git send over commit K with its snapshot and metadata. Your Git can tell that since they have H they also have G and D and everything before that, so you only have to send them K and its contents.

Then, at the end, your Git asks them: Please, now, if it's OK, set your name newfeature/v4-json to point to commit K. Note that you don't have them set xpt/newfeature/v4-json or anything like that. You have them set their branch! They don't actually have a newfeature/v4-json yet, so it's quite OK for them to set one. So they do! They now have a newfeature/v4-json in their repository, pointing to commit K.

Your Git now creates your remote-tracking name origin/newfeature/v4-json, pointing to commit K, to remember their newfeature/v4-json, pointing to commit K.⁶ But that just means that your graph has one extra name in it, like this:

                 K   <-- newfeature/v4-json (HEAD), origin/newfeature/v4-json
                /
A--B--C--D--G--H   <-- master, origin/master
          \
           E--F   <-- dev
            \
             I--J   <-- origin/dev

Because of the -u option, your Git immediately also runs:

git branch --set-upstream-to=origin/newfeature/v4-json newfeature/v4-json

This sets the upstream setting for your branch newfeature/v4-json. Each of your branches can have one (1) upstream setting, and it's pretty typical to use it in just this way. See Why do I need to do `--set-upstream` all the time? for more.
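For comparison with the single-branch case, this is roughly what the same push looks like in a normal clone. It is a sketch against a scratch origin (invented paths and identity): after git push -u the upstream is set, and one more local commit shows up as "ahead":

```shell
#!/bin/sh
set -e
# Hypothetical identity so that commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

git init -q "$tmp/origin"
git -C "$tmp/origin" symbolic-ref HEAD refs/heads/master
git -C "$tmp/origin" commit -q --allow-empty -m "H"

# A normal (multi-branch) clone.
git clone -q "$tmp/origin" "$tmp/work"
cd "$tmp/work"
git checkout -q -b newfeature/v4-json
git commit -q --allow-empty -m "K"
git push -q -u origin newfeature/v4-json

# The upstream of the branch is now the remote-tracking name.
upstream=$(git rev-parse --abbrev-ref 'newfeature/v4-json@{upstream}')
echo "upstream: $upstream"

# One more unpushed commit: Git can now count it.
git commit -q --allow-empty -m "L"
ahead=$(git rev-list --count '@{upstream}..HEAD')
echo "unpushed commits: $ahead"
git status -sb | head -n 1
```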


⁵ Your Git could tell them about F, but would only have done so if you had said git push origin dev here. Using git push origin newfeature/v4-json, with or without -u, you told your Git: Tell them about commits K, H, G, D, C, B, and/or A as needed. Your other unshared commits remain private, on purpose.

⁶ Remember, due to the magic of hash IDs, commit K is universal across every Git everywhere. Every Git either has K, by its hash ID, and then it's that commit; or doesn't have K at all, so that it doesn't matter.

(This isn't necessarily 100% guaranteed. Suppose the hash ID of K is actually b5101f929789889c2e536d915698f58d5c5c6b7a. That's the hash ID of a commit in the Git repository for Git itself. If you never connect your Git repository to a Git repository for Git, it's OK that you and they have different commits with the same hash ID. But if you do ever connect your Git repository to a Git repository for Git, some not-so-great things happen. The short version is that you just don't get Git's commit and they just don't get yours: the two repositories simply cannot be combined at this point. That is probably completely fine with both you and the people who maintain Git. But see also How does the newly found SHA-1 collision affect Git?)

Upvotes: 4
