Reputation: 3195

Strange issue with git pull from origin branch

A colleague of mine pushed a branch (BranchA) to the repo.

I then created a copy of this branch (testBranch) from BranchA.

Everything is all well and good.

The colleague then pushed up two further commits to BranchA.

I then ran git pull (to get the latest changes from the repo).

However, I do not see the two committed files.

Repo: BranchA

Local:

git checkout master
git pull
git checkout testBranch origin/BranchA
git merge master

I am not sure why I do not see the latest commits (files).

Workaround:

Deleted the local branch
Created a fresh branch
Checked out master
Ran git pull
Checked out the new branch
Ran git merge

I now see the two further commits.

I feel as though I am missing a step here. It feels weird that I would have to delete the branch each time I need to get the latest changes from a particular origin/branch.

Upvotes: 0

Answers (2)

torek

Reputation: 488193

You're ascribing too much magic to branches. :-)

The way Git works is really remarkably simple. A branch name is simply a name for a single Git commit hash ID. (I also advise that you forget that git pull even exists, but we'll see what it is, soon, and how to use it.)

About commits, hash IDs, branch names, and commit chains

Let's talk about these commit hash IDs a bit. A hash ID is a big ugly string of letters and digits, such as 0d0ac3826a3bbb9247e39e12623bbcfdd722f24c. This uniquely identifies some Git object—typically a commit, and when we work with branch names, it's always, definitely, a commit. Each commit records the hash ID of its parent, or predecessor commit. This allows Git to string commits together into a backwards-looking chain.

What this means is that we can draw these commit chains. If we let a single uppercase letter stand in for the big ugly hash ID, we get something that looks like this:

... <-F <-G <-H   <--master

The name master holds the actual hash ID of commit H. That lets Git find H in the sea of commits floating inside the repository. From H, Git can get the hash ID of G, which is H's parent. So now Git can find G. Using G, Git can find F, and so on, backwards, down the line. The arrows here can be read as points to: master points to H, H points to G, and so on.
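If you want to see this chain for yourself, here is a small shell sketch. It assumes only a POSIX shell and a reasonably recent git on your PATH; the repository name demo, the placeholder identity and the commit messages F, G, H are made up for the demonstration. It makes three commits, reads the parent line stored inside H, and lets git rev-list walk backwards from master.

```shell
set -eu
# Throwaway sandbox: everything happens inside a fresh temporary directory.
tmp=$(mktemp -d)
cd "$tmp"
# Placeholder identity (an assumption) so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git -c init.defaultBranch=master init -q demo
cd demo
# Three empty commits stand in for F, G and H.
for msg in F G H; do
  git commit -q --allow-empty -m "$msg"
done

# The name master holds exactly one hash ID: H's.
H=$(git rev-parse master)
# Inside commit H, the "parent" line records G's hash ID.
G=$(git cat-file -p "$H" | sed -n 's/^parent //p')
echo "master -> $H, whose parent is $G"
# rev-list follows the parent links backwards from master: H, then G, then F.
git rev-list master
```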

The contents of each commit are completely, totally, 100% frozen / read-only. Nothing inside any commit can ever change. So we don't really need to draw the internal arrows. However, branch names do change. The way Git adds a new commit to master is to write out a commit object, storing H's hash ID in the new object, along with the new commit snapshot and any other metadata like your name and email address and log message. This produces a new hash, which we'll call I rather than trying to guess it:

...--F--G--H--I

and now Git simply needs to write the hash ID of I into the name master, so that master now points to I:

...--F--G--H--I   <-- master
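The same idea as a sketch you can run (throwaway directory, made-up identity, placeholder commit messages H and I): record the hash ID stored in master, make one more commit, and check that master moved and that the new commit's parent is the old hash ID.

```shell
set -eu
# Throwaway sandbox: everything happens inside a fresh temporary directory.
tmp=$(mktemp -d)
cd "$tmp"
# Placeholder identity (an assumption) so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git -c init.defaultBranch=master init -q demo
cd demo
git commit -q --allow-empty -m H
before=$(git rev-parse master)    # hash ID of H
# Writes commit I (with H as its parent), then stores I's hash ID in master.
git commit -q --allow-empty -m I
after=$(git rev-parse master)     # hash ID of I
echo "master moved from $before to $after"
# master^ means "the parent of the commit master names", which is H again.
git rev-parse master^
```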

If you have more than one branch, or if you have multiple remote-tracking names like origin/master and origin/BranchA, we just draw them all:

...--F--G--H   <-- master, origin/master
            \
             I--J   <-- origin/BranchA

(We'll talk more about remote-tracking names in a moment. They are kind of like branch names, but with a twist.)

When you create a new branch name, all Git has to do is make the new name point to some existing commit. For instance, let's create our own BranchA now, using git checkout BranchA:¹

...--F--G--H   <-- master, origin/master
            \
             I--J   <-- BranchA, origin/BranchA

Now let's create testBranch as well, also pointing to commit J:

...--F--G--H   <-- master, origin/master
            \
             I--J   <-- testBranch, BranchA, origin/BranchA

If you create a new commit now, your Git needs to know which branch name to update. So your Git has this special name, HEAD, written in all-capitals like this.² Git attaches this name to one of your branch names:

...--F--G--H   <-- master, origin/master
            \
             I--J   <-- testBranch (HEAD), BranchA, origin/BranchA

which means that testBranch is the current branch, and is therefore the name that Git will update when you run git commit to make a new commit. One of the things git checkout does is to manage this HEAD-attachment.
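A minimal sketch of HEAD-attachment, again in a throwaway repository with placeholder names. git branch only writes a new name; git checkout attaches HEAD to it; the next commit moves only the attached name. The last line uses @, the synonym for HEAD from footnote 2.

```shell
set -eu
# Throwaway sandbox: everything happens inside a fresh temporary directory.
tmp=$(mktemp -d)
cd "$tmp"
# Placeholder identity (an assumption) so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git -c init.defaultBranch=master init -q demo
cd demo
git commit -q --allow-empty -m J
# Creating a branch just writes a new name pointing at the current commit (J).
git branch testBranch
# checkout attaches HEAD to testBranch.
git checkout -q testBranch
git symbolic-ref --short HEAD   # prints: testBranch
# The new commit moves testBranch, the name HEAD is attached to; master stays on J.
git commit -q --allow-empty -m K
# @ is a synonym for HEAD: both lines print the same hash ID.
git rev-parse @ HEAD
```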

¹Since you don't have a BranchA, you might think: How can I check it out? In fact, you should think that: it's a really good question. The answer is that your Git will create your own BranchA from the remote-tracking name origin/BranchA. That's why you had to run git checkout -b testBranch but not git checkout -b BranchA: the -b flag says create, and without it, Git will only create the name if it doesn't exist yet and there is a matching remote-tracking name, such as origin/BranchA. There's more to it than this, but that's a good start.

²Due to a quirk of case-insensitive file systems, you can usually use lowercase head on Windows and macOS, but not on Linux. It's advisable to avoid this habit, since it won't work on Linux: if you don't like typing HEAD in all caps, use @, which is a synonym for the magic name.
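To watch footnote 1 happen, here is a sketch that fakes the whole setup locally. The directory names colleague, origin.git and me are hypothetical: a bare repository stands in for the shared server, so no network is needed. After cloning there is no local BranchA, yet git checkout BranchA creates one from origin/BranchA, while testBranch needs -b.

```shell
set -eu
# Throwaway sandbox: everything happens inside a fresh temporary directory.
tmp=$(mktemp -d)
cd "$tmp"
# Placeholder identity (an assumption) so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical colleague repository: H on master, then I and J on BranchA.
git -c init.defaultBranch=master init -q colleague
git -C colleague commit -q --allow-empty -m H
git -C colleague checkout -q -b BranchA
git -C colleague commit -q --allow-empty -m I
git -C colleague commit -q --allow-empty -m J
git -C colleague checkout -q master
# A bare copy stands in for the shared repository; "me" is our own clone of it.
git clone -q --bare colleague origin.git
git clone -q origin.git me
cd me

git branch --list BranchA   # prints nothing: there is no local BranchA yet
# No -b: BranchA doesn't exist locally, but origin/BranchA does, so Git creates it.
git checkout -q BranchA
# -b always creates the name, here starting from the current commit (J).
git checkout -q -b testBranch
```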

Remote-tracking names, or, what happens when someone makes commits in some other Git repository?

The thing about these branch names is that they're specific to your Git repository. Your master is your master. Your BranchA is your BranchA and your testBranch is yours, too. They won't change unless you change them.

In fact, even your remote-tracking names—origin/master and origin/BranchA—are yours too, but what makes them remote-tracking names is that your Git will automatically change them, to remember what your Git sees in some other Git, whenever your Git calls up their Git and asks them about their branch names. That is, your Git has the URL for some other Git repository, listed under the remote name origin: origin is a short name for some long, maybe-hard-to-type URL. You can run:

git fetch origin

and your Git will call up their Git, at the URL listed under origin, and ask their Git about their branches. They'll say: Oh, sure, here you go: my master is <hash1> and my BranchA is <hash2>. (To see this, run git ls-remote origin, which is like git fetch origin except that after getting the listing of remote names and hashes, it just prints them out.)

With this list in hand, your Git goes on to ask their Git for any new commits they have that you don't. So if they've updated their BranchA, you get their new commits. Then, regardless of what else has happened, your Git now updates all of your remote-tracking names that start with origin/. That is, suppose their BranchA had two new commits. Your own repository now looks like this:

...--F--G--H   <-- master, origin/master
            \
             I--J   <-- testBranch (HEAD), BranchA
                 \
                  K--L   <-- origin/BranchA

Your own BranchA and testBranch have not moved. These are your branches, so they only move when you move them. Your origin/master hasn't moved because their master hasn't moved, but your origin/BranchA has moved, to remember new commit L that you just got from them, because their BranchA did move, and now points to this same commit L.
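Here is that sequence as a local sketch. The directories are hypothetical: colleague is your colleague's repository, origin.git is a bare repository standing in for the server, and me is your clone; the identity is a placeholder. It also leaves an uncommitted file in your work-tree, to show that neither git ls-remote nor git fetch touches it.

```shell
set -eu
# Throwaway sandbox: everything happens inside a fresh temporary directory.
tmp=$(mktemp -d)
cd "$tmp"
# Placeholder identity (an assumption) so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical colleague repository: H on master, then I and J on BranchA.
git -c init.defaultBranch=master init -q colleague
git -C colleague commit -q --allow-empty -m H
git -C colleague checkout -q -b BranchA
git -C colleague commit -q --allow-empty -m I
git -C colleague commit -q --allow-empty -m J
git -C colleague checkout -q master
git clone -q --bare colleague origin.git
git clone -q origin.git me
cd me
git checkout -q BranchA              # our own BranchA, at J
git checkout -q -b testBranch        # our own testBranch, also at J; HEAD attached here
J=$(git rev-parse HEAD)
echo "work in progress" > notes.txt  # uncommitted work that fetch should leave alone

# The colleague adds K and L to BranchA and pushes them.
git -C ../colleague checkout -q BranchA
git -C ../colleague commit -q --allow-empty -m K
git -C ../colleague commit -q --allow-empty -m L
git -C ../colleague push -q "$tmp/origin.git" BranchA

# ls-remote only asks: it prints their BranchA hash (now L) and changes nothing here.
git ls-remote origin refs/heads/BranchA
# fetch copies K and L over and moves origin/BranchA; our own names stay on J.
git fetch -q origin
git rev-parse testBranch BranchA origin/BranchA
```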

(Remember, our uppercase letters stand in for actual big ugly unique hash IDs. If they made new commits, and you've made new commits, Git guarantees that their new hash IDs are different from every new commit hash you've made! You can see that with an active repository, single uppercase letters would run out way too fast, and be too hard to make unique. But they're a lot easier to draw and make it easier for us to talk about the commits, so that's why I use them here.)

Making your branch names move

Now that they've updated their BranchA, you might want to have your own BranchA move too. This is where things can start to get complicated, but let's look at an easy way to do that.

We'll start by running git checkout BranchA again. This will attach HEAD to BranchA, so that Git commands that use the current branch are using BranchA. Then we'll use git merge, which in this case, doesn't actually do any merging!

git checkout BranchA
git merge origin/BranchA

Before the git merge, we have this in our repository:

...--F--G--H   <-- master, origin/master
            \
             I--J   <-- testBranch, BranchA (HEAD)
                 \
                  K--L   <-- origin/BranchA

The git merge looks at origin/BranchA and finds that it's pointing to L. It looks at our current branch (the one HEAD is attached to) and finds that it's pointing to J. It realizes that, by starting at L and working backwards, it can get straight to J. This means that the branch name BranchA can be "slid forwards", as it were, against the direction of the internal, backwards-pointing arrows. Git calls this operation a fast-forward. In the context of git merge, it's more like a git checkout that moves the current branch name. That is, commit L becomes the current commit, but it does so by moving the name BranchA. The result is:

...--F--G--H   <-- master, origin/master
            \
             I--J   <-- testBranch
                 \
                  K--L   <-- BranchA (HEAD), origin/BranchA
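A sketch of this fast-forward, using the same kind of hypothetical local setup (colleague, origin.git and me are made-up directories). git merge-base --is-ancestor is one way to ask the question merge asks ("can I get from L back to J?"), and --ff-only is an optional extra flag that makes git merge refuse to do anything except a fast-forward.

```shell
set -eu
# Throwaway sandbox: everything happens inside a fresh temporary directory.
tmp=$(mktemp -d)
cd "$tmp"
# Placeholder identity (an assumption) so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical colleague repository: H on master, then I and J on BranchA.
git -c init.defaultBranch=master init -q colleague
git -C colleague commit -q --allow-empty -m H
git -C colleague checkout -q -b BranchA
git -C colleague commit -q --allow-empty -m I
git -C colleague commit -q --allow-empty -m J
git -C colleague checkout -q master
git clone -q --bare colleague origin.git
git clone -q origin.git me
cd me
git checkout -q BranchA
git branch testBranch                # testBranch also at J; HEAD stays on BranchA
J=$(git rev-parse BranchA)

# The colleague pushes K and L; we fetch them into origin/BranchA.
git -C ../colleague checkout -q BranchA
git -C ../colleague commit -q --allow-empty -m K
git -C ../colleague commit -q --allow-empty -m L
git -C ../colleague push -q "$tmp/origin.git" BranchA
git fetch -q origin

git checkout -q BranchA              # attach HEAD to BranchA (already attached here)
if git merge-base --is-ancestor BranchA origin/BranchA; then
  echo "J is an ancestor of L: a fast-forward is possible"
fi
git merge -q --ff-only origin/BranchA
```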

You now have commit L as your current commit, and commit L is filling in the index and the work-tree. It's time to talk a little bit about these two.

The index and the work-tree

We already mentioned that files stored inside commits are completely, totally, 100% frozen / read-only. They're stored in a special, compressed, Git-only format. This lets Git save a lot of space, and re-use unchanged files: if a new commit has mostly the same files as the previous commit, there's no need to save all the files. The old commit's copies are frozen, so the new commit can just share them. (The details by which this process works don't really matter here, but Git uses hash IDs, with what Git calls blob objects, to achieve this trick.)
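You can observe the sharing with a small sketch (throwaway repository; the file names a.txt and b.txt are placeholders). The syntax HEAD:path names the blob that a commit stores for a path, so a file that is unchanged between two commits reports the same blob hash ID in both.

```shell
set -eu
# Throwaway sandbox: everything happens inside a fresh temporary directory.
tmp=$(mktemp -d)
cd "$tmp"
# Placeholder identity (an assumption) so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git -c init.defaultBranch=master init -q demo
cd demo
echo "unchanged" > a.txt
echo "v1" > b.txt
git add a.txt b.txt
git commit -q -m first
echo "v2" > b.txt
git commit -q -am second             # a.txt untouched, b.txt changed

git rev-parse HEAD~1:a.txt HEAD:a.txt   # same hash twice: both commits share one blob
git rev-parse HEAD~1:b.txt HEAD:b.txt   # two different hashes
```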

That's great for Git, but we can't use frozen compressed Git-only files to do anything else. So Git has to thaw out and de-compress the frozen files, into their normal everyday form, so that we and the rest of the programs on our computer can use them.

The thawed-out files go into the work-tree, which is called that because that's where we work on them. Here, we can do anything we want with our files. So, for each file, there's a frozen copy in the current commit, and a thawed copy in the work-tree. (There may be frozen copies in other commits too, but the one in the current commit is the most interesting, since we can and will often compare it to the one in the work-tree.)

The index, which is also called the staging area or sometimes the cache, is a peculiar thing, unique to Git. Other version control systems also have frozen commits and thawed work-trees, but either don't have an index, or keep anything index-like totally hidden so that you don't need to know about it. Git, on the other hand, will, now and then, whack you in the face with the index. You must know about it, even if you don't use it for fancy tricks.

What the index holds is, essentially, a copy of each file. That is, each file in the current commit is also in the index. The index copy is in the special Git-only format. Unlike the frozen commit copy, though, this one is only semi-frozen (kind of slushy, if you will). You can replace it any time with a new, different, Git-ified and semi-frozen copy. That's what git add does: it Git-ifies the work-tree copy of the file, compressing it into the Git-only format and replacing the previous index copy. (If the new one matches any old one, in any frozen Git commit, it winds up re-using that old one: saving space! Otherwise it's a new Git-ified copy.)

Making a new commit, in Git, just needs to flash-freeze these index copies. They're all already ready for that, which is a significant part of why git commit is so much faster than other version control systems. But it also means that the index can be described as what will go into your next commit. Git builds new commits from the index, not from the work-tree.

You need the work-tree to work on your files. Git needs, and uses, the index to make new commits. The index and work-tree copies can differ; it's part of your job to git add the work-tree copies, to overwrite the index copies with updated ones, before committing.
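A minimal sketch of the three copies of one file, in a throwaway repository with a placeholder file name. git status --short prints two columns: the first compares the index with HEAD, the second compares the work-tree with the index. git show :file.txt prints the index copy, and the final commit shows that Git commits what is in the index, not what is in the work-tree.

```shell
set -eu
# Throwaway sandbox: everything happens inside a fresh temporary directory.
tmp=$(mktemp -d)
cd "$tmp"
# Placeholder identity (an assumption) so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git -c init.defaultBranch=master init -q demo
cd demo
echo one > file.txt
git add file.txt
git commit -q -m one

echo two > file.txt      # change only the work-tree copy
git status --short       # shows " M file.txt": work-tree differs from index
git add file.txt         # overwrite the index copy with the work-tree copy
git status --short       # shows "M  file.txt": index now differs from HEAD
git show :file.txt       # the index copy: two

echo three > file.txt    # edit again, without re-adding
git commit -q -m two     # freezes the index copy ("two"), not the work-tree's "three"
git show HEAD:file.txt
```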

Updating your `testBranch`

With all that out of the way, let's look now at updating your testBranch. Remember, we ran git fetch to update all our origin/* names, then git checkout BranchA and git merge origin/BranchA to update BranchA, so that we now have this:

...--F--G--H   <-- master, origin/master
            \
             I--J   <-- testBranch
                 \
                  K--L   <-- BranchA (HEAD), origin/BranchA

We now need to git checkout testBranch to attach HEAD to it. Then we can run git merge BranchA or git merge origin/BranchA:

git checkout testBranch
git merge <anything that identifies commit L>

The idea here is to make Git look at commit L. The merge command will then see whether or not it's possible to do the same fast-forward operation it did for BranchA. The answer will be yes: it's definitely possible to go from commit J straight to commit L. So by default, Git will do just that, and you will get this:

...--F--G--H   <-- master, origin/master
            \
             I--J
                 \
                  K--L   <-- testBranch (HEAD), BranchA, origin/BranchA
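Since merge only needs some way to find commit L, a raw hash ID works as well as any name. A sketch with the same kind of hypothetical local setup: first BranchA is fast-forwarded as in the previous section, then testBranch is fast-forwarded using nothing but L's hash ID.

```shell
set -eu
# Throwaway sandbox: everything happens inside a fresh temporary directory.
tmp=$(mktemp -d)
cd "$tmp"
# Placeholder identity (an assumption) so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical colleague repository: H on master, then I and J on BranchA.
git -c init.defaultBranch=master init -q colleague
git -C colleague commit -q --allow-empty -m H
git -C colleague checkout -q -b BranchA
git -C colleague commit -q --allow-empty -m I
git -C colleague commit -q --allow-empty -m J
git -C colleague checkout -q master
git clone -q --bare colleague origin.git
git clone -q origin.git me
cd me
git checkout -q BranchA
git branch testBranch

# The colleague pushes K and L; we fetch them.
git -C ../colleague checkout -q BranchA
git -C ../colleague commit -q --allow-empty -m K
git -C ../colleague commit -q --allow-empty -m L
git -C ../colleague push -q "$tmp/origin.git" BranchA
git fetch -q origin

# Fast-forward BranchA (HEAD is attached to it).
git merge -q --ff-only origin/BranchA
# Now update testBranch, naming commit L by nothing but its hash ID.
git checkout -q testBranch
L=$(git rev-parse origin/BranchA)
git merge -q --ff-only "$L"
git rev-parse testBranch BranchA origin/BranchA   # three identical lines
```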

Note that we can do this even if we never create our own BranchA, because instead of git merge BranchA we can run git merge origin/BranchA. That is, if we have:

...--F--G--H   <-- master, origin/master
            \
             I--J   <-- testBranch (HEAD)
                 \
                  K--L   <-- origin/BranchA

and run git merge origin/BranchA, Git will do the exact same fast-forward that it would have done with the version with a name BranchA pointing to commit L. What matters here is not the branch names but the commits. Well, our own branch names, like testBranch, matter, in that we need to make them point where they should; but the other names, the remote-tracking names, are only used to find the commits. They're just more readable than hash IDs, and our Git will automatically update them on git fetch.

Hence, suppose we never created BranchA in the first place. Suppose instead we did:

$ git clone <url>
$ cd <repository>
$ git checkout -b testBranch origin/BranchA
... wait until your colleague pushes new commits to BranchA on origin ...
$ git fetch                      # defaults to using origin
$ git merge origin/BranchA

then we'd be done, without having to fiddle with our BranchA that we never even created.
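Here is that recipe as a self-contained sketch. The clone URL is replaced by a local bare repository, origin.git, and the colleague's push is simulated from a hypothetical colleague repository; apart from that, the commands after the clone are the ones listed above, with -q added to keep them quiet.

```shell
set -eu
# Throwaway sandbox: everything happens inside a fresh temporary directory.
tmp=$(mktemp -d)
cd "$tmp"
# Placeholder identity (an assumption) so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical colleague repository and a bare "server" copy of it.
git -c init.defaultBranch=master init -q colleague
git -C colleague commit -q --allow-empty -m H
git -C colleague checkout -q -b BranchA
git -C colleague commit -q --allow-empty -m I
git -C colleague commit -q --allow-empty -m J
git -C colleague checkout -q master
git clone -q --bare colleague origin.git

# The recipe itself.
git clone -q origin.git me
cd me
git checkout -q -b testBranch origin/BranchA
# ... wait until your colleague pushes new commits to BranchA on origin ...
git -C ../colleague checkout -q BranchA
git -C ../colleague commit -q --allow-empty -m K
git -C ../colleague commit -q --allow-empty -m L
git -C ../colleague push -q "$tmp/origin.git" BranchA
git fetch -q                  # defaults to using origin
git merge -q origin/BranchA   # a fast-forward: no local BranchA needed
```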

I'm going to omit what happens here if you make your own commits. In this case, you get a true merge: git merge will see that it's not possible to just fast-forward, so it will run the process of merging and then make a merge commit. Instead, let's just address the last bit of the puzzle, git pull.
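For completeness, a sketch of that true-merge case with the same kind of hypothetical local setup. testBranch gets one commit of its own (M, a placeholder) before the colleague pushes K and L, so testBranch's tip is no longer an ancestor of L, no fast-forward is possible, and git merge makes a merge commit with two parents. --no-edit just accepts the default merge message instead of opening an editor.

```shell
set -eu
# Throwaway sandbox: everything happens inside a fresh temporary directory.
tmp=$(mktemp -d)
cd "$tmp"
# Placeholder identity (an assumption) so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical colleague repository: H on master, then I and J on BranchA.
git -c init.defaultBranch=master init -q colleague
git -C colleague commit -q --allow-empty -m H
git -C colleague checkout -q -b BranchA
git -C colleague commit -q --allow-empty -m I
git -C colleague commit -q --allow-empty -m J
git -C colleague checkout -q master
git clone -q --bare colleague origin.git
git clone -q origin.git me
cd me
git checkout -q -b testBranch origin/BranchA
git commit -q --allow-empty -m M      # our own work on testBranch
M=$(git rev-parse HEAD)

# Meanwhile the colleague pushes K and L; we fetch them.
git -C ../colleague checkout -q BranchA
git -C ../colleague commit -q --allow-empty -m K
git -C ../colleague commit -q --allow-empty -m L
git -C ../colleague push -q "$tmp/origin.git" BranchA
git fetch -q origin

git merge-base --is-ancestor testBranch origin/BranchA || echo "no fast-forward: a real merge is needed"
git merge -q --no-edit origin/BranchA
# Prints the merge commit's hash ID followed by its two parents (M and L).
git rev-list --parents -n 1 HEAD
```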

About `git pull` (don't use it!)

My advice for git pull is that as a beginner, you should studiously avoid it. However, other people and documentation will tell you to use it, so you should at least know what it does. All that git pull is and does is to run two Git commands for you. It's meant to be convenient. The problem is, sometimes it is convenient, and sometimes it's remarkably not-convenient. It's much better, in my opinion, to learn to use the two underlying Git commands first.

The first Git command that git pull runs is just git fetch. We already saw what that does: it calls up some other Git, gets from it a list of its branch names (and tag names) and hash IDs, and brings into your repository whatever commits you need, so that your Git can update all your remote-tracking names. Then it's done: nothing has happened to your index and work-tree. It's safe to run git fetch at any time, because it just adds new commits and updates remote-tracking names.

The second command that git pull runs is where the trouble comes in. You can choose which second command it runs. Normally, that's git merge, which does what we saw above. But you can make it run git rebase, which we have not covered here.

In either case, git pull passes some extra arguments to the git merge or git rebase command. These extra arguments cause some of the inconvenience, because they are different from the arguments you might want to use. In particular, if you run:

git pull origin master

this has the effect of running:

git fetch origin master
git merge -m "Merge branch 'master' of $url" origin/master

Note the slash here in the last argument—Git is going to merge the commit now identified by your origin/master. The -m (message) contains the URL taken from origin, plus the name master, rather than the name origin/master, but the effect of the merge—whether fast-forward or real merge—is the same as merging your updated remote-tracking name, origin/master.³

If you use separate git fetch and git merge commands, they make more sense. When you use git pull, the branch name you list, if you list one, is the name on the other Git, rather than the remote-tracking name in your Git.

The same holds even if you have git pull run git rebase for you. And, in the last twist of being not-convenient, the decision of whether to use merge or rebase is one you sometimes should make after running git fetch. That is, you should look at what git fetch fetches, to decide which second command to run. But if you use git pull, you must make this decision before you run git fetch, so you can't look.
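Here is what looking first can look like, as a sketch with the same kind of hypothetical local setup. After git fetch, the two git log commands list what came in and what you have that they don't; only then is the second step chosen. The if below picks between a fast-forward and a real merge; putting git rebase in the else branch instead would be the other reasonable choice, which this answer doesn't cover.

```shell
set -eu
# Throwaway sandbox: everything happens inside a fresh temporary directory.
tmp=$(mktemp -d)
cd "$tmp"
# Placeholder identity (an assumption) so commits work without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical colleague repository: H on master, then I and J on BranchA.
git -c init.defaultBranch=master init -q colleague
git -C colleague commit -q --allow-empty -m H
git -C colleague checkout -q -b BranchA
git -C colleague commit -q --allow-empty -m I
git -C colleague commit -q --allow-empty -m J
git -C colleague checkout -q master
git clone -q --bare colleague origin.git
git clone -q origin.git me
cd me
git checkout -q -b testBranch origin/BranchA

# The colleague pushes K and L.
git -C ../colleague checkout -q BranchA
git -C ../colleague commit -q --allow-empty -m K
git -C ../colleague commit -q --allow-empty -m L
git -C ../colleague push -q "$tmp/origin.git" BranchA

# Step one: fetch, then look.
git fetch -q origin
git log --oneline testBranch..origin/BranchA   # incoming commits: K and L
git log --oneline origin/BranchA..testBranch   # our own commits they lack: none here
incoming=$(git rev-list --count testBranch..origin/BranchA)

# Step two, chosen only after looking.
if git merge-base --is-ancestor testBranch origin/BranchA; then
  git merge -q --ff-only origin/BranchA
else
  git merge -q --no-edit origin/BranchA
fi
```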

Once you have used Git for a while, and are very familiar with both git merge and git rebase, then you can start using git pull safely. (But I still mostly don't.)

³There's another wrinkle here, with fairly old versions of Git: before Git version 1.8.4, git pull didn't update the remote-tracking name. Modern Git does away with this weird quirk, but some systems still use really old Git versions, so it's important to know about.

Upvotes: 2

clamentjohn

Reputation: 3987

You made a new branch, testBranch, from BranchA, and your colleague then pushed changes to BranchA. But you're still on testBranch, so your remote branch has no changes for you to pull, and that explains why the commits in BranchA aren't seen in testBranch.

...--o--o--*            <-- BranchA

You created a copy of branch BranchA.

git checkout -b testBranch

...--o--o--*            <-- BranchA, testBranch

New commits in BranchA

             A--B--C    <-- BranchA
            /
...--o--o--*            <-- testBranch