MilesT

Reputation: 141

Which local repository contains my code: "origin/develop" or "develop"?

Using git-flow and VSTS, when I "clone" from the browser to "develop", does that update my local "origin/develop" or my local "develop" branch? I appear to be getting inconsistent results...

Upvotes: 0

Views: 379

Answers (2)

torek

Reputation: 489638

For one short version, see Daniel Mann's answer. I'm going to guess what your real question is:

Why, sometimes, does git merge (or git pull) leave me "one ahead", and other times not?

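The TL;DR is: because git merge (or the git merge that git pull runs) can do one of two different things. It can do a fast-forward, which makes no new commit, or a real merge, which makes a new merge commit.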

Background (long)

There are several things to realize when working with Git that differ from working with centralized version control systems. The most important one is this: there is more than one repository. But there isn't (normally, anyway) more than one local repository. One of the repositories is yours and is local, and one or more are someone else's and not local. So:

Which local repository contains my code ...

The literal answer to this question is your (single) local repository. This isn't the right question!

You might want instead to ask which (local) branch is yours, but that's the wrong question again: in Git, all branch names are local. The trick here is that the name origin/develop isn't a branch name. It is, instead, a remote-tracking name (or what Git calls a remote-tracking branch name; I think this is slightly less confusing if we just drop the word branch, which, through overuse, has already become meaningless by this point).

To try to keep this short, I'll skip over a lot of details and imprecision here:

  • Git isn't really about branches after all. Git is all about commits. That's the unit of storage you'll work with.

  • Each commit has a unique hash ID. In effect, this is the "true name" of the commit. These hash IDs are globally unique, across every Git repository. (This is why they have to be so big and ugly, so that no Git anywhere ever accidentally re-uses the same hash ID as any other Git anywhere.)

  • Gits share commits with each other by these hash IDs. Any Git repository can tell if it has a commit yet, because the hash ID is globally unique. Either you do have that hash ID, in which case you do have the commit, or you don't, so you don't.

  • Each commit stores a snapshot of all of your files, as its data, plus some metadata, such as who made the commit, when, and so on. One of the metadata items is for Git itself and it is a list of hash IDs of previous commits. Usually there is just one entry here: the previous or parent commit for this ordinary, non-merge commit.
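To make the hash-ID idea concrete, here is a minimal Python sketch of how Git derives an object's ID in a default (SHA-1) repository: SHA-1 over a short header (the object type, a space, the size, and a NUL byte) followed by the object's content. The commit body builder is simplified (it omits optional headers such as signatures), and the identity line and messages are made up for illustration. It shows why the ID depends on the parent list: change the parent and you get a different commit.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 over '<type> <size>' + NUL + raw content, the way Git hashes objects."""
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def commit_body(tree: str, parents: list, ident: str, message: str) -> bytes:
    """Build a simplified commit body: tree, parent list, author/committer, message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # no parents: root commit; two: merge
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return ("\n".join(lines) + "\n").encode()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's well-known empty-tree ID
IDENT = "A U Thor <author@example.com> 1700000000 +0000"  # hypothetical identity

g = git_object_id("commit", commit_body(EMPTY_TREE, [], IDENT, "G"))
h = git_object_id("commit", commit_body(EMPTY_TREE, [g], IDENT, "H"))
print(g, h)
```

Because a commit's body includes its parents' hash IDs, the ID of H effectively pins down G and everything before it.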

So given some chain of commits, the last commit is the only one we can't find by using a later commit and working backwards:

... <-F <-G <-H

If H is the last commit, we can use it to find G, which we can use to find F, and so on. (The letters here stand in for the actual hash IDs, which are random-looking and too hard for humans to work with.)
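Working backwards from the last commit can be sketched like this, assuming a toy dictionary that maps each letter (standing in for a hash ID) to its list of parents. This sketch follows only the first parent; git log does this kind of walk starting from a name, but it also visits every parent of a merge.

```python
# Toy commit graph: each letter stands in for a hash ID and maps to its parent IDs.
PARENTS = {"E": [], "F": ["E"], "G": ["F"], "H": ["G"]}


def walk_back(tip):
    """Start at the last commit and follow each commit's (single) parent backwards."""
    commit = tip
    while commit is not None:
        yield commit
        parents = PARENTS[commit]
        commit = parents[0] if parents else None  # a root commit has no parent


print(list(walk_back("H")))  # ['H', 'G', 'F', 'E']
```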

These backwards-looking chains of commits are, or can be, called branches. But branch names like master and develop can also be called branches. Remote-tracking names can be called branches as well. See What exactly do we mean by "branch"?

A branch name like master or develop simply holds the hash ID of the last commit in the chain. To add a new commit, your Git will write out the new commit, pointing backwards to the previously-last commit, and then write the new commit's hash ID into your branch name.
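The rule that a branch name just holds the last commit's hash ID can be modeled in a few lines. ToyRepo is a hypothetical class, not any real Git API, and c1, c2, and so on stand in for real hash IDs.

```python
import itertools


class ToyRepo:
    """Hypothetical model: branch names map to the hash ID of the last commit in a chain."""

    def __init__(self, head: str = "develop"):
        self.parents = {}   # commit ID -> list of parent IDs
        self.branches = {}  # branch name -> commit ID
        self.head = head    # the branch name HEAD is attached to
        self._ids = itertools.count(1)

    def commit(self) -> str:
        """Write a commit pointing back at the current tip, then move the branch name."""
        new_id = f"c{next(self._ids)}"  # stand-in for a real hash ID
        tip = self.branches.get(self.head)
        self.parents[new_id] = [tip] if tip is not None else []
        self.branches[self.head] = new_id  # the name now holds the new last commit
        return new_id


repo = ToyRepo()
first = repo.commit()
second = repo.commit()
print(repo.branches, repo.parents[second])  # {'develop': 'c2'} ['c1']
```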

Your branch names are all yours. Your develop is solely yours. Your Git will update it as you add new commits. Your Git can update your branch name with git merge, too, and other commands.

The other Git repository you're using has a URL. That URL is stored under a name, which Git calls a remote. The standard name for the first remote is origin, so here origin is the remote. The git clone command sets this up for you automatically, which is why origin is the standard first remote.

That other Git has its own branch names. (There's no need for you and them to use the same names, but you probably would like to use the same names just for your own sanity.) They'll have commits added to their develop, from time to time.

When you run git fetch, your Git will call up their Git, using the URL stored under the name origin. Their Git will then hand over any new commits they have that you don't, and now you'll have them. Your Git will now update your origin/develop to remember the hash ID they had stored under their branch name develop. Your origin/develop is a remote-tracking name, not a branch name.
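The renaming that fetch applies to names can be sketched as a small dictionary update. This assumes the default origin remote and a plain name-to-commit map; it ignores refspecs, git fetch --prune, and the actual transfer of commits. Note that it never writes a plain branch name such as develop.

```python
def update_remote_tracking(my_refs: dict, their_branches: dict, remote: str = "origin") -> dict:
    """Record each of their branch tips under <remote>/<name>; local branch names are untouched."""
    updated = dict(my_refs)  # work on a copy so the caller's map is not modified
    for name, tip in their_branches.items():
        updated[f"{remote}/{name}"] = tip
    return updated


mine = {"develop": "J", "origin/develop": "H"}
theirs = {"develop": "L", "master": "E"}
print(update_remote_tracking(mine, theirs))
# {'develop': 'J', 'origin/develop': 'L', 'origin/master': 'E'}
```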

There's one other tricky bit here. When you run your initial git clone, your Git:

  • creates a new, empty repository (no commits, no branches, nothing);
  • sets up the name origin to hold the URL;
  • uses git fetch to get all their commits and branch names, renaming each of their branch names to one of your remote-tracking names: this is where your initial origin/master and origin/develop come from; and
  • last, runs git checkout. This creates a new branch, because at the moment you have no branches at all.
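The four clone steps listed above can be sketched on a toy repository held in a dictionary. The function name, the dictionary layout, and the URL are hypothetical; the point is that afterwards only one local branch name exists, while every one of their branch names has become an origin/* name.

```python
def toy_clone(url: str, their_parents: dict, their_branches: dict, checkout: str) -> dict:
    """Sketch of the four clone steps, on a toy repository dict."""
    # 1. Create a new, empty repository: no commits, no names, nothing.
    repo = {"remotes": {}, "parents": {}, "refs": {}, "HEAD": None}
    # 2. Save the URL under the remote name "origin".
    repo["remotes"]["origin"] = url
    # 3. Fetch: copy every commit and turn each of their branch names into origin/<name>.
    repo["parents"].update(their_parents)
    for name, tip in their_branches.items():
        repo["refs"][f"origin/{name}"] = tip
    # 4. Checkout: create one local branch at the same commit and attach HEAD to it.
    repo["refs"][checkout] = their_branches[checkout]
    repo["HEAD"] = checkout
    return repo


clone = toy_clone(
    "https://example.com/project.git",  # hypothetical URL
    {"E": [], "F": ["E"], "G": ["F"], "H": ["G"]},
    {"master": "E", "develop": "H"},
    checkout="master",
)
print(clone["refs"], clone["HEAD"])
```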

Let's take a look at what your repository might look like, in part, right after a git clone <url>. Let's suppose that their Git recommends that your git clone do a git checkout master as its last step. You might now have this:

...--o--E   <-- master (HEAD), origin/master
         \
          F--G--H   <-- origin/develop

Your Git just created your master (and attached HEAD to it), to point to commit E, some commit you got in the initial fetch. Their master, in their repository, names commit E, so your new master also names commit E.

Your Git renamed all their branch names to make your origin/* remote-tracking names. So your origin/master points to commit E, and your origin/develop points to commit H.

(By the way, commits up through E are now on both branches. This is something that is peculiar to Git: most version control systems say that a commit is on one branch, but in Git, any commit can be on many branches. A commit is found by starting with a branch name, going to its last commit, and then working backwards. Commit E is the last commit of master, but is also found by going to H—the last commit of develop—and then working backwards three steps, so E is on both branches. All commits before E are on both branches, too.)
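The rule that a commit is on every branch it can be reached from can be sketched as a reachability check. D is a hypothetical stand-in for the unnamed o commit before E. In a real repository, git branch --contains <commit> answers the same question for local branches, and git branch -r --contains does it for remote-tracking names.

```python
# Toy graph from the clone diagram; D stands in for the unnamed "o" commit.
PARENTS = {"D": [], "E": ["D"], "F": ["E"], "G": ["F"], "H": ["G"]}
BRANCHES = {"master": "E", "develop": "H"}


def reachable(tip: str) -> set:
    """Every commit found by starting at tip and working backwards through all parents."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def branches_containing(commit: str) -> list:
    """Names whose chain of commits includes the given commit."""
    return sorted(name for name, tip in BRANCHES.items() if commit in reachable(tip))


print(branches_containing("E"))  # ['develop', 'master']
print(branches_containing("G"))  # ['develop']
```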

If you now run git checkout develop, your Git tries to check out an existing develop, sees that there isn't one, and would normally error out. But wait, there's origin/develop! Your Git now bets that what you wanted was to create develop using the same commit identified by origin/develop. So your Git creates a new develop name pointing to that commit, and then attaches your HEAD to that name:

...--o--E   <-- master, origin/master
         \
          F--G--H   <-- develop (HEAD), origin/develop

Your Git will extract all the files from the snapshot for commit H into a workspace, which Git calls your work-tree. (I'm leaving out all the stuff about Git's index, which is really important.)
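The guess that git checkout develop makes can be sketched as follows. This is a simplified, hypothetical model: real Git only makes the guess when exactly one remote has a matching remote-tracking name, and it also sets origin/develop as the new branch's upstream, which this sketch leaves out. It also skips filling in the work-tree.

```python
def toy_checkout(refs: dict, name: str, remotes: tuple = ("origin",)) -> str:
    """Return the branch HEAD attaches to, creating it from <remote>/<name> if needed."""
    if name in refs:
        return name  # an existing local branch: just switch to it
    candidates = [f"{r}/{name}" for r in remotes if f"{r}/{name}" in refs]
    if len(candidates) != 1:
        raise KeyError(name)  # no unambiguous remote-tracking name to guess from
    refs[name] = refs[candidates[0]]  # the new branch starts at the same commit
    return name


refs = {"master": "E", "origin/master": "E", "origin/develop": "H"}
head = toy_checkout(refs, "develop")
print(head, refs["develop"])  # develop H
```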

You can now make new commits, if you like, in the usual way. (This involves Git's index, which is one reason it's so important.) Let's say you do—that you make one new commit, which we'll call I.

The parent of new ordinary commit I will be commit H: I will point back to H. Git will write I's actual hash ID into the name develop, since HEAD is attached to develop. The result looks like this:

...--o--E   <-- master, origin/master
         \
          F--G--H   <-- origin/develop
                 \
                  I   <-- develop (HEAD)

If you add a new commit that we'll call J, Git keeps adding as usual:

...--o--E   <-- master, origin/master
         \
          F--G--H   <-- origin/develop
                 \
                  I--J   <-- develop (HEAD)

At this point, you might wonder if, or notice that, someone has done some work and sent new commits to the other Git over at origin. You don't have them yet, but you can now have your Git call up their Git and get them:

git fetch origin

Your Git calls up their Git (using the URL saved in the name origin), asks them about their branch names and last commits, finds that they have their develop pointing to a new commit with a hash ID your Git has never heard of, and asks them for their new commits. They say "I have some commit ______ (some big ugly hash ID)": we'll call that L. They say "The parent of L is _____ (some big ugly hash ID)": we'll call that K. They say "The parent of K is H", and hey, we already have commit H. So your Git has their Git send over commits K and L (only), which your Git puts in your repository.

Then, your Git updates your origin/develop to remember that their develop points to commit L. So you now have:

...--o--E   <-- master, origin/master
         \
          F--G--H--K--L   <-- origin/develop
                 \
                  I--J   <-- develop (HEAD)

in your repository. Note that your own develop is not affected. A git fetch operation only updates your remote-tracking names. It does not affect any of your branch names, because your branch names are yours, not theirs.
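The "send only what the other side lacks" part of that exchange can be sketched as a walk backwards from their tip that stops at commits you already have. Real Git negotiates this with want and have lines in its pack protocol and sends the result as a pack file; this toy version, with letters standing in for hash IDs, only shows the stopping rule.

```python
# Their repository's commit graph (letters stand in for hash IDs).
THEIR_PARENTS = {"E": [], "F": ["E"], "G": ["F"], "H": ["G"], "K": ["H"], "L": ["K"]}


def commits_to_send(their_tip: str, we_have: set) -> list:
    """Walk back from their tip, stopping at any commit the receiving side already has."""
    missing, stack = [], [their_tip]
    while stack:
        commit = stack.pop()
        if commit in we_have or commit in missing:
            continue  # we already have it (and its history), or it is already queued
        missing.append(commit)
        stack.extend(THEIR_PARENTS[commit])
    return missing


print(commits_to_send("L", {"E", "F", "G", "H", "I", "J"}))  # ['L', 'K']
```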

Handling divergence with git merge

Now that you have this:

          I--J   <-- develop (HEAD)
         /
...--G--H
         \
          K--L   <-- origin/develop

situation, you have to figure out what to do about this divergence. You need to combine your work—your two commits I-J—with theirs, their K-L. You can use git merge or git rebase to do this, and these are the two most common things to want to do.
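Whether you are "ahead", "behind", or both is just a count of commits reachable from one name but not from the other. Here is a minimal sketch, with letters as stand-in hash IDs. With real names, git status reports these counts, and git rev-list --left-right --count develop...origin/develop prints them.

```python
# Diverged graph: your I-J and their K-L both grow from H.
PARENTS = {"G": [], "H": ["G"], "I": ["H"], "J": ["I"], "K": ["H"], "L": ["K"]}


def reachable(tip: str) -> set:
    """tip plus every commit found by working backwards from it."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def ahead_behind(local_tip: str, upstream_tip: str) -> tuple:
    """Commits only on the local side (ahead) and only on the upstream side (behind)."""
    mine, theirs = reachable(local_tip), reachable(upstream_tip)
    return len(mine - theirs), len(theirs - mine)


print(ahead_behind("J", "L"))  # (2, 2): diverged, after the fetch
print(ahead_behind("J", "H"))  # (2, 0): "ahead by 2", before the fetch
```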

If you want to run git merge, you can do that now:

git merge origin/develop

This combines work and, if it can, makes a new merge commit on its own:

          I--J
         /    \
...--G--H      M   <-- develop (HEAD)
         \    /
          K--L   <-- origin/develop

The new commit goes on your branch, just like any other new commit. It has a snapshot, just like any other new commit. (The snapshot is the one Git made by figuring out what you changed since H and what they changed since H, combining those changes, and applying the combined changes to the snapshot from H.) It has author and committer metadata, just like any other commit. The author and committer are you, and the date-and-time stamps record when you ran the merge: you made this merge commit. It has a log message, just like any ordinary commit. There's a default, low-quality message here; you can put in a better one, but in practice nobody seems to bother:

Merge remote-tracking branch 'origin/develop' into develop

The only thing that is special about this new merge commit M is that it has not one but two parents. One parent is your previous commit J. That's the first parent too, which is important later (but not really right now). The second parent is the commit your origin/develop names, i.e., commit L.
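The merge base used above (commit H) and the parent order of M can be sketched like this. merge_bases is a simplified, hypothetical helper meant for small graphs; git merge-base develop origin/develop is the real command, and git log --first-parent is one place where the first-parent ordering matters.

```python
PARENTS = {"G": [], "H": ["G"], "I": ["H"], "J": ["I"], "K": ["H"], "L": ["K"]}


def ancestors(tip: str) -> set:
    """tip plus every commit reachable from it (a commit counts as its own ancestor here)."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def merge_bases(a: str, b: str) -> set:
    """Best common ancestors: shared commits that no other shared commit descends from."""
    common = ancestors(a) & ancestors(b)
    return {c for c in common if not any(c != d and c in ancestors(d) for d in common)}


base = merge_bases("J", "L")
print(base)  # {'H'}
PARENTS["M"] = ["J", "L"]  # merge commit M: first parent J (your tip), second parent L
```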

Merge doesn't always have to merge

Sometimes you don't have a divergence. For instance, suppose you didn't make any new commits—that you started with:

...--o--E   <-- master, origin/master
         \
          F--G--H   <-- develop (HEAD), origin/develop

and stuck with that. Then you ran git fetch and found that they'd made two new commits, which we'll again call I-J:

...--o--E   <-- master, origin/master
         \
          F--G--H   <-- develop (HEAD)
                 \
                  I--J   <-- origin/develop

You can now run git merge origin/develop as before. Your Git will notice that the base for this merge, commit H, is also the tip of your develop. That is, if we did a real merge, we'd combine what we did, H-vs-H (nothing), with what they did, H-vs-J. The result would obviously match J: do nothing, then do something, results in whatever the "do something" did.

Your Git will, by default, not do anything to merge "nothing" with "something". It will, instead, just check out commit J directly and slide your name develop forward. Git calls this a fast-forward merge, but there's no actual merging. The result looks like this:

...--o--E   <-- master, origin/master
         \
          F--G--H
                 \
                  I--J   <-- develop (HEAD), origin/develop

which we can draw more simply as:

...--o--E   <-- master, origin/master
         \
          F--G--H--I--J   <-- develop (HEAD), origin/develop

We're back to having your develop and your origin/develop point to the same commit, in this case, commit J.
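A fast-forward can be sketched as "check that the current tip is an ancestor of the target, then move the name". The helper names here are hypothetical; git merge-base --is-ancestor performs the real ancestry check. The final print confirms that no commit was added.

```python
PARENTS = {"G": [], "H": ["G"], "I": ["H"], "J": ["I"]}
REFS = {"develop": "H", "origin/develop": "J"}


def is_ancestor(a: str, b: str) -> bool:
    """True if a is b or can be found by walking backwards from b."""
    seen, stack = set(), [b]
    while stack:
        commit = stack.pop()
        if commit == a:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return False


def fast_forward(branch: str, target: str) -> None:
    """Slide a branch name forward to target; refuse if that would lose commits."""
    if not is_ancestor(REFS[branch], target):
        raise ValueError(f"{branch} cannot be fast-forwarded to {target}")
    REFS[branch] = target  # only the name moves; no new commit is written


commits_before = len(PARENTS)
fast_forward("develop", REFS["origin/develop"])
print(REFS, len(PARENTS) == commits_before)
```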

The bottom line about merging

The end result here is this: git merge can do a fast-forward, which isn't really a merge, and doesn't make a new commit. Or, it can do a true merge, which is really a merge, and does make a new commit.

Git can only do a fast-forward merge under two conditions:

  • First, your current commit must match the merge base that git merge finds on its own. That's the case when you haven't made any new commits of your own.

  • Second, you must tell git merge that this is allowed or required. But "allowed" is the default. Using git merge --ff-only tells Git that it's required, and using git merge --no-ff tells Git that it's forbidden. The default is: allowed; do it if possible.

If a fast-forward is possible and allowed (or required), that's what Git does. Otherwise Git needs to do a real merge. If a real merge is forbidden, because you required a fast-forward with --ff-only, the git merge command errors out and does nothing. Only when a real merge is both allowed and necessary, or is forced by --no-ff, do we get a real merge.
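The decision rules above fit into one small function. This is a sketch under stated assumptions: mode is a hypothetical parameter mirroring the default, --ff-only, and --no-ff, and the function ignores configuration such as merge.ff. It also covers the case where there is nothing to merge, which Git reports as already up to date.

```python
# Graph with both shapes: H-I-J (yours) and H-K-L (theirs).
PARENTS = {"G": [], "H": ["G"], "I": ["H"], "J": ["I"], "K": ["H"], "L": ["K"]}


def is_ancestor(a: str, b: str) -> bool:
    """True if a is b or can be found by walking backwards from b."""
    seen, stack = set(), [b]
    while stack:
        commit = stack.pop()
        if commit == a:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return False


def merge_outcome(current: str, other: str, mode: str = "ff") -> str:
    """mode 'ff' = fast-forward allowed (default), 'ff-only' = required, 'no-ff' = forbidden."""
    if is_ancestor(other, current):
        return "already up to date"  # nothing of theirs is missing
    if is_ancestor(current, other) and mode != "no-ff":
        return "fast-forward"        # move the name, no new commit
    if mode == "ff-only":
        return "error"               # a real merge is needed but forbidden
    return "real merge"              # new commit with two parents


print(merge_outcome("H", "J"), merge_outcome("J", "L"), merge_outcome("J", "L", "ff-only"))
```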

Notes on git pull

The git pull command just runs two Git commands for you, as a sort of convenience wrapper:

  1. First, it runs git fetch. This works the way git fetch always works.
  2. Second, it runs git merge (or if you tell it to, git rebase instead).

Assuming you use the default git merge, this second Git command takes the fetched updates and merges them, either fast-forward or real. If it does a fast-forward, it doesn't make any new commit. If it goes for a real merge, and the merge succeeds, it does make a new commit.

There's one special change here, vs just doing git fetch && git merge yourself. The low-quality default merge message for a git merge you do yourself is:

Merge remote-tracking branch 'origin/develop' into develop

The low-quality default merge message that git pull supplies is:

Merge branch 'develop' of <url> into develop

where the url part comes from the URL saved under the name origin.

Git's merge doesn't always succeed on its own

A git merge that has to do a real merge can have merge conflicts. This answer is already too long, so I won't go into any detail here, but the merge conflicts are represented by an unusual state in Git's index. Your job at this point becomes to resolve the conflicts, and put the correct resolutions into Git's index. This is another reason knowing a lot about Git's index is important.

The final merge, conflicted or not, will have a snapshot of all of your files. If Git makes the merge on its own, Git makes the snapshot. If there are merge conflicts, you control the final snapshot yourself.

Upvotes: 0

Daniel Mann

Reputation: 59045

origin/develop is the remote-tracking branch. It represents the state of the develop branch on the upstream repository.

When you fetch, it goes into origin/develop. When you pull, it fetches into origin/develop, then merges origin/develop into develop. You should be working in develop.

Upvotes: 1
