Swarnim Raj

Reputation: 137

How can I have 2 independent local Git branches off same remote branch

I have a master branch and am working on 2 features which involve the same files. I want to have 2 local branches that both use the same upstream master but carry different changes. I don't want to commit the changes locally, so that the IDE preserves its change markers, like the border shades in
https://d3nmt5vlzunoa1.cloudfront.net/idea/files/2018/10/k8sCompletion.png

I have been unable to use git checkout successfully because when I make changes in one and switch to other branch, the unstaged changes are visible on it too. The solution I have come up with is checking out my code in 2 repos because git worktree seems to require 2 different remote branches. However, this means hard disk inefficiency. Is there a way to achieve what I want?

I expect that when I switch between local branches, even the unstaged changes of one branch should not be visible in the other.

Upvotes: 5

Views: 4298

Answers (2)

Andrew Einhorn

Reputation: 1133

I had this exact same question. Just in case anyone here shares my confusion, I didn't realise that making a commit on the feature branch solves this problem!

Once you make the commit on one of the feature branches, those changes no longer show up on the other feature branch. Say, for example, you add three lines of code in Branch 1. If you check out Branch 2, you will see these three lines. BUT, if you first git add and git commit on Branch 1, then when you check out Branch 2 you will not see these changes.

Hope this helps somebody.

Upvotes: 0

torek

Reputation: 488453

TL;DR: your problem is actually pretty trivial, provided your Git is at least version 2.15: just use git worktree add correctly, creating two branches that use the same remote-tracking name as their upstream.

If not, your method of using two repositories is perhaps the best. You may still be able to use git worktree add for versions between 2.5 and 2.15 as long as you avoid one major issue (which I'll get into below).
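As a minimal sketch of what "using git worktree add correctly" can look like, the script below builds a throwaway stand-in remote and one clone under a temporary directory, then adds two work-trees on two new branches that both use origin/master as their upstream. The paths, the branch names feature-a and feature-b, and the demo identity are assumptions made for illustration only.

```shell
#!/bin/sh
# Sketch: two local branches, each in its own work-tree, both with
# origin/master as upstream. All names and paths here are illustrative.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# A stand-in for the real remote, with one commit on master.
git init -q "$tmp/remote"
git -C "$tmp/remote" symbolic-ref HEAD refs/heads/master
echo 'line 1' > "$tmp/remote/file.txt"
git -C "$tmp/remote" add file.txt
git -C "$tmp/remote" commit -q -m 'initial'

# The single clone that holds all commits and branch names.
git clone -q "$tmp/remote" "$tmp/main"

# One added work-tree per feature, each on its own new branch.
for name in feature-a feature-b; do
    git -C "$tmp/main" worktree add -b "$name" "$tmp/$name" origin/master >/dev/null 2>&1
    # Set the upstream explicitly rather than relying on defaults.
    git -C "$tmp/main" branch -q --set-upstream-to=origin/master "$name"
done

# An uncommitted edit in one work-tree does not show up in the other.
echo 'work for feature A' >> "$tmp/feature-a/file.txt"
status_a=$(git -C "$tmp/feature-a" status --porcelain)
status_b=$(git -C "$tmp/feature-b" status --porcelain)
upstream_a=$(git -C "$tmp/main" rev-parse --abbrev-ref 'feature-a@{upstream}')
upstream_b=$(git -C "$tmp/main" rev-parse --abbrev-ref 'feature-b@{upstream}')
printf 'feature-a status: [%s]\nfeature-b status: [%s]\n' "$status_a" "$status_b"
printf 'upstreams: %s %s\n' "$upstream_a" "$upstream_b"
```

In a real setup you would skip the stand-in remote and run only the worktree add and set-upstream steps inside your existing clone.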

Long

I expect that when I switch between local branches, even the unstaged changes of one branch should not be visible in the other.

This expectation is not supported by Git.

The real problem here is that there is no such thing as "unstaged changes", and no such thing as "staged changes" either. What you see as either is an illusion, created on the fly, because the illusion tends to be more useful to a human programmer. What Git shows you as changes is computed on demand, by comparing two of three items: the current commit, the index, and the work-tree. In reality, though, there are merely files, stored in the work-tree and in the index, that are impermanent and changeable; plus commits, stored in the repository, that are permanent (well, mostly permanent) and frozen for all time. See my recent answer to Why does output differ under git diff vs. git diff --staged? for much more about this.
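To make the three items concrete, here is a small sketch that puts a different version of one file into the current commit, the index, and the work-tree, and then asks Git for the two comparisons it usually presents as "staged" and "unstaged" changes. The repository location, the file name f.txt, and the demo identity are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: "staged" and "unstaged" changes are comparisons computed on demand.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
repo="$tmp/repo"
git init -q "$repo"

echo 'v1' > "$repo/f.txt"
git -C "$repo" add f.txt
git -C "$repo" commit -q -m 'v1'   # commit, index, and work-tree all hold v1

echo 'v2' > "$repo/f.txt"
git -C "$repo" add f.txt            # the index now holds v2
echo 'v3' > "$repo/f.txt"           # the work-tree now holds v3

# Current commit vs index: what Git presents as "staged changes".
staged=$(git -C "$repo" diff --cached --name-only)
# Index vs work-tree: what Git presents as "unstaged changes".
unstaged=$(git -C "$repo" diff --name-only)

# The three copies of the file really are three different files.
in_head=$(git -C "$repo" show HEAD:f.txt)
in_index=$(git -C "$repo" show :f.txt)
in_worktree=$(cat "$repo/f.txt")
echo "staged: $staged  unstaged: $unstaged"
echo "HEAD=$in_head index=$in_index work-tree=$in_worktree"
```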

There are (potentially) many commits in the repository, but each repository comes with only one (1) work-tree + index pair.[1] You can add more pairs of index-and-work-tree using git worktree add, which you have tried. This should work well for your case, as long as your Git is at least version 2.15 (from Git 2.5 up through but not including Git 2.15, git worktree add has a potentially-serious bug, depending on precisely how you use it).

[1] A bare repository (created with git clone --bare or git init --bare) has one index and no work-tree, but it seems safe to assume that you are not working with a bare repository.


... git worktree seems to require 2 different remote branches

This is not the case.

What git worktree add does is add an index-and-work-tree pair. The added work-tree is in a separate work-tree directory (the main work-tree directory lives right next to the main repository's .git directory; the .git directory contains all the indices along with all of the other auxiliary information Git needs). The added work-tree comes with its own HEAD as well, but shares all the branch names and remote-tracking names.
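A rough way to see this layout for yourself, assuming a throwaway repository and a hypothetical added work-tree named side, is to ask Git where the added work-tree keeps its private files and which branch names it can see:

```shell
#!/bin/sh
# Sketch: an added work-tree has a private HEAD and index, but shares branch names.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
repo="$tmp/main"
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
echo 'hello' > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" commit -q -m 'initial'

# Add a second index-and-work-tree pair on a new branch named "side".
git -C "$repo" worktree add -b side "$tmp/side" >/dev/null 2>&1

# The private directory for the added work-tree lives inside the main .git.
side_git_dir=$(git -C "$tmp/side" rev-parse --git-dir)
echo "private files for side: $side_git_dir"
ls "$side_git_dir"

# Each work-tree has its own HEAD...
main_head=$(git -C "$repo" symbolic-ref --short HEAD)
side_head=$(git -C "$tmp/side" symbolic-ref --short HEAD)
echo "main HEAD -> $main_head, side HEAD -> $side_head"

# ...but the branch names are the same set, whichever work-tree you ask from.
git -C "$tmp/side" for-each-ref --format='%(refname:short)' refs/heads
git -C "$repo" worktree list
```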

The constraint that git worktree add imposes is that every work-tree must use a different branch name, or no branch name at all, for its HEAD. To properly define how this works, we need a digression about HEAD and branch names. I will get to this in just a moment, but first, a note on terminology.

Note: there is no such thing as a remote branch. Git does have a term that it calls remote-tracking branch names. I now prefer to call these remote-tracking names as they lack one crucial property that branch names possess. A remote-tracking name typically looks like origin/master or origin/develop: that is, a name that starts with origin/.[2]

[2] You can define more than one remote, or change the default name of the one remote that you might already have to something other than origin. For instance, you could add a second remote named upstream. In this case, you might also have upstream/master and/or upstream/develop. These are all valid shortened forms of remote-tracking names.


Commits, branch names, and HEAD

The unit of permanent storage in any Git repository is the commit. Commits are, as you have seen by now, identified by a big, ugly, apparently-random (not at all random), unique-to-each-commit hash ID like 5d826e972970a784bd7a7bdf587512510097b8c7. These things are not useful to humans and we generally only use them via cut-and-paste, or indirectly, but the hash IDs are the real names. If you have 5d826e972970a784bd7a7bdf587512510097b8c7 (a commit in the Git repository for Git), it's always that particular commit. If you don't have it, you can get a copy of the Git repository for Git (or update your existing copy), and now you do have it, and it's that commit—and it is Git version 2.20. (The name v2.20.0 is the more human-oriented name for this commit, and is what we'd normally use. Git stores a translation table of tag names to hash IDs, which is how v2.20.0 comes to be the human-readable name for this commit.)

A commit contains a full, complete snapshot of all of the files that were in the index at the time someone instructed Git to make the commit. However, it also contains some additional metadata—data about the commit, such as who made it, when, and why (user name, email address, time stamp, and log message). In that same metadata section, Git stores the exact hash ID of the previous commit. Git calls that previous commit the parent of the commit.

In this way, every commit ever made in a repository connects back to the earlier commits in that same repository. This is the history in the repository: the string of commits, starting at the end, and working backwards. In very simple cases, such as in a pretty new repository, we might just have a few commits in a very simple line like this:

A <-B <-C

Here, the uppercase letters stand in for the actual hash IDs (which, remember, are big, ugly, and apparently-random). What we (and/or Git) will do is start at the end, at commit C, and work backwards. Commit C stores the actual hash ID of its parent commit B, so that from C we can find B. Meanwhile B stores the hash ID of its parent A. Since A is the very first commit, it has no parent, and that's how Git tells that we have reached the beginning of history: there's nowhere left to go.
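The backwards chain can be inspected directly. This sketch, which assumes a throwaway repository holding three empty commits whose messages are A, B, and C, prints the metadata of the last commit and then follows the stored parent hash IDs back to the root:

```shell
#!/bin/sh
# Sketch: each commit stores its parent's hash ID; the root commit stores none.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
repo="$tmp/repo"
git init -q "$repo"
for msg in A B C; do
    git -C "$repo" commit -q --allow-empty -m "$msg"
done

# Raw view of commit C: a tree, a parent line, author, committer, message.
git -C "$repo" cat-file -p HEAD

c=$(git -C "$repo" rev-parse HEAD)
b=$(git -C "$repo" rev-parse 'HEAD^')
a=$(git -C "$repo" rev-parse 'HEAD^^')

# %P prints the parent hash IDs recorded inside a commit.
parent_of_c=$(git -C "$repo" log -1 --format=%P "$c")
parent_of_b=$(git -C "$repo" log -1 --format=%P "$b")
parent_of_a=$(git -C "$repo" log -1 --format=%P "$a")   # empty: A is the root

echo "C -> $parent_of_c"
echo "B -> $parent_of_b"
echo "A -> (no parent) [$parent_of_a]"
```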

The trick, though, is that we need to find commit C, whose hash ID is something apparently-random. This is where branch names come in. We pick a name like master and use it to store the actual hash ID of C:

A <-B <-C   <--master

We mentioned earlier that commits, once made, can never change. This means we don't really need to draw all the internal arrows: we know that a commit can't remember its children, because they don't exist when we make the commit, but a commit can remember its parent, because the parent does exist at that time. Git will freeze the parent hash into the new commit forever. So if we want to add a new commit to our string of three, A-B-C, we just do that:

A--B--C--D

In order to remember D's hash ID, Git immediately writes the new commit's hash ID into the name master:

A--B--C--D   <-- master

So commits are fixed for all time, but branch names move, all the time!

Now, suppose we add a new branch name, develop. A branch name, in Git, must point to exactly one commit. The one commit we'd like it to point to is probably the latest, D:

A--B--C--D   <-- develop, master

Note that both names point to the same commit. This is perfectly normal! All four commits are on both branches.
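A short sketch of the same idea, assuming a throwaway repository with empty commits standing in for A through D: the name master holds a different hash ID before and after the fourth commit, and a newly created develop simply holds the same hash ID as master.

```shell
#!/bin/sh
# Sketch: a branch name is a movable label that holds one commit hash ID.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
repo="$tmp/repo"
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
for msg in A B C; do
    git -C "$repo" commit -q --allow-empty -m "$msg"
done

master_at_c=$(git -C "$repo" rev-parse master)

# Making commit D moves the name master; commit C itself is untouched.
git -C "$repo" commit -q --allow-empty -m D
master_at_d=$(git -C "$repo" rev-parse master)

# A new branch name is just a second label on the same commit.
git -C "$repo" branch develop
develop_at=$(git -C "$repo" rev-parse develop)

echo "master before D: $master_at_c"
echo "master after D:  $master_at_d"
echo "develop:         $develop_at"
```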

Now let's add a new commit, and call it E:

A--B--C--D
          \
           E

Which of the two branch names should we have Git update? This is where HEAD comes in.

Before we make E, we tell Git which name to attach HEAD to. We do this with git checkout. If we git checkout master, Git will attach HEAD to the name master. If we git checkout develop, Git will attach HEAD to the name develop. Let's do the latter before we make E, so that we start with:

A--B--C--D   <-- develop (HEAD), master

Now we'll make E, and Git will update the name to which HEAD is attached, i.e., develop:

A--B--C--D   <-- master
          \
           E   <-- develop (HEAD)

This is, in short, how branches grow. Git creates a new commit whose parent is the current commit, as found via the name HEAD, which is attached to some branch name. After creating the new commit—which gives it a new, unique, big ugly hash ID—Git writes the new hash ID of the new commit into that same branch name, so that the branch name now points to the new commit. The new commit continues to point back to the old commit.
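The attachment of HEAD can be observed with git symbolic-ref. In this sketch (a throwaway repository; the commit messages are placeholders), HEAD is attached to develop before a new commit is made, so only develop moves:

```shell
#!/bin/sh
# Sketch: a new commit updates only the branch name that HEAD is attached to.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
repo="$tmp/repo"
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
for msg in A B C D; do
    git -C "$repo" commit -q --allow-empty -m "$msg"
done
git -C "$repo" branch develop

# Attach HEAD to develop, then make commit E.
git -C "$repo" checkout -q develop
attached_to=$(git -C "$repo" symbolic-ref --short HEAD)
git -C "$repo" commit -q --allow-empty -m E

master_at=$(git -C "$repo" rev-parse master)
develop_at=$(git -C "$repo" rev-parse develop)
parent_of_e=$(git -C "$repo" rev-parse 'develop^')

echo "HEAD is attached to: $attached_to"
echo "master:  $master_at (still D)"
echo "develop: $develop_at (now E, whose parent is $parent_of_e)"
```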

Added work-trees require that you attach their HEADs to different branches

For reasons that will make sense in a moment, git worktree add requires that a newly added work-tree use a different branch name for that work-tree's HEAD. That is, when we draw the commits and branch names and attach HEAD to some branch name, we're really attaching this work-tree's HEAD, as there is now more than one HEAD.

So now that we have two names, master and develop, we can make two different work-trees, using these two different branch names:

A--B--C--D   <-- master (HEAD)    # in work-tree M
          \
           E   <-- develop

vs:

A--B--C--D   <-- master
          \
           E   <-- develop (HEAD)  # in work-tree D

The contents of a work-tree, and its index, will in general start out matching those of its HEAD commit. We'll modify some files in the work-tree, git add them to that work-tree's index, and git commit there, which updates the branch name to which that work-tree's HEAD is attached. That's why these two need to use different branch names. Watch what happens as we work inside work-tree M (for master). We start out with:

A--B--C--D   <-- master (HEAD)    # in work-tree M
          \
           E   <-- develop

The index and work-tree match commit D. We do some work, git add, and git commit to make a new commit. The new commit's hash ID is new and unique; let's call it F here, and draw it in, updating the name master:

A--B--C--D--F   <-- master (HEAD)    # in work-tree M
          \
           E   <-- develop

Now let's navigate over to the other work-tree (D for develop, but that sounds a lot like commit D, so let's just stop naming it like this). This has its own HEAD so the picture is:

A--B--C--D--F   <-- master
          \
           E   <-- develop (HEAD)

Note that master has changed—the branch names are shared between all the work-trees—and new commit F has appeared, as the commits are also shared. But develop still points to commit E, and our index and work-tree here, in this work-tree, match those of E. Now we modify some files, git add to copy them back into the index, and git commit to make a new commit that we can call G:

A--B--C--D--F   <-- master
          \
           E--G   <-- develop (HEAD)

Commit G will appear in the other work-tree, and the other work-tree's develop will identify commit G, but since the other work-tree has master / commit F checked-out, the other work-tree's index and work-tree will still match commit F.
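The sharing described here can be checked with a small experiment. The sketch below assumes a throwaway main repository on master plus one added work-tree on develop (the directory name dev is made up). It commits on master in the main work-tree, looks at master from the added work-tree, and then tries to check out master a second time, which Git refuses.

```shell
#!/bin/sh
# Sketch: work-trees share commits and branch names, but not HEAD, index, or files.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
repo="$tmp/main"
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
echo 'from D' > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" commit -q -m D

# Existing branch develop, checked out in an added work-tree.
git -C "$repo" branch develop
git -C "$repo" worktree add "$tmp/dev" develop >/dev/null 2>&1

# Commit F on master, made in the main work-tree.
echo 'from F' > "$repo/file.txt"
git -C "$repo" commit -q -a -m F

# The added work-tree sees the moved master immediately...
master_in_main=$(git -C "$repo" rev-parse master)
master_from_dev=$(git -C "$tmp/dev" rev-parse master)
# ...but its own HEAD and files are unchanged.
dev_head=$(git -C "$tmp/dev" symbolic-ref --short HEAD)
dev_file=$(cat "$tmp/dev/file.txt")

# Checking out master in a second work-tree is refused.
if git -C "$repo" worktree add "$tmp/again" master >/dev/null 2>&1; then
    second_master=allowed
else
    second_master=refused
fi
echo "master: $master_in_main (main) / $master_from_dev (dev)"
echo "dev HEAD: $dev_head, dev file: $dev_file, second master: $second_master"
```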

The upstream setting of any branch name is something you control

When you create a new branch name, using either git checkout -b or git branch, you control:

  • whether that new branch has any upstream setting, and
  • if so, what name—origin/whatever is typical, but it can be any name—is stored in that setting.

It's very normal for your master to use origin/master as its upstream name, and for your develop to use origin/develop as its upstream name, but there are no constraints here at all. You can have all your branches share origin/master as their upstream, for instance. Or, you can have branches that have no upstream set. See Why do I have to "git push --set-upstream origin <branch>"? for a discussion of upstream settings.
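As an illustration of that control, the following sketch uses a throwaway stand-in remote and clone with two made-up branch names. It creates one branch with an upstream and one without, and then changes both settings after the fact:

```shell
#!/bin/sh
# Sketch: the upstream of a branch is a setting you choose and can change.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/remote"
git -C "$tmp/remote" symbolic-ref HEAD refs/heads/master
git -C "$tmp/remote" commit -q --allow-empty -m 'initial'
git clone -q "$tmp/remote" "$tmp/main"
main="$tmp/main"

# Print the upstream of a branch, or "none" if it has no upstream set.
upstream_of() {
    git -C "$main" rev-parse --abbrev-ref "$1@{upstream}" 2>/dev/null || echo none
}

# Choose at creation time: with an upstream, or without one.
git -C "$main" branch -q --track feature-a origin/master
git -C "$main" branch -q --no-track feature-b origin/master
a_first=$(upstream_of feature-a)
b_first=$(upstream_of feature-b)

# Change your mind later.
git -C "$main" branch -q --set-upstream-to=origin/master feature-b
git -C "$main" branch --unset-upstream feature-a
a_later=$(upstream_of feature-a)
b_later=$(upstream_of feature-b)

echo "feature-a: $a_first -> $a_later"
echo "feature-b: $b_first -> $b_later"
```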

There is a magic default:

$ git checkout feature-xyz

will try to check out your existing feature-xyz branch. If there isn't a feature-xyz branch, your Git will check all your remote-tracking names, to see if there is, for instance, an origin/feature-xyz. If so, your Git will create your own feature-xyz, pointing to the same commit as origin/feature-xyz, and with origin/feature-xyz set as its upstream. This is intended as a convenience. If it's inconvenient, don't use it: use -b instead.

The git worktree add command shares this particular trick with git checkout: both have a -b option that creates a new branch (without attempting this automatic lookup), and both default to trying to check out some existing branch. So both will automatically create a new branch with an upstream set, for this particular case.
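The automatic creation can be watched in a throwaway setup. This sketch assumes a stand-in remote that has a feature-xyz branch and a fresh clone that has no local branch of that name yet:

```shell
#!/bin/sh
# Sketch: checking out a name that exists only as origin/<name> creates the branch.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/remote"
git -C "$tmp/remote" symbolic-ref HEAD refs/heads/master
git -C "$tmp/remote" commit -q --allow-empty -m 'initial'
git -C "$tmp/remote" branch feature-xyz
git clone -q "$tmp/remote" "$tmp/main"
main="$tmp/main"

# Before: only the remote-tracking name origin/feature-xyz exists.
if git -C "$main" show-ref --verify --quiet refs/heads/feature-xyz; then
    had_local_branch=yes
else
    had_local_branch=no
fi

# No local feature-xyz, so Git looks for a matching remote-tracking name.
git -C "$main" checkout -q feature-xyz

current=$(git -C "$main" symbolic-ref --short HEAD)
upstream=$(git -C "$main" rev-parse --abbrev-ref 'feature-xyz@{upstream}')
local_at=$(git -C "$main" rev-parse feature-xyz)
remote_at=$(git -C "$main" rev-parse origin/feature-xyz)

echo "local branch existed beforehand: $had_local_branch"
echo "now on $current, upstream $upstream"
```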

Detached HEADs and added indices, and the bug in Git 2.5 through (but not including) 2.15

In Git, a detached HEAD just means that HEAD is not attached to a branch name. Remember that the usual way to draw what's going on is to attach HEAD to some name:

...--F--G--H   <-- master (HEAD)

We can, instead, make Git point HEAD directly to a commit, without going through a branch name:

...--F--G   <-- HEAD
         \
          H   <-- master

When in this mode, if we make a new commit, Git writes the new commit's hash ID into HEAD itself, rather than into a branch name, since HEAD is not attached to one:

...--F--G--I   <-- HEAD
         \
          H   <-- master
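The drawing above can be reproduced in a throwaway repository. In this sketch, the empty commits stand in for G, H, and I. HEAD is detached at the parent of master, a commit is made, and only HEAD moves:

```shell
#!/bin/sh
# Sketch: with a detached HEAD, a new commit is recorded in HEAD alone.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
repo="$tmp/repo"
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
for msg in G H; do
    git -C "$repo" commit -q --allow-empty -m "$msg"
done

commit_g=$(git -C "$repo" rev-parse 'master^')
master_before=$(git -C "$repo" rev-parse master)

# Point HEAD directly at commit G, without going through a branch name.
git -C "$repo" checkout -q --detach "$commit_g"
if git -C "$repo" symbolic-ref -q HEAD >/dev/null; then
    head_state=attached
else
    head_state=detached
fi

# Commit I: only HEAD itself records it.
git -C "$repo" commit -q --allow-empty -m I
head_after=$(git -C "$repo" rev-parse HEAD)
parent_of_i=$(git -C "$repo" rev-parse 'HEAD^')
master_after=$(git -C "$repo" rev-parse master)

echo "HEAD is $head_state at $head_after"
echo "master is still $master_after"
```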

An added work-tree can always be in detached HEAD mode, but there's a terrible bug, present from Git version 2.5 (where git worktree was first introduced), that was not fixed until Git version 2.15.

Specifically, each added work-tree has its own HEAD and its own private index file. This is necessary due to the way the rest of Git works: HEAD records information about this work-tree, and the index is the index for this work-tree, so they are all one big group item. Unfortunately, Git's garbage collector, git gc, was not properly taught to respect added work-trees.

The garbage collector's job is to find unreferenced (unused / unneeded) Git objects (blobs, trees, commits, and annotated tags) that look like leftover junk in a repository. Git uses this so that Git commands can, whenever they want, create these various internal objects without worrying about whether they're really necessary, and without having to take any special action to handle being interrupted (by, e.g., CTRL+C, or a network session disconnection). Other normal everyday Git actions, including git rebase, can generate this kind of junk. That's all perfectly fine and normal, because the janitor, git gc, cleans it up regularly.

But any new commits you make with a detached HEAD have only HEAD itself referring to them. In the main work-tree, this is not a problem: the gc janitor checks the HEAD file, sees the reference, and knows not to delete these commits. But git gc doesn't check the added, extra HEADs. So if you have an added work-tree with a detached HEAD, that detached HEAD's objects may vanish. Similar rules apply to blob objects, and if a blob object stored in an added work-tree's index is referenced only from that index, git gc may remove the underlying blob object.

There's a secondary protection: git gc by default will not prune any object that is less than 14 days old. This gives all Git commands 14 days to get their work done, before the janitor comes by and throws their in-progress objects into the rubbish bin behind the office. So this all works fine in the main work-tree, and works fine in added work-trees in Git 2.15 and later. But for intermediate Git versions, git gc may come by, see a commit, tree, or blob that is 14 or more days old and that shouldn't be thrown out because an added work-tree still uses it, and yet not realize that and throw it out anyway.

This bug does not strike if you don't have a detached HEAD and are careful to add-and-commit within 14 days. It also does not strike if you disable garbage collection, but that's not generally a great idea: Git depends on gc to clean up and maintain good performance. And, of course, it was fixed in Git 2.15, so if you have that or later, you're fine. It only affects added work-trees, so use caution between 2.5 and 2.15.
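If you are unsure which side of the 2.15 line you are on, a quick check along the following lines may help. It is only a sketch. The helper name version_at_least is made up, and it assumes that git --version prints the version number as its third word, in the usual major.minor form.

```shell
#!/bin/sh
# Sketch: check whether the installed Git is at least version 2.15.
set -eu

# version_at_least VERSION MAJOR MINOR
# Succeeds if VERSION (e.g. "2.20.1") is at least MAJOR.MINOR.
version_at_least() {
    have_major=${1%%.*}
    rest=${1#*.}
    have_minor=${rest%%.*}
    [ "$have_major" -gt "$2" ] ||
        { [ "$have_major" -eq "$2" ] && [ "$have_minor" -ge "$3" ]; }
}

git_version=$(git --version | awk '{print $3}')
if version_at_least "$git_version" 2 15; then
    echo "Git $git_version: added work-trees are respected by git gc"
else
    echo "Git $git_version: avoid detached HEADs and long-lived uncommitted work in added work-trees"
fi

# The prune grace period; when unset, the two-week default applies.
git config --get gc.pruneExpire || echo 'gc.pruneExpire is unset (default grace period applies)'
```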

Upvotes: 5
