Reputation: 137
I have a master branch and am working on 2 features that involve the same files. I want 2 local branches that point to the same upstream master but hold different changes. I don't want to commit changes locally, so that the IDE preserves the formatting, like the border shades in
https://d3nmt5vlzunoa1.cloudfront.net/idea/files/2018/10/k8sCompletion.png
I have been unable to use git checkout successfully because when I make changes in one branch and switch to the other, the unstaged changes are visible there too. The solution I have come up with is checking out my code in 2 repos because git worktree seems to require 2 different remote branches. However, this wastes disk space. Is there a way to achieve what I want?
I expect that when I switch between local branches, even the unstaged changes of one branch should not be visible in the other.
Upvotes: 5
Views: 4298
Reputation: 1133
I had this exact same question. Just in case anyone here shares my confusion, I didn't realise that making a commit on the feature branch solves this problem!
Once you make the commit on one of the feature branches, those changes no longer show up on the other feature branch. Say, for example, you add three lines of code in Branch 1. If you check out Branch 2, you will see these three lines. BUT, if you first git add and git commit on Branch 1, then when you check out Branch 2 you will not see these changes.
Hope this helps somebody.
Upvotes: 0
Reputation: 488453
TL;DR: your problem is actually pretty trivial, provided your Git is at least version 2.15: just use git worktree add
correctly, creating two branches that use the same remote-tracking name as their upstream.
If not, your method of using two repositories is perhaps the best. You may still be able to use git worktree add
for versions between 2.5 and 2.15 as long as you avoid one major issue (which I'll get into below).
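As a rough sketch of the TL;DR, the commands below build a throwaway repository plus a stand-in remote under a temporary directory, then add one work-tree per feature, with both branches using origin/master as their upstream. The paths, the file name, and the branch names feature-a and feature-b are made up for illustration; in a real setup the remote already exists and only the two git worktree add lines (and, if needed, the upstream lines) matter.

```shell
set -e
work=$(mktemp -d)            # scratch area so nothing real is touched
cd "$work"

# Stand-in for the real remote (hypothetical; yours already exists).
git init -q --bare origin.git
git clone -q origin.git main 2>/dev/null
cd main
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example"
git config user.email "example@example.com"
echo "base" > file.txt
git add file.txt
git commit -q -m "initial commit"
git push -q origin master                 # creates origin/master

# One added work-tree per feature, both starting from origin/master.
git worktree add -b feature-a ../feature-a origin/master >/dev/null 2>&1
git worktree add -b feature-b ../feature-b origin/master >/dev/null 2>&1

# Set the upstream explicitly rather than relying on configuration defaults.
git branch --set-upstream-to=origin/master feature-a >/dev/null
git branch --set-upstream-to=origin/master feature-b >/dev/null

# An uncommitted edit in one work-tree does not appear in the other.
echo "work for feature A" >> ../feature-a/file.txt
git -C ../feature-a status --short        # lists file.txt as modified
git -C ../feature-b status --short        # prints nothing
```

Each feature then lives in its own directory, so an IDE can keep one project open per work-tree without any commits being needed to keep the edits apart.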
I expect that when I switch between local branches, even the unstaged changes of one branch should not be visible in the other.
This expectation is not supported by Git.
The real problem here is that there is no such thing as "unstaged changes", and no such thing as "staged changes" either. What you see as either is an illusion, created on the fly, because the illusion tends to be more useful to a human programmer. What Git shows you as changes is computed on demand, by comparing two of three items: the current commit, the index, and the work-tree. In reality, though, there are merely files, stored in the work-tree and in the index, that are impermanent and changeable; plus commits, stored in the repository, that are permanent (well, mostly permanent) and frozen for all time. See my recent answer to Why does output differ under git diff vs. git diff --staged? for much more about this.
There are (potentially) many commits in the repository, but each repository comes with only one (1) work-tree + index pair.1 You can add more pairs of index-and-work-tree using git worktree add
, which you have tried. This should work well for your case, as long as your Git is at least version 2.15 (from Git 2.5 up through but not including Git 2.15, git worktree add
has a potentially-serious bug, depending on precisely how you use it).
1A bare repository (created with git clone --bare
or git init --bare
) has one index and no work-tree, but it seems safe to assume that you are not working with a bare repository.
... git worktree seems to require 2 different remote branches
This is not the case.
What git worktree add
does is add an index-and-work-tree pair. The added work-tree is in a separate work-tree directory (the main work-tree directory lives right next to the main repository's .git
directory; the .git
directory contains all the indices along with all of the other auxiliary information Git needs). The added work-tree comes with its own HEAD
as well, but shares all the branch names and remote-tracking names.
The constraint that git worktree add
imposes is that every work-tree must use a different branch name, or no branch name at all, for its HEAD
. To properly define how this works, we need a digression about HEAD
and branch names. I will get to this in just a moment, but first, a note on terminology.
Note: there is no such thing as a remote branch. Git does have a term that it calls remote-tracking branch names. I now prefer to call these remote-tracking names as they lack one crucial property that branch names possess. A remote-tracking name typically looks like origin/master
or origin/develop
: that is, a name that starts with origin/
.2
2You can define more than one remote, or change the default name of the one remote that you might already have to something other than origin
. For instance, you could add a second remote named upstream
. In this case, you might also have upstream/master
and/or upstream/develop
. These are all valid shortened forms of remote-tracking names.
The unit of permanent storage in any Git repository is the commit. Commits are, as you have seen by now, identified by a big, ugly, apparently-random (not at all random), unique-to-each-commit hash ID like 5d826e972970a784bd7a7bdf587512510097b8c7
. These things are not useful to humans and we generally only use them via cut-and-paste, or indirectly, but the hash IDs are the real names. If you have 5d826e972970a784bd7a7bdf587512510097b8c7
(a commit in the Git repository for Git), it's always that particular commit. If you don't have it, you can get a copy of the Git repository for Git (or update your existing copy), and now you do have it, and it's that commit—and it is Git version 2.20. (The name v2.20.0
is the more human-oriented name for this commit, and is what we'd normally use. Git stores a translation table of tag names to hash IDs, which is how v2.20.0
comes to be the human-readable name for this commit.)
A commit contains a full, complete snapshot of all of the files that were in the index at the time someone instructed Git to make the commit. However, it also contains some additional metadata—data about the commit, such as who made it, when, and why (user name, email address, time stamp, and log message). In that same metadata section, Git stores the exact hash ID of the previous commit. Git calls that previous commit the parent of the commit.
In this way, every commit ever made in a repository connects back to the earlier commits in that same repository. This is the history in the repository: the string of commits, starting at the end, and working backwards. In very simple cases, such as in a pretty new repository, we might just have a few commits in a very simple line like this:
A <-B <-C
Here, the uppercase letters stand in for the actual hash IDs (which, remember, are big, ugly, and apparently-random). What we—and/or Git—will do is start at the end, at commit C
, and work backwards. Commit C
stores the actual hash ID of its parent commit B
, so that from C
we can find B
. Meanwhile B
stores the hash ID of parent A
. Since A
is the very first commit, it has no parent, and that's how Git tells that we have reached the beginning of history: there's nowhere left to go.
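To see the parent links for yourself, here is a small sketch that makes three commits in a throwaway repository (standing in for A, B and C; the file name and commit messages are invented) and prints the raw commit object. The output of git cat-file -p shows the tree (snapshot), the parent hash ID, and the author/committer metadata.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

# Three commits standing in for A, B and C.
for name in A B C; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "commit $name"
done

# The raw commit object: tree, parent, author, committer, message.
git cat-file -p HEAD

c=$(git rev-parse HEAD)      # C
b=$(git rev-parse 'HEAD^')   # parent of C, i.e. B
a=$(git rev-parse 'HEAD^^')  # parent of B, i.e. A

# The root commit A has no "parent" line; count them to check.
git cat-file -p "$a" | grep -c '^parent ' || true
```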
The trick, though, is that we need to find commit C
, whose hash ID is something apparently-random. This is where branch names come in. We pick a name like master
and use it to store the actual hash ID of C
:
A <-B <-C <--master
We mentioned earlier that commits, once made, can never change. This means we don't really need to draw all the internal arrows: we know that a commit can't remember its children, because they don't exist when we make the commit, but a commit can remember its parent, because the parent does exist at that time. Git will freeze the parent hash into the new commit forever. So if we want to add a new commit to our string of three, A-B-C
, we just do that:
A--B--C--D
In order to remember D
's hash ID, Git immediately writes the new commit's hash ID into the name master
:
A--B--C--D <-- master
So commits are fixed for all time, but branch names move, all the time!
Now, suppose we add a new branch name, develop
. A branch name, in Git, must point to exactly one commit. The one commit we'd like it to point to is probably the latest, D
:
A--B--C--D <-- develop, master
Note that both names point to the same commit. This is perfectly normal! All four commits are on both branches.
Now let's add a new commit, and call it E
:
A--B--C--D
          \
           E
Which of the two branch names should we have Git update? This is where HEAD
comes in.
Before we make E
, we tell Git which name to attach HEAD
to. We do this with git checkout
. If we git checkout master
, Git will attach HEAD
to the name master
. If we git checkout develop
, Git will attach HEAD
to the name develop
. Let's do the latter before we make E
, so that we start with:
A--B--C--D <-- develop (HEAD), master
Now we'll make E
, and Git will update the name to which HEAD
is attached, i.e., develop
:
A--B--C--D <-- master
          \
           E <-- develop (HEAD)
This is, in short, how branches grow. Git creates a new commit whose parent is the current commit, as found via the name HEAD
, which is attached to some branch name. After creating the new commit—which gives it a new, unique, big ugly hash ID—Git writes the new hash ID of the new commit into that same branch name, so that the branch name now points to the new commit. The new commit continues to point back to the old commit.
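The same sequence can be replayed in a scratch repository. This is only a sketch: the file name and commit messages are placeholders, and the single starting commit stands in for D.

```shell
set -e
repo=$(mktemp -d)                  # throwaway repository
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
echo D > file.txt
git add file.txt
git commit -q -m "commit D"

git branch develop                 # develop and master now both point to D
git checkout -q develop            # attach HEAD to develop
git symbolic-ref HEAD              # prints refs/heads/develop

echo E > file.txt
git commit -q -a -m "commit E"     # only the name HEAD is attached to moves

git rev-parse master develop       # two different hash IDs now
```

After the second commit, master still names D while develop names E, whose parent is D.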
For reasons that will make sense in a moment, git worktree add
requires that a newly added work-tree use a different branch name for that work-tree's HEAD
. That is, when we draw the commits and branch names and attach HEAD
to some branch name, we're really attaching this work-tree's HEAD
, as there is now more than one HEAD
.
So now that we have two names, master
and develop
, we can make two different work-trees, using these two different branch names:
A--B--C--D <-- master (HEAD) # in work-tree M
          \
           E <-- develop
vs:
A--B--C--D <-- master
          \
           E <-- develop (HEAD) # in work-tree D
The contents of a work-tree, and its index, will in general start out matching those of its HEAD
commit. We'll modify some files in the work-tree, git add
them to that work-tree's index, and git commit
there, and update that work-tree's HEAD
. That's why these two need to use different branch names. Watch what happens as we work inside work-tree M (for master). We start out with:
A--B--C--D <-- master (HEAD) # in work-tree M
          \
           E <-- develop
The index and work-tree match commit D
. We do some work, git add
, and git commit
to make a new commit. The new commit's hash ID is new and unique; let's call it F
here, and draw it in, updating the name master
:
A--B--C--D--F <-- master (HEAD) # in work-tree M
          \
           E <-- develop
Now let's navigate over to the other work-tree (D for develop, but that sounds a lot like commit D
, so let's just stop naming it like this). This has its own HEAD
so the picture is:
A--B--C--D--F <-- master
          \
           E <-- develop (HEAD)
Note that master
has changed—the branch names are shared between all the work-trees—and new commit F
has appeared, as the commits are also shared. But develop
still points to commit E
, and our index and work-tree here, in this work-tree, match those of E
. Now we modify some files, git add
to copy them back into the index, and git commit
to make a new commit that we can call G
:
A--B--C--D--F <-- master
          \
           E--G <-- develop (HEAD)
Commit G
will appear in the other work-tree, and the other work-tree's develop
will identify commit G
, but since the other work-tree has master
/ commit F
checked-out, the other work-tree's index and work-tree will still match commit F
.
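A sketch of this sharing, using a scratch repository with one added work-tree. The directory names main and dev, the file name, and the commit messages are all invented; the one commit made before adding the work-tree stands in for D.

```shell
set -e
work=$(mktemp -d)                  # scratch area
cd "$work"
git init -q main
cd main
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
echo D > file.txt
git add file.txt
git commit -q -m "commit D"

# Added work-tree with its own HEAD, attached to a new branch develop.
git worktree add -b develop ../dev >/dev/null 2>&1

# New commit F in the main work-tree, on master.
echo F > file.txt
git commit -q -a -m "commit F"

# Seen from the added work-tree: master has moved and F exists,
# but HEAD, index and files there still belong to develop.
git -C ../dev rev-parse --abbrev-ref HEAD   # develop
git -C ../dev log --oneline -1 master       # shows commit F
cat ../dev/file.txt                         # still the content from D

# The same branch cannot be checked out in two work-trees at once.
if git -C ../dev checkout master 2>/dev/null; then
    echo "unexpected: master checked out twice"
else
    echo "refused: master is already checked out in the main work-tree"
fi
```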
When you create a new branch name, using either git checkout -b
or git branch
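, you control two things: which commit the new name points to, and which upstream, if any, the new branch has. The upstream name (origin/whatever
is typical, but it can be any name) is stored in the branch's upstream setting. It's very normal for your master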
to use origin/master
as its upstream name, and for your develop
to use origin/develop
as its upstream name, but there are no constraints here at all. You can have all your branches share origin/master
as their upstream, for instance. Or, you can have branches that have no upstream set. See Why do I have to "git push --set-upstream origin <branch>"? for a discussion of upstream settings.
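For illustration, here is a sketch that gives two branches the same upstream and then looks at where the setting is kept. The repository, the stand-in remote, and the branch names feature-a and feature-b are hypothetical; the upstream itself is recorded as the two configuration entries branch.NAME.remote and branch.NAME.merge.

```shell
set -e
work=$(mktemp -d)                  # scratch area
cd "$work"
git init -q --bare origin.git      # stand-in remote
git clone -q origin.git main 2>/dev/null
cd main
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
echo base > file.txt
git add file.txt
git commit -q -m "initial commit"
git push -q origin master

# Two new branch names, same starting commit, no upstream yet.
git branch --no-track feature-a origin/master
git branch --no-track feature-b origin/master

# Give both the same upstream.
git branch --set-upstream-to=origin/master feature-a >/dev/null
git branch --set-upstream-to=origin/master feature-b >/dev/null

# The upstream is just two configuration entries per branch.
git config branch.feature-a.remote    # origin
git config branch.feature-a.merge     # refs/heads/master

# A branch with no upstream at all is equally valid.
git branch --unset-upstream feature-b
```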
There is a magic default:
$ git checkout feature-xyz
will try to check out your existing feature-xyz
branch. If there isn't a feature-xyz
branch, your Git will check all your remote-tracking names, to see if there is, for instance, an origin/feature-xyz
. If so, your Git will create your own feature-xyz
, pointing to the same commit as origin/feature-xyz
, and with origin/feature-xyz
set as its upstream. This is intended as a convenience. If it's inconvenient, don't use it: use -b
instead.
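A sketch of that default in action, assuming a single remote named origin and no configuration that turns the guessing off. The remote here is a stand-in created on the spot, and feature-xyz is pushed to it from master purely so that origin/feature-xyz exists without a local branch of the same name.

```shell
set -e
work=$(mktemp -d)                  # scratch area
cd "$work"
git init -q --bare origin.git      # stand-in remote
git clone -q origin.git main 2>/dev/null
cd main
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
echo base > file.txt
git add file.txt
git commit -q -m "initial commit"
git push -q origin master
git push -q origin master:feature-xyz   # remote gains feature-xyz; no local branch yet

# No local feature-xyz exists, so Git creates one from origin/feature-xyz.
git checkout -q feature-xyz
git rev-parse --abbrev-ref 'feature-xyz@{upstream}'   # origin/feature-xyz
```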
The git worktree add
command shares this particular trick with git checkout
: Both have a -b
that creates a new branch (without this automatic lookup of a matching remote-tracking name), and both default to trying to check out some existing branch. So both will automatically create a new branch with an upstream set, for this particular case.
In Git, a detached HEAD just means that HEAD
is not attached to a branch name. Remember that the usual way to draw what's going on is to attach HEAD
to some name:
...--F--G--H <-- master (HEAD)
We can, instead, make Git point HEAD
directly to a commit, without going through a branch name:
...--F--G <-- HEAD
         \
          H <-- master
When in this mode, if we make a new commit, Git writes the new commit's hash ID into HEAD
itself, rather than into a branch name, since HEAD
is not attached to one:
...--F--G--I <-- HEAD
         \
          H <-- master
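The detached-HEAD picture can be reproduced in a scratch repository. This is a sketch with invented file contents and commit messages, where the two starting commits stand in for G and H.

```shell
set -e
repo=$(mktemp -d)                    # throwaway repository
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
echo G > file.txt
git add file.txt
git commit -q -m "commit G"
echo H > file.txt
git commit -q -a -m "commit H"

git checkout -q --detach 'HEAD^'     # HEAD now points directly at G
echo I > file.txt
git commit -q -a -m "commit I"       # recorded in HEAD only; master is untouched

# symbolic-ref fails when HEAD is not attached to a branch name.
if git symbolic-ref -q HEAD >/dev/null; then
    echo "HEAD is attached to a branch"
else
    echo "HEAD is detached"
fi
```

At the end, master still names H, while the new commit I is reachable only through HEAD.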
An added work-tree can always be in detached HEAD mode, but there's a terrible bug, present from Git version 2.5 (where git worktree was first introduced), that was not fixed until Git version 2.15.
Specifically, each added work-tree has its own HEAD
and its own private index file. This is necessary due to the way the rest of Git works: HEAD
records information about this work-tree, and the index is the index for this work-tree, so they are all one big group item. Unfortunately, Git's garbage collector, git gc
, was not properly taught to respect added work-trees.
The garbage collector's job is to find unreferenced (unused / unneeded) Git objects (blobs, trees, commits, and annotated tags) that look like leftover junk in a repository. Git relies on this so that Git commands can, whenever they want, create these various internal objects without worrying about whether they're really necessary, and without having to take any special action to handle being interrupted (by, e.g., CTRL+C, or a network session disconnection). Other normal everyday Git actions, including git rebase
, can generate this kind of junk. That's all perfectly fine and normal, because the janitor, git gc
, cleans it up regularly.
But any new commits you make with a detached HEAD have only HEAD
itself referring to them. In the main work-tree, this is not a problem: the gc
janitor checks the HEAD
file, sees the reference, and knows not to delete these commits. But git gc
doesn't check the added, extra HEADs. So if you have an added work-tree with a detached HEAD, that detached HEAD's objects may vanish. Similar rules apply to blob objects, and if a blob object stored in an added work-tree's index is referenced only from that index, git gc
may remove the underlying blob object.
There's a secondary protection: git gc
by default will not prune any object less than 14 days old. This gives all Git commands 14 days to get their work done, before the janitor comes by and throws their in-progress objects into the rubbish bin behind the office. So this all works fine in the main work-tree, and works fine in added work-trees in Git 2.15 and later. But for intermediate Git versions, git gc
may come by, see a 14-or-more day old commit, tree, or blob that shouldn't be thrown out because of an added work-tree, and yet not realize that and throw it out.
This bug does not strike if you don't have a detached HEAD and are careful to add-and-commit within 14 days. It also does not strike if you disable garbage collection, but that's not generally a great idea: Git depends on gc
to clean up and maintain good performance. And, of course, it was fixed in Git 2.15, so if you have that or later, you're fine. It only affects added work-trees, so use caution between 2.5 and 2.15.
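If you are not sure which side of the 2.15 line your Git is on, a quick check along these lines may help. It is a sketch that assumes git --version prints the usual "git version X.Y.Z" form (vendor suffixes after that are ignored); the verdict variable and the messages are made up for illustration.

```shell
# "git version 2.39.2" -> "2.39.2"
ver=$(git --version | awk '{print $3}')
major=${ver%%.*}             # text before the first dot
rest=${ver#*.}               # text after the first dot
minor=${rest%%.*}            # text up to the next dot

if [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 15 ]; }; then
    verdict="ok"
    echo "Git $ver: the gc bug with added work-trees is fixed"
elif [ "$major" -eq 2 ] && [ "$minor" -ge 5 ]; then
    verdict="caution"
    echo "Git $ver: git worktree exists, but avoid detached HEADs and commit within 14 days"
else
    verdict="unsupported"
    echo "Git $ver: git worktree is not available"
fi
```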
Upvotes: 5