Reputation: 1730

Git adding file to specific branch only

How can I stop a file on one branch from being merged into the master branch? I have added a file on a particular branch. I would like to merge this branch into master, but I want master to ignore the new file when the merge happens.

How do I tell Git that this file belongs to this branch only?

Upvotes: 2

Answers (1)

torek

Reputation: 490168

There's no such thing in Git.

Branches are not permanent, and are rather unimportant in a lot of ways. I'm afraid this answer is going to be kind of long, because there is a lot of background you need to know here, and lots of Git introductions are not good at conveying it.

Background: Commits and branch names

Commits are important: commits are permanent and unchangeable.¹ Each commit holds a snapshot of some set of files; the set of files in that commit, along with their contents, are as permanent and unchangeable as the commit itself.

The "true name" of any one commit is its big ugly hash ID. Git makes up a new, unique, big ugly hash ID for each new, unique commit you add to a repository. Once added, that commit is permanent and unchangeable (though see footnote 1 again). The commit holds all the files that you had Git tracking at the time you said git commit, in exactly the form they had at that time (we'll get back to this in a moment).

¹Commits can be removed, under various conditions that essentially boil down to this: no one can find them any more. This is trickier than it looks, and it's best to start out by thinking of commits as the thing that Git saves permanently, that holds your file-snapshots. While a commit can fade away once you can't find it, it literally can't change, because its true name (its hash ID) is a cryptographic checksum of the commit itself.

Commits also remember their parent commit. That is, each commit stores the big ugly hash ID of the commit you had extracted earlier with git checkout. It also stores your name, as the author and committer of that commit, along with your email address and a date/time stamp; and it stores your log message, telling everyone why you thought it was good to make that particular commit.
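If you want to see this for yourself, here is a minimal sketch that prints a raw commit object. It assumes a throwaway repository; the identity "Demo", the file name file.txt, and the commit messages are made up for the demo.

```shell
# Throwaway repository; identity and file names are made up for this demo.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch "master" whatever the local default is
git config user.name "Demo"
git config user.email "demo@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "A: first commit"
echo "two" >> file.txt
git add file.txt
git commit -q -m "B: second commit"

# Print the raw commit object: tree (the snapshot), parent, author,
# committer, and the log message.
commit_text=$(git cat-file -p HEAD)
echo "$commit_text"

# The parent recorded in B should be the hash ID of A, the root commit.
parent_of_b=$(git rev-parse HEAD^)
first_commit=$(git rev-list --max-parents=0 HEAD)
```

The first commit has no parent line at all, which is what "A has no parent" means in practice.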

Suppose we start with a repository that has just three commits. Instead of big ugly hash IDs, let's give them single uppercase letters. The first commit is A, the second is B, and so on. We'll stop being able to make new commits once we have 26 commits, but that will be enough for us here.

Since commit A is the first commit, it has no parent. If we made B from A, though, B remembers A as being B's parent, and if we made C from B, C remembers B. Since commits can't change once made, A cannot remember B, and B cannot remember C: this "I remember" thing, this arrow pointing from a later commit back to an earlier one, must go backwards:

A  <-B  <-C

What if we made C using A instead of using B? Then C will point back, not to B, but to A:

A--B
 \
  C

but for the moment let's stick with a straight line of commits:

A--B--C

(where we get too lazy to bother drawing the arrow-heads, but really that's partly because we will have problems with the angled ones). Now, remembering that these handy uppercase letters stand in for actual big ugly hash IDs, ask yourself: how will we—or Git—remember the actual hash ID of commit C? C remembers the ID of B, which remembers the ID of A, so we don't need to remember those two. But we do need to remember the last commit in the chain.

This is where branch names enter the picture. A branch name like master simply serves to hold one commit hash ID: the end, or tip commit, of a chain. From there, Git will work out all the earlier commits. So let's draw this:

A--B--C   <-- master

Now, let's make a new commit. We'll call it D but really it has some unique big ugly hash ID. We make it by running:

git checkout master
... edit some files ...
git add ... some files ...
git commit

When we make D, we have commit C checked out. So C will be the parent of new commit D. Git will collect your log message, make a snapshot of all the files (not just the git add-ed ones; we'll get back to this in a bit), record your name and the current time as the author and committer data, and make new, permanent, unchangeable commit D that remembers C as its parent:

A--B--C--D

But now the name master has to change. It was remembering C; now it must remember D instead. So the last step for git commit is to write D's actual hash ID into the name master, giving us this:

A--B--C--D   <-- master
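The name-moving step can be watched directly. This is only a sketch in a throwaway repository; make_commit is a hypothetical helper defined here, not a Git command, and the one-letter file names are invented.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: make a one-file commit whose message is the given letter.
make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

make_commit A; make_commit B; make_commit C

before=$(git rev-parse master)     # hash ID of C
make_commit D
after=$(git rev-parse master)      # hash ID of D
parent=$(git rev-parse master^)    # D's parent

echo "master pointed to $before"
echo "master now points to $after, whose parent is $parent"
```

The old tip did not change or go away: it is simply reached through the new commit's parent link now.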

Let's say we now use git checkout to go back to commit B, and make—or, easier, already have—a branch name that points to commit B, so that we can draw our graph like this:

     C--D   <-- master
    /
A--B   <-- branch

Now we make another new commit E. Git will write out the new commit as usual, making the snapshot and setting E's parent to B—because that's the one we have checked out. When it's done, branch will point to new commit E:

     C--D   <-- master
    /
A--B
    \
     E   <-- branch

How did Git know to move the name branch and not the name master? This is where HEAD comes in. When you ran:

git checkout branch

Git attached your / its HEAD to that branch name, so we should really be writing this down as:

     C--D   <-- master
    /
A--B
    \
     E   <-- branch (HEAD)
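To check which name HEAD is attached to, you can ask Git with git symbolic-ref. The sketch below rebuilds the graph above in a throwaway repository; make_commit is a hypothetical helper, and the letters are just file names and messages.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: make a one-file commit whose message is the given letter.
make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

make_commit A; make_commit B
git branch branch                 # new name pointing to B; HEAD stays on master
make_commit C; make_commit D
master_tip=$(git rev-parse master)

git checkout -q branch            # attach HEAD to the name "branch"
head_target=$(git symbolic-ref HEAD)
make_commit E                     # moves "branch", leaves "master" alone

echo "HEAD is attached to $head_target"
```

Because HEAD was attached to branch when E was made, only that name moved.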

You might wonder, now, which branch commits A and B are on. We can see from the drawing that C and D are on master—and that Git knows this because Git starts from the end of master, commit D, and works backwards to reach C. We can see that E is the tip of branch. But which branch is B on?

Git's answer is that these two commits are on both branches. The set of branches that contain or reach any given commit is determined by starting at all the branch names (way over on the right, or whichever end of the graph is the "latest" end) and working backwards. If we can reach a commit by traveling backwards like this, that commit is on that branch.
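Git will answer the "which branches contain this commit" question for you with git branch --contains. A small sketch, again in a throwaway repository with a hypothetical make_commit helper:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: make a one-file commit whose message is the given letter.
make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

make_commit A; make_commit B
commit_b=$(git rev-parse HEAD)
git branch branch
make_commit C
commit_c=$(git rev-parse HEAD)
git checkout -q branch
make_commit E

# Ask which branch names can reach each commit by walking backwards.
branches_with_b=$(git branch --format='%(refname:short)' --contains "$commit_b")
branches_with_c=$(git branch --format='%(refname:short)' --contains "$commit_c")

echo "B is on:" $branches_with_b
echo "C is on:" $branches_with_c
```

Commit B is reachable from both names, while C is reachable only from master.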

Generally speaking, branches tend to grow new commits. As we saw above, adding a commit to branch had no effect on master. We can make more new commits:

     C--D   <-- master
    /
A--B
    \
     E--F--G--H   <-- branch (HEAD)

and master still points to D. But if we git checkout master (to attach HEAD to master), and then use git merge to merge commit H into master, we get this:

     C--D---------I   <-- master (HEAD)
    /            /
A--B            /
    \          /
     E--F--G--H   <-- branch

and now branch master reaches back from I through both D and H, so that all the commits are on master again.
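Here is a sketch of that merge in a throwaway repository. It assumes a hypothetical make_commit helper and uses fewer commits on branch than the drawing; the point is only that the merge commit records two parents.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: make a one-file commit whose message is the given letter.
make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

make_commit A; make_commit B
git branch branch
make_commit C; make_commit D
git checkout -q branch
make_commit E; make_commit F

git checkout -q master
git merge -q -m "I" branch        # the histories diverged, so this makes a merge commit

# A merge commit records one parent line per merged tip.
parent_count=$(git cat-file -p master | grep -c '^parent ')
git log --oneline --graph master
echo "merge commit has $parent_count parents"
```

After the merge, every commit that was only on branch is also reachable from master.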

I went pretty fast here, so at this point, you might take some time to work through the web site Think Like (a) Git, which has a lot more on how this graph stuff works.

This is why there's no such thing as a branch-specific file

Saved files—snapshots—inside Git are stored inside commits. The commit is a snapshot of all the files that go with that commit. Any commit that does not have the file, does not have the file permanently. Any commit that does have the file, does have the file permanently. But the set of branches that contain that commit depends on what you do with branch names.

You can have files that are specific to particular commits, but those commits may suddenly be on many branches, especially after someone uses git merge.
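This is the situation from the question, sketched in a throwaway repository. The branch name feature and the file name feature-only.txt are invented for the demo, and make_commit is a hypothetical helper.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: make a one-file commit whose message is the given letter.
make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

make_commit A
git checkout -q -b feature
echo "meant for feature only" > feature-only.txt
git add feature-only.txt
git commit -q -m "add feature-only.txt"
adding_commit=$(git rev-parse HEAD)

git checkout -q master
if [ -e feature-only.txt ]; then before=present; else before=absent; fi
make_commit B
git merge -q -m "merge feature" feature
if [ -e feature-only.txt ]; then after=present; else after=absent; fi

# The commit that added the file is now reachable from both names.
reachable_from=$(git branch --format='%(refname:short)' --contains "$adding_commit")
echo "before merge: $before, after merge: $after"
echo "commit that added the file is on:" $reachable_from
```

The file was never tied to the name feature. It is tied to a commit, and the merge made that commit part of master's history too.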

The index and work-tree, or, where do the files in the snapshot come from?

Once you have used Git for a little while, you will have a feel for how commits store frozen versions of files, and using git checkout <some old commit>, you can get those versions back. We also mentioned above that Git saves a snapshot of every file—or more precisely, all the tracked files.

Since each commit (snapshot) saves every file, Git needs a way to make these not take up too much space. Git does this in part by compressing the files. Without going into a lot of technical detail, Git can compress the heck out of some files. These super-compressed files are in a special, Git-only format, as permanent and unchangeable as the commits themselves. But this means no other program can deal with these files. You need to have the files out—via git checkout—in a form where you can see them, edit them, and use them with other programs.

So, Git provides you with an area it calls the work-tree or working tree. Files here are in their ordinary form, rather than some special Git-only format. That's pretty straightforward too.

Other version control systems stop here: they have the frozen committed files, and the work-tree files. When you run hg commit or svn commit or whatever, they spend hours analyzing what's in your work-tree to figure out how to commit this. OK, it's not actually hours, it just feels like hours. Anyway, it's slow. But when you run git commit, it's lightning-quick. Git gets its tremendous speed here from what Git calls, variously, the index, the staging area, or the cache, depending on which bit of Git documentation or command is doing the calling.

The trick here is that when you git checkout a commit, Git does not just expand all the frozen files into a normal de-compressed form in your work-tree. Git first copies the frozen, Git-formatted files into this index, keeping them in the special Git-only format but un-freezing them. Only then does it bother with de-compressing them into the work-tree. This index therefore tracks—there's that word again—the set of files in your work-tree.

When you run git add file, what Git does is compress the work-tree copy of file into the special Git-only format, then write that into the index. If there was some other version of the file in the index, well, now it's updated to the new one from the work-tree. If the file wasn't in the index before, well, now it is. Either way, the file is now ready to go.

Hence, when you run git commit, all Git has to do is freeze the pre-compressed files that are already in the index. That's how it is that Git is so fast: it's just re-using all those already-compressed files. If you have not run git add, you get the same version of the file that was in the commit you had checked-out earlier.
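You can peek at the index with git ls-files -s, which lists each entry with the hash ID of its stored content. The sketch below assumes a throwaway repository and a made-up file.txt; it compares the index entry with the hash that Git computes for the work-tree copy.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "A"

echo "v2" > file.txt                              # edit the work-tree copy only

# Each "git ls-files -s" line is: mode, blob hash, stage number, then a tab and the path.
index_before=$(git ls-files -s file.txt | cut -d' ' -f2)
head_hash=$(git rev-parse HEAD:file.txt)
worktree_hash=$(git hash-object file.txt)         # hash the work-tree content would get

git add file.txt                                  # copy the new content into the index
index_after=$(git ls-files -s file.txt | cut -d' ' -f2)

echo "index before add: $index_before"
echo "index after add:  $index_after"
```

Before git add, the index entry still matches the HEAD commit; afterwards it matches the work-tree copy, ready for the next commit.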

Note that when Git does this, it's all set up for the next commit too. Suppose, for instance, that you are in this state:

...--F--G   <-- somebranch (HEAD)

because you ran git checkout somebranch. The index and work-tree are now full of all the files from commit G. You modify some work-tree file, run git add to copy the modified file back into the index, then run git commit. Git makes new commit H from the index, changes the name somebranch to point to new commit H, and you have:

...--F--G--H   <-- somebranch (HEAD)

with commit H matching the index!

This also explains why if you forget to git add a file, and commit, the commit gets the old version of the file. The commit gets whatever is in the index, which is the old version from the previous commit.
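A quick way to convince yourself of this, sketched in a throwaway repository with made-up file names: edit one file without adding it, commit something else, then compare the committed copy with the work-tree copy.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "A"

echo "v2" > file.txt              # edited, but never "git add"-ed
echo "unrelated" > other.txt
git add other.txt
git commit -q -m "B"              # snapshot is taken from the index

committed=$(git show HEAD:file.txt)   # what commit B holds for file.txt
worktree=$(cat file.txt)              # what is sitting in the work-tree

echo "commit B has: $committed"
echo "work-tree has: $worktree"
```

Commit B still holds the old content of file.txt, because that is what was in the index when git commit ran.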

This is also how Git defines a tracked vs an untracked file: a file in the work-tree is tracked if and only if it is in the index right now. Using git add will copy it into the index, after which it will be tracked, because it is in the index and will be in the next commit you will make. Using git rm will remove a file from the index and the work-tree: now it's just gone, so it's neither tracked nor untracked. Using git rm --cached will remove it from the index, but leave it in the work-tree: now it's untracked, because it's in the work-tree, but not in the index.
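The three states can be walked through in a small sketch. It assumes a throwaway repository and a made-up notes.txt; is_tracked and describe are hypothetical helpers built on git ls-files --error-unmatch, which fails when a path is not in the index.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helpers: a path is tracked if and only if it is in the index.
is_tracked() { git ls-files --error-unmatch -- "$1" >/dev/null 2>&1; }
describe() {
  if is_tracked "$1"; then echo tracked
  elif [ -e "$1" ]; then echo untracked
  else echo gone
  fi
}

echo "draft" > notes.txt
s_new=$(describe notes.txt)        # in work-tree only

git add notes.txt
s_added=$(describe notes.txt)      # now in the index
git commit -q -m "add notes.txt"

git rm -q --cached notes.txt       # remove from index, keep the work-tree copy
s_cached=$(describe notes.txt)

git add notes.txt
git rm -q notes.txt                # remove from both index and work-tree
s_removed=$(describe notes.txt)

echo "$s_new -> $s_added -> $s_cached -> $s_removed"
```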

A good short description for the index, then, is this: The index is the set of files that will go into the next commit you make. Using git commit simply freezes the ready-to-go index. Using git add, you update or insert files into the index, making them ready-to-go. You do your work in your work-tree, but you commit from your index.

About `git status` and `.gitignore`

The git status command prints out a few interesting facts, like "on branch master", first—it knows which branch you're on by where the name HEAD is attached. Then it tells you about files staged for commit and/or files not staged for commit.

This is really telling you what's in your index. Your index starts out matching your commit—because you made it from your last commit, for instance, or because of git checkout. But if you ran git add, you wrote new files, or new versions of existing files, into the index. Git can do a very fast comparison of the HEAD commit—from the hash ID that the branch name is remembering—to the index, to tell you if any of the files in the index are different from their HEAD version. These are the files that are staged for commit.

Let's look at that again, because it's important:

If the index file matches the HEAD file, Git says nothing.
If the index file is different from the HEAD file, Git says it's staged for commit.

That way, if you have a big project with thousands of files, Git can just call your attention to the ones that are going to be different if you make a commit right now.

The second half of this is about what you could still git add. If you've changed (or created or removed) some files in your work-tree, but have not made the index match, you could run git add to update the index to match the work-tree. So the second comparison is index-vs-work-tree, and:

If the index file matches the work-tree file, Git says nothing.
If the index file is different from the work-tree file, Git says that the file is not staged for commit.

But there's one case that occurs here that does not occur for the first comparison, and that's when there's a work-tree file that is not in the index at all. Such a file is untracked, and Git will whine about it.
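The two comparisons that git status makes can also be run separately. In this sketch (throwaway repository, made-up file names), git diff --cached compares HEAD with the index, plain git diff compares the index with the work-tree, and git ls-files --others lists work-tree files that are not in the index.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo "v1" > staged.txt
echo "v1" > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "A"

echo "v2" > staged.txt
git add staged.txt                 # index now differs from HEAD
echo "v2" > unstaged.txt           # work-tree now differs from index
echo "new" > untracked.txt         # in work-tree, not in index

staged=$(git diff --cached --name-only)                  # HEAD vs index
unstaged=$(git diff --name-only)                         # index vs work-tree
untracked=$(git ls-files --others --exclude-standard)    # work-tree only

git status --short
```

In the short status output, the first column reports the HEAD-vs-index comparison and the second column the index-vs-work-tree comparison; untracked files are marked with ??.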

Well, that is, Git will whine about it unless you tell Git: Don't whine about this particular untracked file. That's a lot of what .gitignore is about: shutting up Git. Listing a file in .gitignore won't keep the file out of the index, because if you check out a commit that contains the file, that file goes into the index.

Listing a file in .gitignore will tell Git that a mass "add", such as git add . or git add *, should not add the file if it's not already in the index. But once it's in the index for some reason, the file is then tracked, and .gitignore entries no longer apply. They only apply to untracked (not-in-index) files.
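Here is a sketch showing that a .gitignore entry does nothing for a file that is already tracked. The names settings.conf and scratch.log are invented for the demo, and the repository is a throwaway one.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo "secret=1" > settings.conf
git add settings.conf
git commit -q -m "track settings.conf"

# List both a tracked file and a not-yet-existing file in .gitignore.
printf 'settings.conf\nscratch.log\n' > .gitignore
git add .gitignore
git commit -q -m "add .gitignore"

echo "secret=2" > settings.conf    # tracked: the ignore entry has no effect
echo "tmp" > scratch.log           # untracked: the ignore entry applies

status=$(git status --porcelain)
echo "$status"
```

Git still reports settings.conf as modified, and says nothing about scratch.log.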

Conclusion

You can't get what you want. It's a good idea to avoid committing host- or site-specific files such as configurations, at least in the repository for general software. Instead, commit an example, or a set of configuration defaults, and then have the VCS ignore the actual configuration if it goes in the same directory. Keep the actual configuration in a separate repository if you want to version the configuration.
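One way to set that up is sketched below. The file names config.example and config.local and their contents are hypothetical; use whatever fits your project.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

# Commit the example (defaults), and tell Git to ignore the real configuration.
printf 'host=localhost\nport=8080\n' > config.example
echo "config.local" > .gitignore
git add config.example .gitignore
git commit -q -m "add example configuration"

# Each site makes its own copy and edits it; Git never tracks this file.
cp config.example config.local
echo "host=prod.internal" >> config.local

status=$(git status --porcelain)
echo "status output: '$status'"
```

Since config.local was never added to the index, it is not in any commit, so no merge can carry it to another branch.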

Upvotes: 9