Mike

How can I permanently git rm --cached file?

I have no problem when I do a git rm --cached filename over my file to get its changes excluded from the git staging and repository area, but the next time I commit, I have to run the same git rm --cached filename again to keep the changes out.

How can I permanently exclude changes with a git rm --cached file?

Thanks

torek

The short answer is that you can't. All git rm --cached does is to remove an entry (or multiple entries) from your index. Something else can always put such an entry back.

That's the TL;DR and you can stop here if you want. But if you want to know what you should do with these files, read on...

This, however, is not a good way to phrase the effect of removing an entry from the index:

... when I do a git rm --cached filename over my file to get its changes excluded from the git staging and repository area

(emphasis mine, on "changes"), because the index doesn't store changes. There are some important terminology issues here that, if you don't get very precise, may cause you some degree of misery later. So, let's be extra-clear here, since Git is already terribly confusing (I've been working with it for more than a decade and it still has some dark corners).

About the index

So, let's talk about the index. This thing, the index, is crucial in Git because it's how you make your next commit. For this reason, some parts of Git call it the staging area. A few parts of Git, such as git rm --cached itself, call it the cache. These are just three terms for the same thing: the data structure that Git carries around, loosely attached to your work-tree, that Git will use when it makes a new commit.

The index itself is a complicated little beast, and takes on an expanded role during merges, but what it mainly does is hold copies of files. To understand how this part works, it helps to look at commits and the work-tree (or working tree, or working directory, or any number of similar terms). A commit is, or more precisely refers to, a complete snapshot of all files. There are no changes here, just files. If file main.py is in the commit, the commit has a complete copy of main.py, with the contents it had when you (or whoever) ran git commit.

The files inside commits are, intentionally, frozen and compressed—sometimes very compressed—and Git-ified into a form that only Git can really use. That's great for archiving commits forever, but no good for getting any work done. To get work done, Git has to thaw out, de-compress, and de-Git-ify your files, into the form you use normally. So that's where your work-tree comes from: it's the place where you can do your work.

You might think, then, that Git would stop here: you'd have commits, which are frozen forever, and your work-tree. You'd change files in your work-tree, tell Git to commit, and it would freeze the work-tree files again. But that's not actually what happens.

Instead, when you check out a commit (using git checkout branch, for instance), Git copies the frozen, Git-ified files from the commit into the index. Here, the files are still Git-ified, and are ready to freeze into the next commit. But they're not actually frozen now: you can overwrite them. They're still Git-only, though!

So, having copied the files to the index, Git now proceeds to de-compress and de-Git-ify the files, and put the normal files into your work-tree. What this means is that when you do go to make the next commit, Git can—and does—ignore your work-tree. It just takes whatever is in your index right now, the full copies of all files, and freezes them into the new commit.
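To see that the new commit comes from the index rather than from the work-tree, here is a small sketch. It assumes a POSIX shell with Git on the PATH and works in a throwaway repository made with mktemp; the file names and the committer identity are made up for the demo. b.txt is edited but never git add-ed, so the commit still gets the older index copy.

```shell
# Throwaway repository; nothing here touches an existing project.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # hypothetical identity, local to this repo
git config user.email "demo@example.com"

echo v1 > a.txt
echo v1 > b.txt
git add a.txt b.txt
git commit -q -m "first"

# Change both work-tree files, but copy only a.txt into the index.
echo v2 > a.txt
echo v2 > b.txt
git add a.txt
git commit -q -m "second"

# The index lists a full copy (blob) of every file, not a set of changes.
git ls-files --stage

a_committed=$(git show HEAD:a.txt)   # v2: came from the updated index copy
b_committed=$(git show HEAD:b.txt)   # v1: the index still held the old copy
b_worktree=$(cat b.txt)              # v2: the commit ignored the work-tree
echo "HEAD:a.txt=$a_committed HEAD:b.txt=$b_committed work-tree b.txt=$b_worktree"
```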

This is why you must keep running git add, even if you have git add-ed some file before. What git add does is copy the work-tree file into the index. If there was a copy in the index before, well, now it's overwritten by a new (and newly Git-ified) version. If not, now it's in the index. Either way, it's now ready to go into the next commit.
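One way to watch git add overwrite the index copy is to look at the blob hash the index records for a file. In this sketch (a throwaway repository; main.py is an example name and index_hash is a helper made up for the demo), editing the work-tree leaves the index entry alone, and only git add replaces it, with a copy whose hash matches git hash-object run on the work-tree file.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Hypothetical helper: field 2 of `git ls-files --stage` is the index copy's blob hash.
index_hash() {
    git ls-files --stage -- "$1" | cut -d' ' -f2
}

echo 'first version' > main.py
git add main.py
before=$(index_hash main.py)

echo 'second version' > main.py
edited=$(index_hash main.py)        # unchanged: editing the work-tree doesn't touch the index

git add main.py
after=$(index_hash main.py)         # replaced by a newly Git-ified copy
worktree=$(git hash-object main.py) # hash Git computes for the current work-tree file

echo "before=$before edited=$edited after=$after worktree=$worktree"
```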

The presence of a file in the index is exactly what defines a file as tracked. If it's in the index, it's tracked. If not, it's not.
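Since "tracked" simply means "has an index entry", you can test for it by asking Git whether the index knows the path. The helper below (is_tracked is a name invented for this sketch) relies on git ls-files --error-unmatch, which exits non-zero for paths that are not in the index. Note that no commit is ever made: the file becomes tracked the moment git add puts it in the index.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Hypothetical helper: succeeds only if the path has an entry in the index.
is_tracked() {
    git ls-files --error-unmatch -- "$1" >/dev/null 2>&1
}

echo 'print("hi")' > main.py
if is_tracked main.py; then state_before=tracked; else state_before=untracked; fi

git add main.py                      # copy into the index; no commit needed
if is_tracked main.py; then state_after=tracked; else state_after=untracked; fi

echo "before add: $state_before, after add: $state_after"
```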

So git rm --cached somefile removes the Git-ified copy of somefile from the index. Without --cached, it would also remove the ordinary-format somefile from the work-tree. With --cached, it leaves the work-tree file behind:

$ git checkout master
... bunch of files exist ...

$ git rm --cached main.py     # main.py is now untracked

$ git add main.py             # main.py is now tracked

No commits happened in between: we simply changed the file from tracked to untracked, then back from untracked to tracked. The copy in the index now matches the copy that was in the index before. That earlier copy came out of a commit, but the work-tree version is just its de-Git-ified form, and since we didn't edit it, re-Git-ifying it into the index gives back exactly the copy we removed.
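The round trip in the transcript can be checked mechanically. This sketch (throwaway repository, hypothetical committer identity) records the index entry for main.py, removes it with git rm --cached, puts it back with git add, and compares. Because the work-tree file is not edited in between, the entry (mode, blob hash, stage) comes back identical and git status has nothing to report.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo 'print("hello")' > main.py
git add main.py
git commit -q -m "add main.py"

before=$(git ls-files --stage main.py)    # e.g. "100644 <blob-hash> 0<TAB>main.py"
git rm -q --cached main.py                # main.py is now untracked
during=$(git ls-files --stage main.py)    # empty: no index entry
git add main.py                           # main.py is tracked again
after=$(git ls-files --stage main.py)

echo "same entry: $([ "$before" = "$after" ] && echo yes || echo no)"
git status --porcelain                    # prints nothing: index matches HEAD and work-tree
```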

If we keep the file out of the index (git rm --cached again) and then run git commit, the new commit we make does not have the file in it. (The new commit also becomes the tip commit of the branch.) So if we git checkout this new commit later, that file isn't in the commit, so it isn't in the index. If there is an untracked file of that name in the work-tree, that untracked file is undisturbed.
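Here is that sequence as a sketch, in a throwaway repository with example file names and a made-up committer identity. After git rm --cached and a commit, the new tip commit has no main.py, the previous commit still does, and the work-tree file stays in place as an untracked file.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo 'print("hello")' > main.py
echo 'notes' > README
git add main.py README
git commit -q -m "with main.py"

git rm -q --cached main.py                # drop the index entry only
git commit -q -m "without main.py"

in_new=$(git ls-tree --name-only HEAD main.py)     # empty: not in the new commit
in_old=$(git ls-tree --name-only HEAD~1 main.py)   # "main.py": old commit unchanged
tracked=$(git ls-files main.py)                    # empty: untracked now
echo "new commit: '$in_new' old commit: '$in_old' index: '$tracked'"
ls main.py                                         # still present in the work-tree
```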

But as soon as we check out any older commit that does have that file, well, it's in the commit, so Git copies it into the index, and then into the work-tree too. Now it's tracked again, because now it's in the index. If we're done looking at the old commit and we git checkout the new one that doesn't have the file, Git:

  1. sees that the file is in the current commit, the index, and the work-tree
  2. sees that the file isn't in the new commit to go to
  3. removes the file from the index and work-tree when switching to the new commit
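The three steps can be watched in a sketch like the following (throwaway repository, example names, and checkouts by commit hash so no branch name has to be assumed). One liberty is taken: the leftover untracked main.py is deleted before visiting the old commit, because Git may refuse to overwrite an untracked file that is in the way.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo 'print("hello")' > main.py
echo 'notes' > README
git add main.py README
git commit -q -m "with main.py"
git rm -q --cached main.py
git commit -q -m "without main.py"

new=$(git rev-parse HEAD)
old=$(git rev-parse HEAD~1)
rm main.py                                # clear the untracked copy out of the way

git checkout -q "$old"                    # old commit has main.py: index and work-tree get it
tracked_old=$(git ls-files main.py)
if [ -f main.py ]; then exists_old=yes; else exists_old=no; fi

git checkout -q "$new"                    # new commit lacks it: Git removes it from both
tracked_new=$(git ls-files main.py)
if [ -f main.py ]; then exists_new=yes; else exists_new=no; fi

echo "old: tracked='$tracked_old' exists=$exists_old; new: tracked='$tracked_new' exists=$exists_new"
```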

About .gitignore files

Listing a file name, or a pattern that matches a file name, in a .gitignore file does not tell Git to ignore the file, exactly. What it does is affect how Git treats the file while it is untracked.

An untracked file is, as we already noted, a file that's in the work-tree but not in the index. It doesn't matter whether it is in any given commit; what matters is whether it's in the index right now. If you add the file to the index, using git add to copy it (and compress and Git-ify it) from the work-tree into the index, well, now it's there and hence tracked. If you git rm --cached the file from the index, well, now it's not there and hence untracked. Swap back and forth as much as you like! Nothing really important happens yet: you're just entering the file into, or removing the file from, the index.
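A quick way to see that tracked-ness is about the index and not about commits: remove a committed file with git rm --cached and ask git status. In this sketch (throwaway repository, example names, made-up identity) the file is still in HEAD, yet Git reports a staged deletion and, separately, an untracked work-tree file of the same name.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo 'print("hello")' > main.py
git add main.py
git commit -q -m "add main.py"

git rm -q --cached main.py
status=$(git status --porcelain)
printf '%s\n' "$status"
# Expected two lines:
#   D  main.py    (in HEAD, gone from the index: a staged deletion)
#   ?? main.py    (in the work-tree, not in the index: untracked)
```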

Being untracked has two major effects:

  • It doesn't go into the next commit, if and when you run git commit. That's because Git builds new commits from whatever is in the index.
  • And, it makes Git whine at you: Hey, file somefile is untracked, don't you want it in your commits?

Listing the file's name in a .gitignore makes Git shut up about the untracked-ness, and makes git add --all or git add . not copy that file into the index, if it's not already there. So we could say that instead of .gitignore, this file should be called .git-stop-whining-about-these-files-and-do-not-automatically-add-them-with-mass-add-operations.

That's not a pleasant name to use, so Git uses .gitignore.
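Both effects, and the fact that .gitignore leaves tracked files alone, show up in a short sketch (throwaway repository; build.log and app.cfg are example names). app.cfg is committed before being listed in .gitignore, so its later edit is still picked up by git add ., while the untracked build.log is neither reported nor added.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo 'setting=1' > app.cfg
git add app.cfg
git commit -q -m "track app.cfg"

printf '%s\n' build.log app.cfg > .gitignore
echo 'debug output' > build.log           # untracked and matched by .gitignore
echo 'setting=2' > app.cfg                # tracked, so .gitignore has no effect on it

git status --porcelain                    # no "?? build.log" line: no whining
git add .                                 # mass add skips build.log
staged=$(git diff --cached --name-only)
printf '%s\n' "$staged"                   # .gitignore and app.cfg, but not build.log
git check-ignore -q build.log && echo "build.log is ignored"
```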

There's an unfortunate side effect of listing a file in .gitignore, though. Git is normally very careful with untracked files. Suppose a file is untracked, and some operation would clobber it. For instance, suppose somefile is untracked right now, and isn't in the current commit either. But it's in the tip commit of branch dev, and you tell Git: git checkout dev, or git merge dev. (The merge case is more complicated but has the same issues.)

Since the file is in the commit at the tip of dev, Git is going to need to copy somefile from that commit, to the index, and to the work-tree. That will overwrite somefile. Git will tell you: No, I won't clobber your file somefile. It's untracked and is in the way. Please commit it, or move it out of the way.
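The protective behavior can be reproduced with a sketch like this one (throwaway repository; somefile and dev follow the text, the file contents are made up). The starting branch name is read from Git rather than assumed, since it may be master or main depending on configuration.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo 'base' > README
git add README
git commit -q -m "base"
start=$(git symbolic-ref --short HEAD)    # master or main, depending on config

git checkout -q -b dev
echo 'version from dev' > somefile
git add somefile
git commit -q -m "add somefile on dev"
git checkout -q "$start"                  # somefile leaves the index and work-tree

echo 'my local data' > somefile           # untracked, not ignored, and in the way
if git checkout -q dev; then result=switched; else result=refused; fi
echo "checkout: $result; somefile says: $(cat somefile)"
```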

But if you list somefile in your .gitignore, Git will feel free to clobber it after all, at least in some cases (including the git checkout dev one). So this .gitignore file should be named .git-stop-whining-about-these-files-and-do-not-automatically-add-them-with-mass-add-operations-but-do-feel-free-to-clobber-them-in-some-cases.
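Listing somefile in a committed .gitignore flips the outcome. In this sketch (throwaway repository, made-up contents) the dev commit needs git add -f because the name is ignored, and the later git checkout dev silently replaces the local untracked copy.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo 'somefile' > .gitignore
git add .gitignore
git commit -q -m "base, ignoring somefile"
start=$(git symbolic-ref --short HEAD)    # master or main, depending on config

git checkout -q -b dev
echo 'version from dev' > somefile
git add -f somefile                       # -f: add it despite .gitignore
git commit -q -m "add somefile on dev"
git checkout -q "$start"

echo 'my local data' > somefile           # untracked AND ignored
if git checkout -q dev; then result=switched; else result=refused; fi
echo "checkout: $result; somefile says: $(cat somefile)"   # the local data is gone
```

The git checkout documentation describes a --no-overwrite-ignore option that makes checkout abort in this situation instead of clobbering.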

Takeaway: there is no perfect answer

No existing commit can ever be changed, so if some file has gotten into commits where it should never have existed, you're pretty well stuck. You can do what some call a "history rewrite" (using, e.g., git filter-branch or The BFG). This produces new and improved commits, with different hash IDs, that don't have the file(s); you must then convince everyone to use them in place of the old bad commits that had the file.
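For illustration, here is roughly what a filter-branch rewrite looks like, adapted from the --index-filter example in the git filter-branch documentation; secret.txt is a hypothetical file that should never have been committed. The sketch shows the point above: the branch ends up on new commits with new hash IDs, while the old commits still exist unchanged. Setting FILTER_BRANCH_SQUELCH_WARNING=1 only skips the warning pause that newer Git versions print before running filter-branch.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo 'code' > main.py
echo 'password=hunter2' > secret.txt      # hypothetical file that should not be in history
git add main.py secret.txt
git commit -q -m "first"
echo 'more code' >> main.py
git commit -q -am "second"
old_tip=$(git rev-parse HEAD)

# Rewrite every commit reachable from HEAD, dropping secret.txt from each snapshot.
FILTER_BRANCH_SQUELCH_WARNING=1 \
    git filter-branch --index-filter 'git rm -q --cached --ignore-unmatch secret.txt' HEAD

new_tip=$(git rev-parse HEAD)
echo "old tip $old_tip, new tip $new_tip"
git rev-list HEAD -- secret.txt           # prints nothing: no rewritten commit has it
git cat-file -e "$old_tip:secret.txt" && echo "old commit still has secret.txt"
```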

Or, you can git rm --cached the file, and choose whether or not to list it in .gitignore. If it's something that is safe to clobber, listing it there is probably the way to go. That way the file won't be in the new commits you make from here forward. Of course, if you go back to an old commit that has the file and use it (that is, use what gets stuffed into your index) to make new commits, .gitignore won't help: once the file is in your index, you must actually remove it, either by switching back to a commit that doesn't have it, or by running git rm --cached on it again. But git add . won't accidentally add it. You'll just have to be careful not to clobber the file out of your work-tree when going back to older commits that do have it.
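The "stop tracking and list it in .gitignore" option might look like this in practice. The sketch uses a throwaway repository, somefile as an example name, and a made-up identity; after the "stop tracking" commit, editing the file and running git add . stages nothing and git status stays quiet, while the older commit still contains the file.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo 'print("hello")' > main.py
echo 'local settings' > somefile
git add main.py somefile
git commit -q -m "initial, with somefile"

echo 'somefile' >> .gitignore             # no whining, no mass-adding from now on
git rm -q --cached somefile               # drop it from the index (work-tree copy stays)
git add .gitignore
git commit -q -m "stop tracking somefile"

echo 'changed locally' > somefile         # later local edits...
git add .                                 # ...are not picked up by a mass add
staged=$(git diff --cached --name-only)
status=$(git status --porcelain)
echo "staged: '$staged' status: '$status'"
```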

Or, of course, you can git rm --cached the file and put up with any whining.
