I searched. I read all the answers on similar or even identical questions that I could find here. I watched tutorials. It seems super easy and logical when it's happening in somebody's terminal. In mine, it's just not working. Apparently, I must be missing something super obvious. I committed .gitignore, but .DS_Store still appears in status. Then I deleted .DS_Store using git rm --cached .DS_Store (just in case, even though it wasn't in my repo). Nope. Now I'm seeing it appear in the staging area under "Changes to be committed" as "deleted: .DS_Store". Is there a way to get rid of it completely, so it stops popping up in my status?
Here it is:
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
deleted: .DS_Store
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: README.md
modified: assets/.DS_Store
modified: assets/stylesheets/main.css
modified: index.html
The (rather mild) problem here is that you have some number of existing commits in which .DS_Store exists. Note that the current status indicates at least two such files, one in the top level of the tree and one in assets:
deleted: .DS_Store
modified: assets/.DS_Store
These existing commits cannot be changed. They will continue to hold a copy of the .DS_Store file forever (or as long as these commits continue to exist, anyway).
You must delete these .DS_Store files (all of them) from Git's index if you wish them not to be stored in future commits. By doing that (running git rm --cached and making a commit), you're telling Git that when you check out one of these historical commits, it should extract the historical .DS_Store files, and when you switch from that historical commit to one of the more modern commits that lacks these .DS_Store files, Git should remove them.
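A minimal sketch of that fix, assuming a POSIX shell and a reasonably recent Git. It builds a throwaway repository (the file contents, commit messages, and user identity are made up) that reproduces the question's two tracked .DS_Store files, then untracks both with a single git rm --cached and a commit.

```shell
#!/bin/sh
# Sketch: recreate the question's situation in a throwaway repository, then
# stop tracking every .DS_Store file while keeping the copies on disk.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, needed to commit
git config user.email "user@example.com"

# An earlier commit that contains two .DS_Store files, as in the question.
# (-f only makes sure a personal global ignore file cannot skip them here.)
mkdir assets
echo 'finder data' > .DS_Store
echo 'finder data' > assets/.DS_Store
echo '# demo' > README.md
git add -f .DS_Store assets/.DS_Store README.md
git commit -q -m 'Initial commit, .DS_Store files included'

# Listing .DS_Store in .gitignore changes nothing for files already tracked...
echo '.DS_Store' > .gitignore
git add .gitignore

# ...so also remove them from the index. --cached leaves the work-tree files
# alone. The quotes keep the shell from expanding the glob; Git matches it
# itself, and in a Git pathspec "*" also matches "/", so assets/.DS_Store counts.
git rm --cached --quiet -- '*.DS_Store'
git commit -q -m 'Stop tracking .DS_Store files'

git ls-files          # .gitignore and README.md only
git status --short    # no output: the .DS_Store files are untracked and ignored
```

In the question's own repository the top-level removal is already staged, so what remains would be git rm --cached assets/.DS_Store followed by git commit. The older commits still contain their copies, as described above.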
Since macOS Finder will create a new .DS_Store, if the file is missing, whenever it shows the directory in a Finder window, removing these files from Git is safe enough in this particular case. However, there are several things to be aware of that can make this trickier for other files, should you need to use git rm --cached on them as well.
Git's index, which has this relatively poor name (index? what is it indexing?), has two other names. Git also calls this thing the staging area, which refers to how you use it, and the cache, which refers to how it's used internally. The name "cache" mostly shows up in the spelling of git rm --cached, which tells Git to remove a file from the index without also removing it from your working tree.
Okay, so we have three names for the thing. What does that tell us? Well, for one thing, it tells us that Git's index is really important. In fact, it's absolutely crucial. But why? Why does Git insist on shoving its index in our faces over and over again? Ultimately, the real answer to that question is just "it was a deliberate choice", but it's worth looking at the factors that led Linus Torvalds to make this choice. This turns out to start with commits.
Git is, in the end, all about commits. It's not about files, although files are, in a sense, contained within each commit. It's not about branch names either, although branch names are necessary to help you (and Git) find the commits. The commits, though, are the history in the repository. They hold the files, and they form things that we sometimes call branches (see What exactly do we mean by "branch"?).
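Each commit is numbered by a big ugly hash ID. Each commit stores two things: a full snapshot of every file, and some metadata, that is, information about the commit itself, such as who made it and when.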
Inside its metadata, each commit holds the hash ID of some earlier commit (its parent), and this is what makes commits act as history. We won't get into the details here, but this is how branches actually work.
The crucial bit to know here is that no part of any commit can ever be changed. This is because the hash ID of the commit is a checksum of all of the data in the commit. Git computes these checksums when making the commits, and verifies them when extracting the commits. If they don't match, the commit has somehow become corrupt in storage,[1] and cannot be used.
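To make the metadata and checksum claims concrete, this sketch (assuming a POSIX shell, a throwaway repository, and a made-up identity and file name) prints a raw commit object, which shows the parent's hash ID, and then re-hashes the commit's bytes with git hash-object to show that they reproduce the commit's own hash ID.

```shell
#!/bin/sh
# Sketch: look inside a commit to see its metadata, including the hash ID
# of its parent, and check that its own hash ID is a checksum of that data.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo one > file.txt
git add file.txt
git commit -q -m 'First commit'
echo two > file.txt
git commit -q -a -m 'Second commit'

# The raw commit object: a tree line (the snapshot), a parent line
# (the hash ID of the previous commit), author, committer, message.
git cat-file -p HEAD
parent=$(git rev-parse HEAD~1)
echo "parent should be: $parent"

# Hashing the raw commit data again reproduces the commit's hash ID; any
# change to that data would produce a different ID, i.e. a different commit.
recomputed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)
echo "HEAD:       $(git rev-parse HEAD)"
echo "recomputed: $recomputed"
```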
So, all the files stored in a commit are read-only. They are also compressed (to save space) and de-duplicated (to save space and time). If two commits in a row share 999 out of 1000 files, they literally share the files: only the one changed file has to go into storage for the later commit. But this means that the committed files are entirely useless for getting any new work done: you can't change them, and in their compressed form only Git itself can read them.
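The sharing can be observed directly. In this small sketch (assuming a POSIX shell, a throwaway repository, and hypothetical files a.txt and b.txt), the commit:path syntax asks Git for the blob hash ID a commit stores for a path, so the unchanged file shows the same ID in both commits.

```shell
#!/bin/sh
# Sketch: files that did not change between two commits are shared, not
# copied: both commits refer to the same blob hash ID for them.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo 'unchanged' > a.txt
echo 'version 1' > b.txt
git add a.txt b.txt
git commit -q -m 'First commit'

echo 'version 2' > b.txt
git commit -q -a -m 'Change b.txt only'

# <commit>:<path> names the blob that a commit stores for that path.
old_a=$(git rev-parse HEAD~1:a.txt)
new_a=$(git rev-parse HEAD:a.txt)
old_b=$(git rev-parse HEAD~1:b.txt)
new_b=$(git rev-parse HEAD:b.txt)
echo "a.txt: $old_a -> $new_a (same blob, stored once)"
echo "b.txt: $old_b -> $new_b (new blob for the changed file)"
```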
So Git has to extract commits to a usable area. Git calls this area your working tree or work-tree, because it's where you do your work. The files in here are ordinary everyday files, and you can get work done.
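What Git needs, then (and this is common across all version control systems; they all share this kind of setup), is both the permanent, read-only copy of each file inside the commits and a separate work area holding ordinary, usable copies of those files.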
In Git's case, the work area is literally yours, to do with as you will. This means you can create files in it that you don't want Git to save for all time. This, you might assume, is where .gitignore comes in. This assumption isn't wrong, but it's incomplete.
[1] Storage media do fail, in the real world. Most failures are detected, but there is a chance (usually claimed to be about 1 in 10^16 or better) that some failure might be missed, giving you back bad data. Google have done analyses and found that the actual error rate tends to be higher than claimed.
What Git really needs is a list of files to commit. That is, suppose you're in your work-tree, doing your work. You create a bunch of files: maybe several hundred, or thousands, or whatever. Two of those files should go into the next commit as new files, and the rest should be ignored.
One possible way to deal with this is to just have an "ignore these files" file, and automatically generate the list of files-to-be-committed from every file that's not listed in the "ignore these files" file. But if you try this out, you'll find it is error-prone. Git and several similar version control systems use, instead, an explicit "add some file(s)" command to add them to a list of files.
The index, then, could have been just a sort of manifest: these are the files to include; all other files are to be called out as untracked when you ask for status. Suppose this were the case, and you said add all files. You hadn't told Git explicitly to ignore the .DS_Store files. They go into the list. You make a commit, and the commit has the .DS_Store files. Later, you realize that you didn't intend to commit the .DS_Store files.
It's too late. Those commits now exist. No matter what you do to the manifest, at most, you're just going to omit the .DS_Store files from future commits. You can't fix existing commits as they are read-only. At best, you can go back to all your old commits, take them out one by one, remove the .DS_Store files, and make a new and improved commit that's otherwise the same as the original, but now lacks the .DS_Store files.
(You can in fact do all this. But it means that you need to get everyone else—all the other people that have a clone of your repository—to stop using the old commits in favor of the new and improved ones.)
Now, what makes Git's index particularly unusual (as compared to, e.g., Mercurial's manifest) is that, having this list of files, Linus decided to expose it, and to do a couple of special tricks with it:
The index holds not only the names of all the files that go into the next commit (initially populated by extracting whichever commit you git checkout), but also the internal Git blob hash ID of each such file.
During merges, the index expands to hold up to three versions of a file at a time, all of which have the same name.
The git commit command doesn't bother to look at your work-tree.[2] Instead, it just packages up whatever is in the index at the time you run git commit. This is very fast, because those internal blob hash IDs are how Git stores the files: in fact, they're already there, pre-compressed and pre-de-duplicated.
The git add command amounts to: compress and de-duplicate these files and put them in your index, replacing any previous file of the same name, or creating a new entry if there is no previous file.[3]
The git rm command means remove the file from both your index and my work-tree. Adding --cached means leave my work-tree copy alone.
One outcome of all of this is that git commit won't commit what's in your work-tree. You can use this property to fuss with files in your work-tree for testing purposes, without actually committing the test code. This can be good or bad; different people have different opinions about whether it's more often good or more often bad; but that's how Linus chose to do it, and that behavior is now embedded in the hearts and minds of many Git users.
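A minimal sketch of that property, assuming a POSIX shell, a throwaway repository, and a hypothetical notes.txt: it stages one version of the file, edits the work-tree copy again, and commits, so the commit receives the staged version while the later edit stays behind in the work-tree.

```shell
#!/bin/sh
# Sketch: git commit packages what is in the index, not what is in the
# work-tree. A later, unstaged edit stays out of the commit.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo 'v1' > notes.txt
git add notes.txt
git commit -q -m 'v1'

echo 'v2' > notes.txt
git add notes.txt                  # the index now holds v2
echo 'v3 (test code)' > notes.txt  # the work-tree now holds v3; the index still v2
git commit -q -m 'v2'

committed=$(git show HEAD:notes.txt)
on_disk=$(cat notes.txt)
echo "committed: $committed"       # v2
echo "on disk:   $on_disk"         # v3 (test code)
git status --short                 # " M notes.txt": v3 is still not staged
```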
What this all boils down to is a relatively simple statement: The index holds, at all times, your proposed next commit, or else holds merge conflicts that are yet to be resolved so that a commit is not currently possible. If we omit the merge conflict case, you can just think of the index as holding your proposed next commit.
When checking out some existing commit, Git populates its index from that commit, then uses the populated index to fill in your work-tree with files. This means that Git is now ready to make a new commit that would exactly match the current commit.[4]
[2] For usability, this was changed a bit long ago: git commit now runs git status internally, and produces a commented-out git status section in the commit message you can edit.
[3] In fact, git add can also mean "make the index match the work-tree", to the extent that if you remove a work-tree file, git add can remove the index copy of that file. For instance, rm path/to/file; git add path/to/file is a long-winded way of running git rm path/to/file.
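A quick check of this footnote, assuming a POSIX shell and a throwaway repository; path/to/file is just the footnote's placeholder name. After the work-tree file is deleted, git add on its path stages a deletion, exactly as git rm would.

```shell
#!/bin/sh
# Sketch: deleting the work-tree file and then running "git add" on its
# path stages the same removal that "git rm" would.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

mkdir -p path/to
echo 'data' > path/to/file
git add path/to/file
git commit -q -m 'Add path/to/file'

rm path/to/file
git add path/to/file               # long-winded "git rm path/to/file"

git diff --cached --name-status    # D<TAB>path/to/file: a staged deletion
```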
[4] If the index and current commit do match, git commit will typically refuse to make a new commit, forcing you to use git commit --allow-empty to make the commit. The new commit isn't empty (it has whatever is in the index), but the difference from the current commit will be empty.
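A sketch of that behavior, assuming a POSIX shell, a throwaway repository, and a hypothetical file.txt: a plain git commit with nothing new in the index fails, while --allow-empty produces a commit that still contains file.txt but differs from its parent by nothing.

```shell
#!/bin/sh
# Sketch: with the index matching the current commit, a plain "git commit"
# refuses; --allow-empty makes a commit whose diff from its parent is empty.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo 'content' > file.txt
git add file.txt
git commit -q -m 'First commit'

# Nothing new staged: the index matches HEAD, so this exits non-zero.
if git commit -q -m 'Nothing new' >/dev/null 2>&1; then
    plain=succeeded
else
    plain=refused
fi
echo "plain git commit: $plain"

git commit -q --allow-empty -m 'Same snapshot again'
git ls-tree --name-only HEAD       # file.txt: the new commit is not empty
git diff --stat HEAD~1 HEAD        # no output: nothing changed
```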
Git doesn't make a new commit from your working tree, and therefore the contents of your .gitignore file are irrelevant to the git commit command. Instead, Git makes a new commit from whatever is in Git's index. The contents of Git's index are normally from the current commit.[5]
Once you've added some file, then, the file keeps going into new commits until you explicitly remove it. That's true whether or not the file's name is in a .gitignore file. To make the file really go away, you must remove it. Then the difference between the current commit and the next commit you make will include literally removing the file.
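This sketch (assuming a POSIX shell and a throwaway repository; settings.local is a hypothetical file name) commits a file, then lists it in .gitignore, and shows that later changes to it are still reported and still committed.

```shell
#!/bin/sh
# Sketch: a file that is already in the index stays tracked even after its
# name is listed in .gitignore; only removing it from the index changes that.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo 'v1' > settings.local
git add settings.local
git commit -q -m 'Commit settings.local'

echo 'settings.local' > .gitignore
git add .gitignore
git commit -q -m 'Ignore settings.local'

echo 'v2' > settings.local
git status --short                 # " M settings.local": still tracked
git commit -q -a -m 'v2'           # and the change still gets committed
git show HEAD:settings.local       # v2
```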
So what does listing file names, directory names, patterns, or anything else in a .gitignore file actually do? What good is it?
[5] There are some exceptions to this rule; see, e.g., Checkout another branch when there are uncommitted changes on the current branch.
What .gitignore does
There are two useful things that .gitignore (or any other exclusion file, such as .git/info/exclude) does, and one kind of dangerous thing:
First, there are en-masse git add operations, such as git add --all, git add ., or git add * (see footnote 6). Or, for that matter, you could list a file pattern like *.pyc in .gitignore and then run git add file.pyc anyway. What happens here is simple: if the file isn't already in Git's index, and the name is in an exclusion file, git add doesn't add it.
This means that if a file is currently untracked (see below for the definition of untracked), it stays untracked. But if the file is already in Git's index, the .gitignore entry has no effect.
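A sketch of this first effect, assuming a POSIX shell and a throwaway repository, with *.pyc as in the text and hypothetical main.py and main.pyc files: git add . silently skips the ignored, untracked file, and naming it explicitly makes git add refuse with a non-zero exit status (its hint mentions -f for forcing it anyway).

```shell
#!/bin/sh
# Sketch: a .gitignore pattern stops git add from adding matching files
# that are not yet in the index, whether added en masse or by name.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q

echo '*.pyc' > .gitignore
echo 'print("hi")' > main.py
echo 'bytecode' > main.pyc

git add .                          # adds .gitignore and main.py, skips main.pyc
git ls-files

# Naming the ignored file explicitly does not help: git add refuses
# (printing a hint about -f) and exits with a non-zero status.
if git add main.pyc 2>/dev/null; then
    explicit=added
else
    explicit=refused
fi
echo "git add main.pyc: $explicit"
```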
Second, when you run git status, Git will often whine about various files being untracked. Listing the name or pattern in an exclusion file stops the whining.
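A sketch of the second effect, assuming a POSIX shell, a throwaway repository, and a hypothetical draft.txt. It uses .git/info/exclude rather than .gitignore only so that a new .gitignore file does not itself show up as untracked; git status --ignored then shows that the file is still untracked, merely ignored.

```shell
#!/bin/sh
# Sketch: git status reports untracked files until a matching pattern
# goes into an exclusion file; then it stops mentioning them.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q

echo 'scratch' > draft.txt
before=$(git status --short)       # "?? draft.txt"

mkdir -p .git/info                 # may be missing if Git's templates are not installed
echo 'draft.txt' >> .git/info/exclude
after=$(git status --short)        # empty: nothing to whine about

echo "before: $before"
echo "after:  '$after'"
git status --short --ignored       # "!! draft.txt": ignored, still untracked
```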
We'll get to the dangerous thing in a moment. Let's define tracked now. A tracked file, in Git, is a file that is currently in Git's index. That's it. It's really that simple. If a file in your work-tree is in Git's index right now, it is tracked. If it is not in Git's index right now, it is untracked.
Remember that Git's index contents change! If you git checkout some commit, Git fills in its index. Those files are now tracked. If you run git add on a new file, that file goes into Git's index. That file is now tracked. If you run git rm (with or without --cached) on a file, that file comes out of Git's index. That file is now untracked. Of course, if you ran git rm without --cached, that file is gone from your work-tree too.[7]
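Since tracked simply means in the index, git ls-files (which lists the index) shows exactly the tracked files. This sketch (assuming a POSIX shell, a throwaway repository, and hypothetical file names) moves notes.txt out of the index with git rm --cached; the resulting status pairs a staged deletion with an untracked copy on disk, much like the top-level .DS_Store in the question, except that there .gitignore hides the untracked half.

```shell
#!/bin/sh
# Sketch: "tracked" just means "in the index right now". git ls-files
# lists the index, so it shows exactly the tracked files.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo 'a' > kept.txt
echo 'b' > notes.txt
git add kept.txt notes.txt
git commit -q -m 'Two tracked files'

git rm --cached --quiet notes.txt  # out of the index, still on disk
git ls-files                       # kept.txt only
git status --short                 # "D  notes.txt" (staged), then "?? notes.txt"
```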
What git status does, among other things like printing the name of your current branch, is run two git diff commands:
The first git diff compares the current commit to Git's index. For every file that is the same, Git says nothing. For every file that's different, or new or removed, Git says that the file is staged for commit, along with being modified, added, or deleted.
The second git diff compares what's in Git's index to what is in your work-tree. For every file that is the same, Git says nothing. For files that are different, Git says the file is not staged for commit. What's a bit unusual here is that for files that are new, Git calls these files untracked.
That last bit, of course, is because of the definition of an untracked file.
Listing a file in .gitignore makes git status shut up about untracked-ness. It doesn't have any effect on the actual tracked-ness at all! It just shuts up the whining.
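The two comparisons can be run by hand. In this sketch (assuming a POSIX shell, a throwaway repository, and hypothetical file names), git diff --cached approximates the first comparison, plain git diff the second, and git ls-files --others --exclude-standard lists the untracked, non-ignored files. git status does the equivalent work internally, so this is an approximation rather than its literal implementation.

```shell
#!/bin/sh
# Sketch: the two comparisons behind git status, run by hand.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo 'v1' > staged.txt
echo 'v1' > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m 'Base'

echo 'v2' > staged.txt && git add staged.txt  # index now differs from HEAD
echo 'v2' > unstaged.txt                      # work-tree now differs from index
echo 'new' > brand-new.txt                    # in neither: untracked

head_vs_index=$(git diff --cached --name-status)       # "to be committed"
index_vs_tree=$(git diff --name-status)                # "not staged for commit"
untracked=$(git ls-files --others --exclude-standard)  # "untracked"
echo "HEAD vs index:      $head_vs_index"
echo "index vs work-tree: $index_vs_tree"
echo "untracked:          $untracked"
```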
The last thing that listing a file name or pattern in .gitignore (or some other exclusion file) does is where things are a tiny bit dangerous. This gives Git permission to destroy such a file:
Suppose you're on commit a123456..., which doesn't have a .DS_Store file in it, and there is a .DS_Store file in your work-tree. That is, .DS_Store is currently untracked.
Now you run a git checkout command to check out commit 4321cab..., which does have a .DS_Store file in it.
To extract commit 4321cab..., Git will have to put a .DS_Store file into Git's index, and then copy that file out into your work-tree. You already have a .DS_Store file in your work-tree. This file will be overwritten.
Normally, Git will stop and complain: Hey, if I extract commit 4321cab..., I'll destroy your .DS_Store file! This gives you a chance to move it out of the way, if it's precious data. But if you list the file as ignorable, Git will feel free to clobber it.
Since the data in a .DS_Store file is rarely considered precious, this is probably OK here. But be careful in general.
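A sketch of the clobbering scenario, assuming a POSIX shell, a throwaway repository, and made-up file contents; the commit's hash ID is captured in a variable instead of using the placeholder IDs above. The first checkout is refused because the untracked .DS_Store would be overwritten; once the name is in .git/info/exclude, the same checkout succeeds and replaces the file's contents. The core.excludesFile setting only keeps a personal global ignore file (which many macOS users have for .DS_Store) from changing the outcome.

```shell
#!/bin/sh
# Sketch: Git refuses to overwrite an untracked file on checkout, but an
# untracked file that is also ignored is treated as disposable.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git config core.excludesFile /dev/null     # keep any global ignore file out of the demo

echo 'committed copy' > .DS_Store
echo '# demo' > README.md
git add .DS_Store README.md
git commit -q -m 'Has .DS_Store'
with=$(git rev-parse HEAD)

git rm --cached --quiet .DS_Store
git commit -q -m 'No .DS_Store'

echo 'precious local data' > .DS_Store     # untracked and not ignored

if git checkout -q "$with" 2>/dev/null; then
    first=checked-out
else
    first=refused    # "untracked working tree files would be overwritten"
fi

mkdir -p .git/info                          # may be missing without Git's templates
echo '.DS_Store' >> .git/info/exclude       # now untracked *and* ignored
git checkout -q "$with"                     # succeeds, clobbering the file
echo "first attempt: $first"
cat .DS_Store                               # committed copy
```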
[6] The precise action of the * in a Git command depends on whether you're using a Unix-style shell such as bash, or a DOS-style command interpreter such as CMD.EXE, but Git itself does glob expansion, so it comes out pretty similarly. There are subtle differences we won't cover here, though.
[7] Exercises with a bit of a philosophical bent: if the file named ghost isn't in your work-tree and isn't in Git's index, is the non-existent ghost file also untracked? (It is not in Git's index, so it won't be in the next commit, anyway.) What about a file named ghost that is in Git's index, but isn't in your work-tree? Is this file tracked? Will it be in the next commit?