AnchovyLegend

Ignoring directory vs ignoring directory with wildcard

I am working on a project that requires ignoring certain directories. Is there a difference between:

path/to/mydir/*

and this:

path/to/mydir/

The way I understand it, the upper path ignores all files within mydir, including the contents of subdirectories, while the non-wildcard example ignores only the top-level files, not the files within subdirectories.

Is this a correct way of thinking about it? When would I want to use one over the other? I'd like to resolve this confusion once and for all.

Thanks in advance!

Answers (1)

torek

Is there a difference between:

path/to/mydir/*

and

path/to/mydir/

Yes. Git has some funny (as in peculiar) behaviors for entries in .gitignore files. They start to show up when we use the un-ignore directives, !path/to/mydir/file for instance.

The way I understand it, the upper path ignores all files within mydir, including the contents of subdirectories, while the non-wildcard example ignores only the top-level files, not the files within subdirectories.

Un-ignore directives show that this isn't quite right. In particular, if you ignore a directory, you cannot un-ignore a file within that directory. However, if you ignore all the files and sub-directories within a directory, you can un-ignore a file within that directory (but not a file within one of its sub-directories, unless you explicitly un-ignore that sub-directory too).
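To make the difference concrete, here is a deliberately simplified Python model of the rules described above. It is a sketch, not Git's implementation. It assumes a single .gitignore at the top of the work-tree, supports only the * and ? wildcards plus the leading ! and trailing / forms, and lets the last matching pattern win. The work-tree is a hypothetical nested dict in which directories are dicts and files are None. The names glob_to_regex, is_ignored and walk are illustrative.

```python
import re

# Simplified model of .gitignore matching, NOT Git's actual code.


def glob_to_regex(glob: str) -> re.Pattern:
    """Translate a simplified glob: '*' and '?' never match a slash."""
    out = []
    for ch in glob:
        if ch == "*":
            out.append("[^/]*")
        elif ch == "?":
            out.append("[^/]")
        else:
            out.append(re.escape(ch))
    return re.compile("".join(out))


def is_ignored(path: str, is_dir: bool, patterns: list[str]) -> bool:
    """Last matching pattern wins; '!' un-ignores; trailing '/' = directories only."""
    ignored = False
    for raw in patterns:
        negate = raw.startswith("!")
        pat = raw[1:] if negate else raw
        dir_only = pat.endswith("/")
        pat = pat.rstrip("/")
        if dir_only and not is_dir:
            continue
        # A pattern containing a slash is matched against the full path;
        # otherwise it is matched against the last path component only.
        anchored = "/" in pat
        target = path if anchored else path.rsplit("/", 1)[-1]
        if glob_to_regex(pat.lstrip("/")).fullmatch(target):
            ignored = not negate
    return ignored


def walk(tree: dict, patterns: list[str], prefix: str = "") -> list[str]:
    """Return the untracked, not-ignored files a status check would report."""
    reported = []
    for name, child in sorted(tree.items()):
        path = prefix + name
        is_dir = isinstance(child, dict)
        if is_ignored(path, is_dir, patterns):
            continue  # an ignored directory is never opened at all
        if is_dir:
            reported += walk(child, patterns, path + "/")
        else:
            reported.append(path)
    return reported


tree = {"path": {"to": {"mydir": {
    "keep.txt": None, "other.txt": None, "sub": {"x.txt": None}}}}}

print(walk(tree, ["path/to/mydir/", "!path/to/mydir/keep.txt"]))     # []
print(walk(tree, ["path/to/mydir/*", "!path/to/mydir/keep.txt"]))    # ['path/to/mydir/keep.txt']
print(walk(tree, ["path/to/mydir/*", "!path/to/mydir/sub/x.txt"]))   # []
print(walk(tree, ["path/to/mydir/*", "!path/to/mydir/sub/"]))        # ['path/to/mydir/sub/x.txt']
```

In the first call the whole of path/to/mydir/ is ignored. The walk never enters the directory, so the un-ignore line is never consulted. The third call shows that un-ignoring a file inside an ignored sub-directory has no effect. The last call shows the alternative: un-ignore the sub-directory itself.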

When would I want to use one over the other?

Use path/to/mydir/* if you want (for whatever reason) Git to open the directory path/to/mydir and read its contents during status checks and automatic git adding. If you don't un-ignore any particular paths underneath path/to/mydir, this won't make any difference to the result; path/to/mydir/ simply lets Git skip reading that directory.

Long description and background

Let's start with the first peculiarity, even though it's not directly part of this question: .gitignore doesn't really mean ignore. It means something more like: Don't automatically add, and especially don't complain about, some particular set of untracked files. However, it also means, in some cases: Don't even look inside this directory.

Git does not store directories in commits. A commit stores only files. When Git goes to extract a commit into a work-tree, it will simply create any directories needed to hold those files.
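Here is a small sketch of that idea. It treats a commit as nothing more than a list of file paths, which is an illustrative simplification, and dirs_to_create is a hypothetical helper. The directories to create at checkout are derived entirely from the file paths, so an empty directory has no way to appear.

```python
from pathlib import PurePosixPath


def dirs_to_create(file_paths: list[str]) -> list[str]:
    """Directories needed to hold the given files, shallowest first."""
    needed = set()
    for p in file_paths:
        # Every ancestor of a file must exist; "." is the work-tree root itself.
        for parent in PurePosixPath(p).parents:
            if parent != PurePosixPath("."):
                needed.add(str(parent))
    return sorted(needed, key=lambda d: (d.count("/"), d))


commit = ["README", "src/main.c", "src/lib/util.c", "docs/guide.txt"]
print(dirs_to_create(commit))  # ['docs', 'src', 'src/lib']
```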

The files contained within any existing commit are fixed: unchangeable and permanent (well, as permanent as the commit anyway). Whatever is in the commit, and however those files got inside it, they are there forever (until the commit itself is garbage-collected, if ever). So they're not really interesting in terms of a .gitignore file. What's in .gitignore has no effect upon them: they're already committed.

The files that go into a new commit are determined by the contents of the index at the time you run git commit.¹ At this point, the contents of a .gitignore file are again irrelevant: if something (some particular path-name and blob-ID pairing) is in the index, it goes into the commit. If something isn't in the index, it doesn't go into the commit.

This, then, gets us to the definition of an untracked file: An untracked file is a file that is not in the index. As soon as you git add a file to the index, it's tracked, and as soon as you git rm --cached a file from the index, it's untracked. Note that this means that the set of tracked files changes! Untracked is not a permanent thing: it depends on what's in the index right now.

Because an untracked file is not in the index, it will not be in the commit either. That would make it, in effect, ignored. But Git complains about it: Hey, this file is untracked! Don't you want to add it now? Whine! Whine! Enter .gitignore: this gives Git a list of files to avoid complaining about.
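In set terms, "untracked" is just a difference between the work-tree and the index, and .gitignore only filters the complaint list. In this sketch the index is a hypothetical dict from path to blob ID, and the ignored set is given directly rather than computed from patterns. The names untracked and complaints are illustrative.

```python
# Simplified model: index = {path: blob_id}; the ignored set is given directly.


def untracked(worktree: set[str], index: dict[str, str]) -> list[str]:
    """Files present in the work-tree but not in the index."""
    return sorted(worktree - set(index))


def complaints(worktree: set[str], index: dict[str, str], ignored: set[str]) -> list[str]:
    """Untracked files that .gitignore does not silence."""
    return [p for p in untracked(worktree, index) if p not in ignored]


worktree = {"main.c", "main.o", "notes.txt"}
index = {"main.c": "blob-1"}      # hypothetical blob IDs
ignored = {"main.o"}

print(complaints(worktree, index, ignored))  # ['notes.txt']

index["notes.txt"] = "blob-2"     # like git add notes.txt: now tracked
print(untracked(worktree, index))            # ['main.o']

del index["main.c"]               # like git rm --cached main.c: untracked again
print(complaints(worktree, index, ignored))  # ['main.c']
```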

But that's for files. What about directories?

Well, Git doesn't actually store directories, and it doesn't matter if you have an empty directory. Git won't whine and complain about it, and Git won't store it. Git will only store files. But all files live within directories.² If we could just list the directory, that would make our lives easier. So we can.

If that were all there was to it, things would be less strange. But that's not quite the end of the story. Git tries to be fast, and to be fast, it's important to avoid actually looking at the files that are being stored. File system operations in general are pretty slow. So, in the index that Git uses to track files, Git keeps a bunch of secondary information about those files, that it uses to avoid looking at them if it can. As a result, Git actually only opens and reads directories if it "has to".
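Here is a rough sketch of the "avoid looking" trick. It remembers a file's size and modification time when the file is added, and only re-reads the contents when those change. StatInfo, stat_info and needs_rehash are hypothetical names. Git's real index entries record more stat fields than this and handle extra corner cases, so treat this only as an outline of the idea.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class StatInfo:
    """A hypothetical, cut-down stand-in for the stat data Git caches."""
    size: int
    mtime_ns: int


def stat_info(path: str) -> StatInfo:
    st = os.stat(path)
    return StatInfo(st.st_size, st.st_mtime_ns)


def needs_rehash(path: str, cached: StatInfo) -> bool:
    """Only a stat() call: the file's contents are read only if this is True."""
    return stat_info(path) != cached


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "hello.txt")
    with open(path, "w") as f:
        f.write("hello\n")
    cached = stat_info(path)            # recorded when the file is added
    print(needs_rehash(path, cached))   # False: nothing changed
    with open(path, "a") as f:
        f.write("world\n")
    print(needs_rehash(path, cached))   # True: size changed, so re-read it
```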

Git will open and read the top-level directory of the work-tree to find out which files and directories it contains. For each file, if the file is in the index, all is good: the file is tracked. If the file isn't in the index, it's untracked, and Git should maybe complain. Now Git checks the .gitignore file for ignore and un-ignore directives. If the file is not ignored, or is ignored but subsequently un-ignored, Git will complain:

?? dir/file

(for --short output from git status).

For a directory, though, well, the directory automatically isn't in the index.³ But it could be ignored, and if it is ignored, why then, perhaps we don't have to read it at all. That would be really fast, so that's what Git does.

If the directory is ignored, Git doesn't bother opening and reading it to find more files. This means it cannot and will not find any files within that directory, and therefore will never check to see if any of those files are ignored and not later un-ignored.

If the directory itself isn't ignored, Git does open and read it. It then checks each file and sub-directory, one by one, that lives inside that directory, the same way it did for the top level (using the same code, recursively). So if you have dir/* ignored, but dir/important subsequently un-ignored, and dir/important exists, Git will discover it, and whine about it being untracked if it's a file, or read its files if it's a directory.
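The same kind of simplified model can also record which directories the walk opens. As before, the helper names are hypothetical, there is a single top-level .gitignore, and only *, ?, ! and a trailing / are supported. In this sketch both patterns report the same untracked files. However, build/* makes the walk open build/, while build/ lets it skip that directory entirely.

```python
import re
from typing import Optional

# Simplified model of Git's untracked-file walk, NOT Git's actual code.


def glob_to_regex(glob: str) -> re.Pattern:
    """'*' and '?' never match a slash."""
    out = []
    for ch in glob:
        out.append("[^/]*" if ch == "*" else "[^/]" if ch == "?" else re.escape(ch))
    return re.compile("".join(out))


def is_ignored(path: str, is_dir: bool, patterns: list[str]) -> bool:
    """Last matching pattern wins; '!' un-ignores; trailing '/' = directories only."""
    ignored = False
    for raw in patterns:
        negate = raw.startswith("!")
        pat = raw[1:] if negate else raw
        dir_only = pat.endswith("/")
        pat = pat.rstrip("/")
        if dir_only and not is_dir:
            continue
        target = path if "/" in pat else path.rsplit("/", 1)[-1]
        if glob_to_regex(pat.lstrip("/")).fullmatch(target):
            ignored = not negate
    return ignored


def walk(tree: dict, patterns: list[str], prefix: str = "",
         opened: Optional[list[str]] = None) -> tuple[list[str], list[str]]:
    """Return (reported files, directories opened), never opening ignored ones."""
    if opened is None:
        opened = []
    opened.append(prefix.rstrip("/") or ".")   # this directory is being read
    reported = []
    for name, child in sorted(tree.items()):
        path = prefix + name
        is_dir = isinstance(child, dict)
        if is_ignored(path, is_dir, patterns):
            continue
        if is_dir:
            reported += walk(child, patterns, path + "/", opened)[0]
        else:
            reported.append(path)
    return reported, opened


tree = {"build": {"a.o": None, "deep": {"b.o": None}},
        "notes.txt": None, "src": {"main.c": None}}

print(walk(tree, ["build/"]))   # (['notes.txt', 'src/main.c'], ['.', 'src'])
print(walk(tree, ["build/*"]))  # (['notes.txt', 'src/main.c'], ['.', 'build', 'src'])
```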

There's still a little bit more

That covers untracked file detection, but what about git add itself? We can run:

git add .

or:

git add '*'    # quoted to keep the shell from expanding it

or:

git add --all

and have git add do the directory-scanning and git adding. Here, the code is pretty similar, except that instead of merely complaining about untracked files, Git will automatically git add (copy into the index) the updated content of any already-tracked files,⁴ and add, for the first time, any untracked-but-not-ignored files.
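The add-everything behaviour can be modelled as three rules applied to a hypothetical index. Here the index is a dict from path to content, standing in for blob IDs. The rules are: refresh tracked files, drop tracked files that have vanished from the work-tree, and add untracked files that are not ignored. add_all is an illustrative name, not a Git API.

```python
# Simplified model of "git add --all"; index and worktree map path -> content.


def add_all(index: dict[str, str], worktree: dict[str, str],
            ignored: set[str]) -> dict[str, str]:
    """Return the new index after a simplified 'add everything'."""
    new_index = {}
    for path in index:
        if path in worktree:
            # Tracked file: copy its current content in, ignored or not.
            new_index[path] = worktree[path]
        # Otherwise the work-tree file is gone, so it is dropped
        # from the index, as with git rm --cached.
    for path, content in worktree.items():
        if path not in index and path not in ignored:
            new_index[path] = content  # untracked and not ignored: add it
    return new_index


index = {"a.txt": "old", "gone.txt": "x", "build/keep.o": "v1"}
worktree = {"a.txt": "new", "b.txt": "hello",
            "build/keep.o": "v2", "build/tmp.o": "junk"}
ignored = {"build/keep.o", "build/tmp.o"}  # e.g. matched by a build/ pattern

print(add_all(index, worktree, ignored))
# {'a.txt': 'new', 'build/keep.o': 'v2', 'b.txt': 'hello'}
```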

Again, if a directory itself is ignored, Git won't look inside it (though if the index already holds files whose paths lie within an ignored directory, Git will check whether they should be removed, as in footnote 4). Note also that if you git add xyzzy, where xyzzy is a file that is currently both untracked and ignored, git add will by default complain and reject the add attempt. You can use --force to override this: the file will go into the index, and thus become tracked, and from then on .gitignore has no effect on it.
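The explicit-path case, including the --force escape hatch, looks like this in the same hypothetical dict-based model. The names add_path, force and IgnoredPathError are made up. They mimic the behaviour described above but are not Git's interface.

```python
# Simplified model of "git add <path>" for a single file.


class IgnoredPathError(Exception):
    """Stands in for Git refusing to add an untracked, ignored path."""


def add_path(index: dict[str, str], worktree: dict[str, str], path: str,
             ignored: set[str], force: bool = False) -> None:
    # Ignore rules only matter while the file is untracked.
    if path not in index and path in ignored and not force:
        raise IgnoredPathError(f"{path} is ignored; pass force=True to add it")
    index[path] = worktree[path]


index = {}
worktree = {"xyzzy": "v1"}
ignored = {"xyzzy"}

try:
    add_path(index, worktree, "xyzzy", ignored)
except IgnoredPathError as e:
    print("rejected:", e)

add_path(index, worktree, "xyzzy", ignored, force=True)  # like git add --force xyzzy
worktree["xyzzy"] = "v2"
add_path(index, worktree, "xyzzy", ignored)  # now tracked: the ignore rule no longer applies
print(index)  # {'xyzzy': 'v2'}
```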

Side note: pathname matching

Git does glob-style pathname matching. The * character matches any part of a file or directory name but does not "cross a slash". Ending a glob pattern with a slash restricts the match to directories only. Hence:

$ cat .gitignore
*
!*/

tells Git to ignore everything (all files and directories) but then to un-ignore directories. Because Git is opening and reading each directory at this point, it must pass each file and directory name it finds to the ignore-pattern matcher: each file will be ignored, and each directory will be ignored-but-un-ignored, so Git will then look inside each sub-directory.
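Both matching rules, and the * plus !*/ combination, can be checked against the same simplified matcher. This is again a sketch with hypothetical helper names. It supports only *, ?, ! and a trailing /, and assumes a single top-level .gitignore.

```python
import re
from typing import Optional

# Simplified model of .gitignore matching, NOT Git's actual code.


def glob_to_regex(glob: str) -> re.Pattern:
    """'*' and '?' never cross a slash."""
    out = []
    for ch in glob:
        out.append("[^/]*" if ch == "*" else "[^/]" if ch == "?" else re.escape(ch))
    return re.compile("".join(out))


def is_ignored(path: str, is_dir: bool, patterns: list[str]) -> bool:
    """Last matching pattern wins; '!' un-ignores; trailing '/' = directories only."""
    ignored = False
    for raw in patterns:
        negate = raw.startswith("!")
        pat = raw[1:] if negate else raw
        dir_only = pat.endswith("/")
        pat = pat.rstrip("/")
        if dir_only and not is_dir:
            continue  # a directory-only pattern never matches a file
        target = path if "/" in pat else path.rsplit("/", 1)[-1]
        if glob_to_regex(pat.lstrip("/")).fullmatch(target):
            ignored = not negate
    return ignored


def walk(tree: dict, patterns: list[str], prefix: str = "",
         opened: Optional[list[str]] = None) -> tuple[list[str], list[str]]:
    """Return (reported files, directories opened)."""
    if opened is None:
        opened = []
    opened.append(prefix.rstrip("/") or ".")
    reported = []
    for name, child in sorted(tree.items()):
        path = prefix + name
        is_dir = isinstance(child, dict)
        if is_ignored(path, is_dir, patterns):
            continue
        if is_dir:
            reported += walk(child, patterns, path + "/", opened)[0]
        else:
            reported.append(path)
    return reported, opened


print(bool(glob_to_regex("src/*.c").fullmatch("src/main.c")))      # True
print(bool(glob_to_regex("src/*.c").fullmatch("src/lib/util.c")))  # False
print(is_ignored("build", False, ["build/"]), is_ignored("build", True, ["build/"]))  # False True

tree = {"a.txt": None, "sub": {"b.txt": None, "deeper": {"c.txt": None}}}
print(walk(tree, ["*", "!*/"]))  # ([], ['.', 'sub', 'sub/deeper'])
```

With only those two lines nothing is reported, yet every directory is opened. Because the walk now reaches every file, a later un-ignore line (a hypothetical !*.txt, say) can take effect at any depth.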


¹ Since git commit takes flag and/or path-name arguments that can add files to the index, this is just a tiny bit over-simplified. We can fix this by saying "at the time git commit makes the commit" instead, leaving room for git commit to modify the index a bit first.

² Technically, directories store name and file-identity pairings, rather than storing the actual files. The details vary from one file system to another.

³ Modern Git actually can, optionally, cache data about directories in the index, but these special entries are not supposed to affect its observable behavior.

⁴ Note that the existing content, or previously-added content, of these already-tracked files is already in the index. This just replaces it with the new content. If the work-tree file has been removed, Git notices this case and also removes the index version of the file, à la git rm --cached. To do this, Git has to read the index as well as read through the directories in question, but Git already has to read the index anyway, to see whether these files are untracked.
