Proper way to set up multiple .gitignore files in nested folders of a repository

Question

I have the following folder structure:

Project/
    .git/
    .gitignore #1
    a/
        a1/
             a2.txt
             a3.txt
        .gitignore #2
    b/
        b1.txt
    c.txt

I would like Git not to ignore a2.txt, and not to ignore anything in b/. Everything else should be ignored.

Based on suggestions/comments/answers provided here, the content of .gitignore #1 is:

a/
c.txt

This essentially ignores c.txt and everything in folder a/, and as I understand it the latter cannot be overridden by a more deeply nested .gitignore.

The content of .gitignore #2 is:

!a1/a2.txt

I was hoping this deeper nested gitignore file would lead to not ignoring file a2.txt.

However, running git status --ignored results in:

On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        .gitignore
        b/

Ignored files:
  (use "git add -f <file>..." to include in what will be committed)
        a/
        c.txt

nothing added to commit but untracked files present (use "git add" to track)

That is, the entirety of a/ seems to be ignored despite the exception I was hoping would be provided by .gitignore #2.

How can nested .gitignore files be used correctly to achieve the requirement above?

(Note: I have labeled the .gitignore files #1 and #2 in the description above only to tell the two apart. On my actual computer, both files are properly named just .gitignore.)

torek · Accepted Answer

The general rule here is this:

Git will use the OS's facilities to read directories.
To scan a directory, Git calls opendir and the associated readdir (and eventually closedir) functions.
readdir then returns directory entries, one at a time. Each entry holds a name component as defined below. Entries may also hold additional information (in particular the directory vs file distinction), but that's as much as Git can really count on here. If the OS fills in a d_type field with DT_DIR, DT_REG, etc., Git will try to use that; otherwise Git may have to fall back to calling lstat (which is expensive).
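To make the directory-reading step concrete, here is a minimal Python sketch. It is not Git's code: os.scandir is simply Python's wrapper around the same opendir/readdir/closedir facilities, and read_directory is a hypothetical helper name used only for illustration.

```python
import os
import tempfile

def read_directory(path):
    """Return sorted (name, is_dir) pairs for ONE directory level, without recursing."""
    entries = []
    with os.scandir(path) as it:      # roughly opendir ... closedir
        for entry in it:              # one readdir result per iteration
            # follow_symlinks=False: a symlink counts as a non-directory here
            entries.append((entry.name, entry.is_dir(follow_symlinks=False)))
    return sorted(entries)

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "path", "to"))
    open(os.path.join(top, "c.txt"), "w").close()
    listing = read_directory(top)
    print(listing)   # [('c.txt', False), ('path', True)]
```

Note that scanning the top level yields only path, never to: deeper names show up only if the scan later descends into path.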

Having read the entire directory, Git now has a set of name components. A name component is basically the part of a path-name that goes between slashes: for instance, with path/to/file.ext we have three components, path, to, and file.ext. Note that the same is true for /path/to/file.ext: the leading slash just means "from the top" rather than "from wherever we are in the tree". Git makes some (rather peculiar) use of this same idea—that paths starting with a slash are "root relative" and the rest are "current position relative"—when using "anchored" entries in .gitignore files (see below). So if path/to/file exists in the top level of a working tree, Git will see only the path part when it scans the top level directory.
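As a quick illustration of name components (using Python's pathlib, which is only a convenient stand-in for what Git does internally):

```python
from pathlib import PurePosixPath

relative = PurePosixPath("path/to/file.ext")
absolute = PurePosixPath("/path/to/file.ext")

print(relative.parts)    # ('path', 'to', 'file.ext')
print(absolute.parts)    # ('/', 'path', 'to', 'file.ext')

# Dropping the leading slash leaves the same three components.
stripped = absolute.relative_to("/")
print(stripped.parts)    # ('path', 'to', 'file.ext')
```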

(Side note: POSIX also includes scandir, but people find this interface hard to use correctly. It's also "more efficient" in various senses on some systems, although not always or very predictably, to use the lower level readdir routines, and Git uses readdir.)

Now that Git has the name components, Git can check them against this particular level's .gitignore, if it exists. It can also combine each component with any leading path name that got Git here in the first place. For the initial scan there is no such leading component, and no combining happens, but let's observe below what happens if we are allowed to proceed into path/ (which is a directory).

The components may now need a type check: file vs directory. Real-world file systems may have additional types, including symbolic links, but for our purposes here a symbolic link is to be treated like a file for the moment. We just want to know whether the component represents a directory.

Now, entries in any .gitignore file that we have read so far—including the one in this directory that we're reading now—are flagged in three independent ways:

Some are anchored, as in /path/to or a/b for instance, and some are not, as in *.o for instance. An anchored entry is one containing any slashes after removing a single trailing slash if it exists.
Some are for directories only and some are for all names. An entry is flagged as directory-only if it ends in a trailing slash. (Since the trailing slash is meant as the "directory only" flag, it has to be ignored while deciding whether to set the "anchored" flag.)
Some are positive ("do ignore") entries, and some are negative ("do not ignore") entries. A negative entry is one that starts with ! as the first character. (An anchored negative entry for /path would have to read !/path; /!path does not work here.)
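The three flags can be sketched as a small parser. This is a simplified illustration, not Git's implementation: Rule and parse_rule are hypothetical names, and the sketch skips comments, blank lines, backslash escapes, and ** handling.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    pattern: str     # what remains once the flag characters are stripped
    negative: bool   # started with "!"  ("do not ignore")
    dir_only: bool   # ended with "/"    (matches directories only)
    anchored: bool   # contains a slash other than one trailing slash

def parse_rule(line: str) -> Rule:
    negative = line.startswith("!")
    if negative:
        line = line[1:]
    dir_only = line.endswith("/")
    if dir_only:
        line = line[:-1]          # the trailing slash does not count for anchoring
    anchored = "/" in line
    return Rule(line.lstrip("/"), negative, dir_only, anchored)

for text in ["a/", "*.o", "/path/to", "!a1/a2.txt", "!/path", "/!path"]:
    print(repr(text), parse_rule(text))
```

With this sketch, a/ comes out as directory-only but not anchored, !a1/a2.txt as negative and anchored, and /!path as a positive anchored rule for a file literally named !path, which is why that spelling does not work as a negation.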

So let's imagine that we're reading the top level, or that we're reading directory path within the top level. Let's suppose we encounter two name components at this level: path, and to. We now check all of these things more or less at the same time (in order, so that "last entry" overrides):

Check the directory entry itself against all non-anchored ignore expressions. Is path a match for any of those? If so, this name is ignored/unignored as per the positive/negative flag.
Check the full path so far against all anchored ignore expressions. For path this is /path, /path/path, or /to/path; for to, this is one of /to, or /path/to, or /to/to. (Remember that we found both /path and /to and presumably we're looking inside both.) If this path-so-far is a match against one of the anchored expressions, this name is ignored/unignored as per the positive/negative flag.

Note that when we do check an anchored path, we're looking at the full path in the working tree, while the .gitignore itself might be from a sub-path within the working tree. So if we're reading directory /path for instance and we have /path/.gitignore and it has an anchored entry reading /xyzzy, we're really checking this /xyzzy against /path/xyzzy (because it's from /path/.gitignore, not from /.gitignore). This is a little complicated, but makes sense once you think about it: the anchor is relative to the .gitignore's location. This lets you rename directories without having to edit all anchored paths in any sub-.gitignore files.
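A tiny sketch of that "relative to the .gitignore's location" rule, with anchored_target as a hypothetical helper (Git does not expose anything by this name):

```python
from pathlib import PurePosixPath

def anchored_target(gitignore_dir: str, pattern: str) -> str:
    """Working-tree path that an anchored pattern refers to.

    gitignore_dir is the directory holding the .gitignore, relative to the
    top of the working tree ("" for the top level itself).
    """
    return str(PurePosixPath("/") / gitignore_dir / pattern.lstrip("/"))

print(anchored_target("", "/xyzzy"))         # /xyzzy
print(anchored_target("path", "/xyzzy"))     # /path/xyzzy
print(anchored_target("renamed", "/xyzzy"))  # /renamed/xyzzy
```

The last line shows the rename case: the pattern text is unchanged, and only the directory holding the .gitignore differs.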

Note further that the "is a match" test may require that the directory entry itself name a directory. This is the case if the ignore entry is flagged as directories-only. So to check for that, we need to know if the entry—path or to for instance—names a directory in the OS's file system.

At this point, we have done all the checks we must do on this entry. It either matched some .gitignore entry or entries, in which case the last matching entry is the one taken, or it did not. Subdirectory .gitignore files are matched later in the chain, so the deepest .gitignore that could match this entry will always have the last match, if it has a match.

If this entry did not match any .gitignore rule, this particular name is not ignored. If it did match a .gitignore rule, the last one's positive/negative flag determines whether this particular name is ignored or not.
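The "last match wins" rule can be sketched like this. It handles only non-anchored name patterns, uses Python's fnmatchcase as a rough approximation of Git's own wildcard matching, and is_ignored is a hypothetical name:

```python
from fnmatch import fnmatchcase

def is_ignored(name, rules):
    """rules: (pattern, negative) pairs in order, shallowest .gitignore first."""
    verdict = False                    # no match at all means "not ignored"
    for pattern, negative in rules:
        if fnmatchcase(name, pattern):
            verdict = not negative     # a later match overrides an earlier one
    return verdict

print(is_ignored("debug.log", [("*.log", False), ("debug.log", True)]))  # False
print(is_ignored("debug.log", [("debug.log", True), ("*.log", False)]))  # True
print(is_ignored("main.c", [("*.log", False)]))                          # False
```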

Now that we know if the name is ignored, we have two options, each of which has two sub-options:

It is ignored:
- If it's a directory, we simply don't scan it at all.
- If it's a file, we don't auto-add the file (with git add . for instance), or for git status, we don't complain about the untracked file (assuming it is in fact untracked).
It is not ignored:
- If it's a directory, we scan it recursively and apply all these rules.
- If it's a file, we git add it (for git add . for instance) or make sure to complain if it's untracked (git status).
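Putting the pieces together, here is a toy simulation of the scan, run against the layout from the question. It is a sketch under simplifying assumptions (an in-memory tree instead of a real file system, fnmatchcase instead of Git's matcher, and hypothetical helper names parse, ignored, and scan), but it shows why the nested .gitignore never gets a say:

```python
from fnmatch import fnmatchcase

# The question's layout: dicts are directories, strings are file contents.
TREE = {
    ".gitignore": "a/\nc.txt\n",
    "a": {
        ".gitignore": "!a1/a2.txt\n",
        "a1": {"a2.txt": "", "a3.txt": ""},
    },
    "b": {"b1.txt": ""},
    "c.txt": "",
}

def parse(text, base):
    """Turn .gitignore text into (pattern, negative, dir_only, anchored, base)."""
    rules = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        negative = line.startswith("!")
        line = line[1:] if negative else line
        dir_only = line.endswith("/")
        line = line[:-1] if dir_only else line
        rules.append((line.lstrip("/"), negative, dir_only, "/" in line, base))
    return rules

def ignored(name, path, is_dir, rules):
    verdict = False
    for pattern, negative, dir_only, anchored, base in rules:
        if dir_only and not is_dir:
            continue
        # Anchored rules see the path relative to their own .gitignore.
        subject = path[len(base):] if anchored else name
        if fnmatchcase(subject, pattern):
            verdict = not negative        # last match wins
    return verdict

def scan(tree, prefix="", rules=(), log=None):
    log = [] if log is None else log
    rules = list(rules)
    if ".gitignore" in tree:
        log.append("read " + prefix + ".gitignore")
        rules += parse(tree[".gitignore"], prefix)
    for name, node in sorted(tree.items()):
        path, is_dir = prefix + name, isinstance(node, dict)
        if ignored(name, path, is_dir, rules):
            # An ignored directory is never descended into.
            log.append("ignored " + path + ("/" if is_dir else ""))
        elif is_dir:
            scan(node, path + "/", rules, log)
        else:
            log.append("untracked " + path)
    return log

log = scan(TREE)
print("\n".join(log))
```

The log contains "read .gitignore" and "ignored a/" but never "read a/.gitignore": because a/ is ignored at the top level, the scan never enters it, so the !a1/a2.txt entry is never even loaded. (Real git status collapses the untracked b/ directory into a single line; the sketch lists b/b1.txt individually.)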

This determines whether git status complains about it being untracked (for git status commands) or whether git add of some sort of recursive flavor (git add --all, git add ., git add somedir, etc.) adds it.

Note that you can override ignore entries with git add --force, e.g., git add --force ignored-file adds it even if ignored-file would be ignored by the normal .gitignore rules. I have never tried git add --force . to see what happens here, but it's probably not good. 😀 It might completely ignore all .gitignore rules, which seems bad, or it might completely obey them, which also seems bad. I will leave it to the reader to try it, see what it does, and decide how bad that is.

Note also that once some pathname is in Git's index (and Git's index holds full path names, e.g., path/to/file, as a literal string with literal slashes in it), that file is not ignored even if it's listed in a .gitignore file. The ignoring rules apply only to the recursive directory traversal process; files listed in Git's index are tracked, and en-masse git add . operations check them regardless. Once you get past the OS-interaction stuff and into Git proper, files no longer have "containing directories", they just have long path strings with embedded forward slashes if needed.

Git's index is unable to store a bare directory name,¹ and that's why you cannot commit an empty directory. The scanning process will scan directories for files and will (under appropriate conditions) add those files to the index, but it won't add the containing directory. The closest Git gets to this is that a submodule entry is stored as a so-called gitlink, a "file" with mode 160000, which if it were a Linux file-system entity would be a combination of directory-and-symbolic-link (which is not allowed in the file system). This is why the attempts to store an empty directory go awry (but you can store a submodule that has no files!).
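The "directory-and-symbolic-link" remark is just bit arithmetic on the file-type portion of the mode, which can be checked with the constants in Python's stat module:

```python
import stat

GITLINK_MODE = 0o160000                  # mode Git records for a submodule entry

print(oct(stat.S_IFDIR))                 # 0o40000  (directory)
print(oct(stat.S_IFLNK))                 # 0o120000 (symbolic link)

combined = stat.S_IFDIR | stat.S_IFLNK   # both type bit patterns OR-ed together
print(oct(combined))                     # 0o160000
```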

¹Technically, it can, it just can't be stored as the kind of entry that Git uses to keep track of files for the next commit. Git's index has grown a whole bunch of weird add-ons for efficiency, and that includes keeping track of untracked stuff (the so-called untracked cache), which includes untracked directories. So it can't track a directory but it can untrack one! 😀
