Reputation: 4090
I have the following folder structure:
Project/
    .git/
    .gitignore #1
    a/
        a1/
            a2.txt
            a3.txt
        .gitignore #2
    b/
        b1.txt
    c.txt
I would like to have git not ignore a2.txt, and not ignore the entirety of b/. Everything else should be ignored.
Based on suggestions/comments/answers provided here, the content of .gitignore #1 is:
a/
c.txt
This essentially ignores c.txt and everything in folder a/, the latter only as long as it is not overridden by a deeper nested .gitignore.
The content of .gitignore #2 is:
!a1/a2.txt
I was hoping this deeper nested .gitignore file would lead to not ignoring file a2.txt.
However, running git status --ignored results in:
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        .gitignore
        b/

Ignored files:
  (use "git add -f <file>..." to include in what will be committed)
        a/
        c.txt

nothing added to commit but untracked files present (use "git add" to track)
That is, the entirety of a/ seems to be ignored despite the exception I was hoping would be provided by .gitignore #2.
How can nested .gitignore files be used correctly to achieve the requirement above?
(Note: I have only named the .gitignore files in the description above as #1 and #2 to differentiate between the two. On my actual computer, these files are properly named just .gitignore.)
Upvotes: 1
Views: 3591
Reputation: 489718
The general rule here is this:
Git reads a directory (initially the top level of the working tree). The OS-level interface for this is opendir and the associated readdir (and eventually closedir) functions.
readdir then returns directory entries, one at a time. Each entry holds a name component as defined below. Entries may also hold additional information, in particular the directory vs file distinction, but that's as much as Git can really count on here. If the OS fills in a d_type field with DT_DIR, DT_REG, etc., Git will try to use that; otherwise Git may have to fall back to calling lstat (which is expensive).
Having read the entire directory, Git now has a set of name components. A name component is basically the part of a path name that goes between slashes: for instance, with path/to/file.ext we have three components, path, to, and file.ext. Note that the same is true for /path/to/file.ext: the leading slash just means "from the top" rather than "from wherever we are in the tree". Git makes some (rather peculiar) use of this same idea, that paths starting with a slash are "root relative" and the rest are "current position relative", when using "anchored" entries in .gitignore files (see below). So if path/to/file exists in the top level of a working tree, Git will see only the path part when it scans the top-level directory.
(Side note: POSIX also includes scandir, but people find this interface hard to use correctly. It's also "more efficient" in various senses on some systems, although not always or very predictably, to use the lower-level readdir routines, and Git uses readdir.)
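As a rough illustration of this single-level read, here is a minimal Python sketch. Git itself is written in C and calls readdir directly; Python's os.scandir is only an analogue that iterates over one directory and reuses the OS-supplied type information when it is available. Despite the name, it is not the POSIX scandir from the side note. The helper name read_components is hypothetical.

```python
import os

def read_components(directory):
    """Read one directory level and return {name component: is_directory}.

    Does not recurse. follow_symlinks=False reports a symbolic link as a
    non-directory, matching the "treat it like a file" simplification.
    """
    with os.scandir(directory) as entries:
        return {entry.name: entry.is_dir(follow_symlinks=False) for entry in entries}
```

For a directory holding path/to/file and c.txt, this returns only the names path and c.txt; the to and file components stay invisible until path itself is opened.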
Now that Git has the name components, Git can check them against this particular level's .gitignore, if it exists. It can also combine each component with any leading path name that got Git here in the first place. For the initial scan there is no such leading component, and no combining happens, but let's observe below what happens if we are allowed to proceed into path/ (which is a directory).
The components may now need a type check: file vs directory. Real-world file systems may have additional types, including symbolic link, but for our purposes here a symbolic link is to be treated like a file for the moment. We just want to know whether the component represents a directory.
Now, entries in any .gitignore file that we have read so far, including the one in this directory that we're reading now, are flagged in three independent ways:
Some are anchored, as in /path/to or a/b for instance, and some are not, as in *.o for instance. An anchored entry is one containing any slashes after removing a single trailing slash if it exists.
Some are for directories only and some are for all names. An entry is flagged as directory-only if it ends in a trailing slash. (Since the trailing slash is meant as the "directory only" flag, it has to be ignored while deciding whether to set the "anchored" flag.)
Some are positive ("do ignore") entries, and some are negative ("do not ignore") entries. A negative entry is one that starts with ! as the first character. (An anchored negative entry for /path would have to read !/path; /!path does not work here.)
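The three flags can be sketched in Python as follows. This is a simplified illustration of the rules as stated above, not Git's actual parser: it ignores comments, blank lines, backslash escapes, and trailing whitespace, and the names IgnoreEntry and classify are made up for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IgnoreEntry:
    pattern: str    # text with the leading "!" and one trailing "/" removed
    anchored: bool  # a slash remains after dropping one trailing slash
    dir_only: bool  # the entry ended in a trailing slash
    negative: bool  # the entry started with "!"

def classify(line: str) -> IgnoreEntry:
    negative = line.startswith("!")
    if negative:
        line = line[1:]
    dir_only = line.endswith("/")
    if dir_only:
        line = line[:-1]  # the trailing slash must not count as an anchor
    anchored = "/" in line
    return IgnoreEntry(line, anchored, dir_only, negative)
```

Applied to the question's files, classify("a/") is a non-anchored, directory-only, positive entry, while classify("!a1/a2.txt") is an anchored, negative entry that applies to any kind of name.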
So let's imagine that we're reading the top level, or that we're reading directory path within the top level. Let's suppose we encounter two name components at this level: path and to. We now check all of these things more or less at the same time (in order, so that the last matching entry overrides earlier ones):
Check the directory entry itself against all non-anchored ignore expressions. Is path a match for any of those? If so, this name is ignored/unignored as per the positive/negative flag.
Check the full path so far against all anchored ignore expressions. For path this is /path, /path/path, or /to/path; for to, this is one of /to, /path/to, or /to/to. (Remember that we found both /path and /to and presumably we're looking inside both.) If this path-so-far is a match against one of the anchored expressions, this name is ignored/unignored as per the positive/negative flag.
Note that when we do check an anchored path, we're looking at the full path in the working tree, while the .gitignore itself might be from a sub-path within the working tree. So if we're reading directory /path for instance and we have /path/.gitignore and it has an anchored entry reading /xyzzy, we're really checking this /xyzzy against /path/xyzzy (because it's from /path/.gitignore, not from /.gitignore). This is a little complicated, but makes sense once you think about it: the anchor is relative to the .gitignore's location. This lets you rename directories without having to edit all anchored paths in any sub-.gitignore files.
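A small Python sketch of that relative anchoring, assuming paths are written relative to the top of the working tree with forward slashes. The function name resolve_anchored is hypothetical, and literal names stand in for patterns to keep it short.

```python
import posixpath

def resolve_anchored(gitignore_dir: str, entry: str) -> str:
    """Return the working-tree path an anchored entry refers to.

    gitignore_dir is the directory holding the .gitignore, relative to the
    top of the working tree ("" for the top level itself).
    """
    return posixpath.join(gitignore_dir, entry.lstrip("/"))
```

With this, resolve_anchored("path", "/xyzzy") gives path/xyzzy, while the same entry in the top-level file, resolve_anchored("", "/xyzzy"), gives xyzzy. Renaming path to something else changes only the first argument, never the entry text.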
Note further that the "is a match" test may require that the directory entry itself name a directory. This is the case if the ignore entry is flagged as directories-only. So to check for that, we need to know if the entry (path or to, for instance) names a directory in the OS's file system.
At this point, we have done all the checks we must do on this entry. It either matched some .gitignore entry or entries, in which case the last matching entry is the one taken, or it did not. Subdirectory .gitignore files are matched later in the chain, so that the deepest .gitignore that could match this entry will always have the last match, if it has a match.
If this entry did not match any .gitignore rule, this particular name is not ignored. If it did match a .gitignore rule, the last one's positive/negative flag determines whether this particular name is ignored or not.
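The "last match wins" rule can be sketched like this. It is only an approximation: fnmatchcase from Python's standard library stands in for Git's own pattern matcher, and the rule list is assumed to be already ordered from the top-level .gitignore down to the deepest one. The function name is_ignored is hypothetical.

```python
from fnmatch import fnmatchcase

def is_ignored(name, rules):
    """Decide one name against an ordered list of (pattern, negative) pairs."""
    ignored = False  # no match at all means "not ignored"
    for pattern, negative in rules:
        if fnmatchcase(name, pattern):
            ignored = not negative  # a later match overrides an earlier one
    return ignored
```

For example, with rules [("*.txt", False), ("keep.txt", True)], notes.txt is ignored and keep.txt is not. Swapping the two rules makes keep.txt ignored again, because *.txt then matches last.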
Now that we know if the name is ignored, we have two options, each of which has two sub-options:
It is ignored:
    If it names a file, we don't git add it (for git add . for instance), or for git status, we don't complain about the untracked file (assuming it is in fact untracked).
    If it names a directory, we don't open and read it at all, so nothing inside it, including any .gitignore it contains, is ever examined.
It is not ignored:
    If it names a file, we git add it (for git add . for instance) or make sure to complain if it's untracked (git status).
    If it names a directory, we open and read it, applying this same process recursively.
This determines whether git status complains about a file being untracked, or whether a recursive git add of some flavor (git add --all, git add ., git add somedir, etc.) adds it.
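To see how this plays out for the question, here is a toy Python model of the scan over an in-memory tree (nested dicts for directories, strings for file contents). Everything in it is an illustrative assumption rather than Git's implementation: the names parse and scan are invented, fnmatchcase stands in for Git's matcher (its * also crosses slashes, which Git's does not), comments and escapes are not handled, and the placement of a3.txt inside a1/ is a guess. The rearranged tree is one hypothetical way to meet the requirement, not something stated in the answer.

```python
from fnmatch import fnmatchcase

def parse(line):
    """Split one .gitignore line into (pattern, anchored, dir_only, negative)."""
    negative = line.startswith("!")
    if negative:
        line = line[1:]
    dir_only = line.endswith("/")
    if dir_only:
        line = line[:-1]
    anchored = "/" in line
    return line.lstrip("/"), anchored, dir_only, negative

def scan(tree, prefix="", inherited=()):
    """Walk a dict-based tree; return (untracked, ignored) lists of paths."""
    rules = list(inherited)
    # Rules from this level's .gitignore go last, so they get the last word.
    for line in tree.get(".gitignore", "").splitlines():
        if line.strip():
            rules.append((prefix, *parse(line)))
    untracked, ignored = [], []
    for name, node in sorted(tree.items()):
        is_dir = isinstance(node, dict)
        path = prefix + name
        verdict = False
        for base, pattern, anchored, dir_only, negative in rules:
            if dir_only and not is_dir:
                continue
            # Anchored: match the path relative to that .gitignore's directory.
            target = path[len(base):] if anchored else name
            if fnmatchcase(target, pattern):
                verdict = not negative
        if verdict:
            # An ignored directory is never opened, so its .gitignore is unread.
            ignored.append(path + "/" if is_dir else path)
        elif is_dir:
            sub_untracked, sub_ignored = scan(node, path + "/", rules)
            untracked += sub_untracked
            ignored += sub_ignored
        else:
            untracked.append(path)
    return untracked, ignored

# Layout and rules modeled on the question.
question = {
    ".gitignore": "a/\nc.txt\n",
    "a": {".gitignore": "!a1/a2.txt\n", "a1": {"a2.txt": "", "a3.txt": ""}},
    "b": {"b1.txt": ""},
    "c.txt": "",
}

# Hypothetical rearrangement: a/ itself is not ignored, only its contents are.
rearranged = {
    ".gitignore": "c.txt\n",
    "a": {".gitignore": "*\n!a1/\n!a1/a2.txt\n", "a1": {"a2.txt": "", "a3.txt": ""}},
    "b": {"b1.txt": ""},
    "c.txt": "",
}

print(scan(question))
print(scan(rearranged))
```

In the first tree the model reports a/ and c.txt as ignored and never reads a/.gitignore, which mirrors the git status output in the question (Git additionally collapses b/b1.txt into b/). In the second tree a/ is opened, so its .gitignore is read and a/a1/a2.txt shows up as untracked.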
Note that you can override ignore entries with git add --force, e.g., git add --force ignored-file adds it even if ignored-file would be ignored by the normal .gitignore rules. I have never tried git add --force . to see what happens here, but it's probably not good. 😀 It might completely ignore all .gitignore rules, which seems bad, or it might completely obey them, which also seems bad. I will leave it to the reader to try it, see what it does, and decide how bad that is.
Note also that once some pathname is in Git's index (and Git's index holds full path names, e.g., path/to/file, as a literal string with literal slashes in it), that file is not ignored even if it's listed in a .gitignore file. The ignoring rules are specific to the recursive directory traversal process, but files listed in Git's index are tracked and are checked by en-masse git add . operations. Once you get past the OS-interaction stuff and into Git proper, files no longer have "containing directories"; they just have long path strings with embedded forward slashes if needed.
Git's index is unable to store a bare directory name,[1] and that's why you cannot commit an empty directory. The scanning process will scan directories for files and will (under appropriate conditions) add those files to the index, but it won't add the containing directory. The closest Git gets to this is that a submodule entry is stored as a so-called gitlink, a "file" with mode 160000, which if it were a Linux file-system entity would be a combination of directory-and-symbolic-link (which is not allowed in the file system). This is why the attempts to store an empty directory go awry (but you can store a submodule that has no files!).
[1] Technically, it can, it just can't be stored as the kind of entry that Git uses to keep track of files for the next commit. Git's index has grown a whole bunch of weird add-ons for efficiency, and that includes keeping track of untracked stuff (the so-called untracked cache), which includes untracked directories. So it can't track a directory but it can untrack one! 😀
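The "directory-and-symbolic-link" remark about mode 160000 can be checked against the file-type constants in Python's standard stat module. This is only a small sanity check of the bit pattern; the constant name GITLINK_MODE is made up for this example.

```python
import stat

GITLINK_MODE = 0o160000  # octal mode recorded for a gitlink (submodule) entry

# The value is exactly the directory type bits OR-ed with the symlink type bits.
assert GITLINK_MODE == stat.S_IFDIR | stat.S_IFLNK

# Neither file-type test accepts the combined value, which fits the remark
# that no real file-system entity has this type.
print(stat.S_ISDIR(GITLINK_MODE), stat.S_ISLNK(GITLINK_MODE))
```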
Upvotes: 4