Reputation: 115017
I've got a large repo in which I'm downloading a large number of VSIX files (Visual Studio and Azure DevOps extensions), extracting them, and running a number of tools over them.
I'd like to keep some of the files in the extracted folder in my git repo, but ignore most of the rest (it's 30GB).
My folder structure looks like this:
\vsixs
    \publisher
        \extension
            \1.2.3
                \results-code.json
                \extension.vsixmanifest
                \extension.vsomanifest
                \taskname
                    \task.json
                    \node_modules
                    \index.js
                    \...
            \...
        \...
somescripts.ps1
.gitignore
vsixs/
results-code.json
task.json (can live at deeper levels)
extension.*manifest (can live at deeper levels)
I've tried a number of things in my .gitignore:
vsixs/
!vsixs/**
!result-code.json
!extension.*manifest
!task.json
I've also tried a number of other permutations and read a number of other answers, but I have yet to stumble on anything that works.
I suspect that, because of Git's performance optimizations, the pattern I specify has to be oddly specific, but I can't figure out how specific...
Upvotes: 1
Views: 229
Reputation: 489708
The magic you want is:
!*/
Remember that it's somewhat expensive; use it wisely. (Combine it with !/* to un-ignore everything in the root of your working tree.)
As always, the issue here is a mismatch between Git's storage format (as seen in commits), and your OS's storage format. Git handles only files, never folders, but the file names include forward slashes, e.g., somescripts.ps1 and vsixs/publisher/extension/1.2.3/results-code.json. Your OS, meanwhile, insists that there's no such file name as vsixs/publisher/extension/1.2.3/results-code.json. Instead, there's a folder named vsixs; inside that folder is a sub-folder named publisher; and so on. Eventually we get to a file named results-code.json (or some other name).
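To make the mismatch concrete, here is a minimal Python sketch. Python is used purely for illustration, and the helper name implied_folders is made up here; it is not part of Git. The sketch takes a Git-style name, which is one flat string with forward slashes, and lists the nested folders an OS would need before it could hold that file:

```python
from pathlib import PurePosixPath


def implied_folders(git_name):
    """Return the folders, outermost first, that must exist on disk
    before a file with this flat Git-style name can be created."""
    parents = list(PurePosixPath(git_name).parents)[::-1]
    # The outermost parent is ".", the working tree itself; skip it.
    return [str(p) for p in parents if str(p) != "."]


for name in ("somescripts.ps1",
             "vsixs/publisher/extension/1.2.3/results-code.json"):
    print(name, "->", implied_folders(name))
```

For somescripts.ps1 the list is empty, since that file sits at the top level of the working tree.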
Git must bridge this gap for you:
Git does so during git checkout or git switch quite easily, by creating folders on demand: if Git needs to check out a file named vsixs/publisher/extension/1.2.3/results-code.json, it goes ahead and creates vsixs, then vsixs\publisher, and so on, as needed, until your OS is willing to create a file named results-code.json in the appropriate folder.
Git does so during git commit by having Git's index contain a file named vsixs/publisher/extension/1.2.3/results-code.json. This file goes into Git's index (aka "staging area") at the time you check out the commit, which is the same time Git created all the folders (if they did not already exist).
Git does so during git add with ... well, how? This is where the problem occurs. It's git add that updates the copies of files that exist in Git's index, which is fine for files that already exist there because they were copied out of some earlier commit. But for new files, it's not so fine.
So: If a file named vsixs/publisher/extension/1.2.3/results-code.json is already in Git's index, git add -u or some other en-masse git add will generally find and update it, since Git already knows to look for it. But if it's not in Git's index, Git has to search for it, and this searching is very slow on many systems. This kind of searching literally requires that Git open every folder everywhere and read the folder's content:
vsixs has, as its content, sub-folders and/or files;
publisher inside vsixs has, as its content, more sub-folders and/or files.
For each folder, Git laboriously opens it (e.g., opens vsixs), reads all its entries, and gets both each entry's short name (publisher) and its constructed full name (vsixs/publisher; note the forward slash here). If the entry is a file, Git runs git add on it if that's appropriate; if it's a folder, Git opens and reads it, recursively, to find more files and folders, and so on.
To speed this process up, if you give Git permission to "ignore" a folder during this scanning process, Git does so. So if Git is allowed to "ignore" vsixs, it simply does not open and read it and therefore never discovers publisher, much less anything in publisher.
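The effect of that permission can be sketched in a few lines of Python. This is only a model of the scanning idea, not Git's actual code: scan, make_demo_tree, and the ignored_dirs parameter are hypothetical names, and the tree is a tiny stand-in for the real one. The sketch records which folders get opened, so you can see that an ignored folder is never read at all:

```python
import os
import tempfile


def scan(root, ignored_dirs=frozenset()):
    """Walk root recursively and return (files_found, folders_opened).

    Names are constructed with "/" regardless of the OS. A folder whose
    constructed name is in ignored_dirs is skipped without being opened.
    """
    files, opened = [], []

    def visit(abs_dir, rel_dir):
        opened.append(rel_dir or ".")
        with os.scandir(abs_dir) as it:
            entries = sorted(it, key=lambda e: e.name)
        for entry in entries:
            rel = f"{rel_dir}/{entry.name}" if rel_dir else entry.name
            if entry.is_dir(follow_symlinks=False):
                if rel in ignored_dirs:
                    continue  # pruned: nothing below here is ever discovered
                visit(entry.path, rel)
            else:
                files.append(rel)

    visit(root, "")
    return files, opened


def make_demo_tree(root):
    """Create a miniature version of the layout from the question."""
    deep = os.path.join(root, "vsixs", "publisher", "extension", "1.2.3")
    os.makedirs(deep)
    for path in (os.path.join(deep, "results-code.json"),
                 os.path.join(root, "somescripts.ps1")):
        with open(path, "w") as fh:
            fh.write("{}")


with tempfile.TemporaryDirectory() as tmp:
    make_demo_tree(tmp)
    full_files, full_opened = scan(tmp)
    pruned_files, pruned_opened = scan(tmp, ignored_dirs={"vsixs"})

print(full_opened)
print(pruned_opened)
```

With vsixs ignored, only the top level is opened, which is exactly why results-code.json can never be found while vsixs/ itself stays ignored.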
If Git must look inside some folder in your working tree, Git cannot ignore that folder. Git can ignore some or all of the files in that folder, but it must open and read the folder itself. So don't give Git permission to skip that folder.
If you know the precise name of any particular folder, you can list that as a "do not ignore" exception:
!/vsixs/
for instance, in the top level .gitignore in your working tree, says that Git must open and read vsixs. (Should there be a vsixs/vsixs/, that's a different name, because /vsixs requires that the name end there, so this does not force Git to un-ignore that sub-vsixs.) Or:
!vsixs/
works similarly, except that it "un-ignores" any folder named vsixs. There's something odd going on here with the two slashes, which we'll get to in a moment. For now, just remember that this only works on entries named vsixs. If we want all folders, we need *, or rather, */:
!*/
As always, the leading ! means that this is a specific "do not ignore" entry.
What's the difference between a, /a, a/, a/b, a/b/, /a/b, and so on? There are two obvious differences and one less-obvious one: a pattern can start with a slash, end with a slash, or contain a slash in the middle.
For .gitignore rules, "starts with" and "contains in the middle" both have the same effect: they both "anchor" the ignore rule. "Ends with" has a different, separate meaning: it means only match a folder.
An un-anchored ignore rule (one that doesn't start with a slash, nor, after removing any trailing slash that means "folder", contain a slash) has Git look only at the short name component. That is, if Git is reading vsixs and comes across publisher, Git uses the short part, publisher, as the name for the un-anchored tests. But it uses the longer, constructed vsixs/publisher for the anchored tests. (Again, Git always uses a forward slash at this point.)
We'll mostly gloss over the trick Git uses for nested .gitignore files, which is that these constructed names start at the same level as the .gitignore file that was the source of the rule. For a top-level .gitignore file, this doesn't skip anything, so it's just the full path.
As noted in the parenthetical remark above, the trailing slash means match if and only if the actual thing found, by reading the OS's folder in the working tree, is itself a folder. So we'll make use of that to force Git to read folders without also forcing Git to un-ignore files.
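The three slash positions can be summarized in a small Python sketch. It is a simplified model, not Git's parser: the Rule class and the parse_rule function are hypothetical names, and comments, blank lines, backslash escapes, and ** are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    negated: bool   # leading "!": a "do not ignore" entry
    dir_only: bool  # trailing "/": match only if the entry is a folder
    anchored: bool  # leading or middle "/": test the constructed name
    pattern: str    # what remains once the markers above are stripped


def parse_rule(line):
    """Classify one .gitignore line by where its slashes sit."""
    text = line
    negated = text.startswith("!")
    if negated:
        text = text[1:]
    dir_only = text.endswith("/")
    if dir_only:
        text = text[:-1]
    # Any slash still present is at the start or in the middle.
    anchored = "/" in text
    return Rule(negated, dir_only, anchored, text.lstrip("/"))


for line in ("a", "/a", "a/", "a/b", "a/b/", "/a/b", "!*/", "!/*/", "!/vsixs/"):
    print(f"{line:10} {parse_rule(line)}")
```

Note how a/ stays un-anchored (the only slash is the trailing one), while /a and a/b are both anchored.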
If we know the name of a folder we want searched, we list it:
!vsixs/
for example. This allows Git to ignore other folders.
If we don't know the name of the folder, we have to use a pattern. If we know nothing at all about the name, we must use:
!*/
which tells Git: if it's a folder, open and read it.
Since these are un-anchored (they neither start with, nor contain in the middle, a slash), they apply to all folders here and at any deeper level. That makes this last one expensive. If you can be sure that they should only apply at this level, you can put the .gitignore file here and use:
!/*/
to reduce its cost. Since vsixs is a known name, if you know its level, you can write:
!path/to/vsixs/
or:
!/vsixs/
and it's now an anchored rule and applies only to the one specific vsixs, which in theory makes it even cheaper (though if there's hardly anything named vsixs, it's pretty cheap to start with, so this might be negligible anyway).
Hence, to ignore all files, then un-ignore specific files and all directories, the rule would read:
*
!*/
!.gitignore
!file-to-keep
These un-ignore rules could go in any order since Git applies all rules to each entry it finds as it reads through folders. The last matching rule wins. So as Git scans through the top level, it finds:
.gitignore. This does match *, doesn't match */, does match .gitignore, and doesn't match file-to-keep. The last matching rule is negated (has a leading !) so .gitignore is not "ignored": if untracked, it's considered for git add-ing and is complained about as an untracked file.
vsixs. This does match * and then */, but does not match either other rule. The last matching rule says do not ignore, so Git opens and reads it recursively.
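The "last matching rule wins" walk-through can be replayed with a short Python sketch. It is an approximation under stated assumptions: is_ignored is a hypothetical helper, it handles only un-anchored rules like the four above, and it uses Python's fnmatch rather than Git's own matcher.

```python
from fnmatch import fnmatchcase

RULES = ["*", "!*/", "!.gitignore", "!file-to-keep"]


def is_ignored(name, is_dir, rules=RULES):
    """Test one entry's short name against every rule; the last match wins."""
    ignored = False
    for line in rules:
        negated = line.startswith("!")
        pattern = line[1:] if negated else line
        dir_only = pattern.endswith("/")
        if dir_only:
            pattern = pattern[:-1]
        if "/" in pattern:
            raise ValueError("anchored rules are outside this sketch")
        if dir_only and not is_dir:
            continue  # a trailing-slash rule never matches a file
        if fnmatchcase(name, pattern):
            ignored = not negated  # later matches override earlier ones
    return ignored


for name, is_dir in ((".gitignore", False), ("vsixs", True),
                     ("file-to-keep", False), ("somescripts.ps1", False)):
    print(name, "ignored" if is_ignored(name, is_dir) else "kept")
```

Shuffling the three ! lines among themselves leaves the results unchanged, but moving * to the end makes it the last match for every entry, so everything is ignored again.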
Note that you can put a .gitignore file down as far as needed, provided Git opens and reads that folder. The one folder Git is guaranteed to open-and-read is the top level of the working tree. After that point, the ignore rules kick in.
Upvotes: 2