Cyker

Reputation: 10994

Match dirs using `**/` in gitattributes

I was expecting **/ to match any directory in the git repo, but in fact it matches nothing. man gitattributes says:

patterns that match a directory do not recursively match paths inside that directory (so using the trailing-slash path/ syntax is pointless in an attributes file; use path/** instead)

But in my case **/ didn't even match the directory itself. That is, if I have a/b.txt in my repo, **/ doesn't match dir a. I was expecting it to match dir a but not file a/b.txt.

If I change **/ to **, then both the directory and its contents are matched. In the same example, both dir a and file a/b.txt are matched.

So how do gitattributes work on directories? If git thinks attributes on directories don't matter, why does git list them in the case of **?

Upvotes: 10

Views: 7082

Answers (1)

torek

Reputation: 490078

Unlike .gitignore, which does special-case directory handling, it appears that, with one important exception, the .gitattributes code never special-cases a directory. This is in part because Git doesn't really store directories (see footnote 1). So each attribute is applied to some set of files. Despite this, files do reside in directories, both in real OSes and in how they are stored in commits (see footnote 1 again), so it might make sense for .gitattributes to have a special case for trailing slashes, and not apply the attribute to a file.

As you've observed, though, this just doesn't happen. This is despite some code in attr.c (in path_matches) that checks for directories. But it doesn't really matter, because the useful parts of Git attributes (again with one exception) only apply to files anyway.

What I mean by "only apply to files" is this. Consider, e.g.:

*.txt    text
*.jpg    -text diff=jpg
version  export-subst

which might be found in some .gitattributes file. This tells Git that the *.txt files are text and therefore any CRLF conversions would apply, while *.jpg files are binary and they won't (and diffs will be generated with the "jpg" diff driver: see gitattributes). The version file will have a hash ID substitution performed when doing git archive. All of these operations occur on files. If we had:

sub/    -text

this simply does nothing at all. The directory sub will never be in Git's index, so operations that affect index/worktree transitions are not applicable. It will merely be stored as a tree within the Git repository (see footnote 1 again), with no operations or options available to you, the user. If you want files within sub to be treated as binary, you can write:

sub/*    -text
sub/**/* -text

This, admittedly, is two lines, where one would serve if Git allowed sub/ as a pattern here. (The documentation quoted in the question suggests the single pattern sub/** for the same purpose.) So being able to name files by directory prefixes might be somewhat useful, but the lack of this ability is not a serious problem.

The exception, and future considerations

The one exception here is the export-ignore attribute. The git archive command, which builds a tar or zip archive from a specific commit in some repository, obeys a .gitattributes file in the snapshot being archived, and if that file says not to export some file or directory, it will not export that file, or any files contained within the directory.

Because it does seem sensible for attributes to do directory matching, it's possible that some future Git version will do it, so that you can write, e.g., assets/ -text. This doesn't gain much over assets/* -text though, and currently you do have to use something like the latter.


1. When Git makes a commit, Git creates one tree object for the top-level directory of files and sub-directories, plus one more tree object for each sub-directory that has files. It does so based on the contents of Git's index, which does not store the directories themselves. The index stores only the files.

Technically, what's in the index is, for each file, a tuple of information: <mode, hash-ID, stage-number, name>. You can see this information by running git ls-files --stage. This is augmented with cache data and flags—you can see most of that as well by adding --debug to the git ls-files command—but the four-tuple, with the stage number usually being zero, is the crucial part of the index. Here is a sample from an index for the Git repository for Git itself:

100644 c2f5fe385af1bbc161f6c010bdcf0048ab6671ed 0       .cirrus.yml
100644 c592dda681fecfaa6bf64fb3f539eafaf4123ed8 0       .clang-format
100644 f9d819623d832113014dd5d5366e8ee44ac9666a 0       .editorconfig
100644 b08a1416d86012134f823fe51443f498f4911909 0       .gitattributes
100644 e7b4e2f3c204c2c94c60222abbc702bd7d72de39 0       .github/CONTRIBUTING.md
100644 952c7c3a2aa11ea1087390be61eab6f7c0013599 0       .github/PULL_REQUEST_TEMPLATE.md
100644 84a5dcff7a05fb724d78826212c5fa22ba5df958 0       .github/workflows/main.yml
100644 ee509a2ad263989fcebe3c3543aa32efed1cacda 0       .gitignore
100644 cbeebdab7a5e2c6afec338c3534930f569c90f63 0       .gitmodules
100644 bde7aba756ea74c3af562874ab5c81a829e43c83 0       .mailmap
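
As a rough illustration of that four-tuple, here is a minimal Python sketch that splits one line of git ls-files --stage output into its fields. The names IndexEntry and parse_stage_line are made up for this example. Real output separates the stage number from the name with a tab; the fallback branch handles copies where the tab has become spaces, as in the listing above. Names that Git quotes (unusual characters) are not handled.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    """Hypothetical container for the four-tuple described above."""
    mode: str
    hash_id: str
    stage: int
    name: str


def parse_stage_line(line: str) -> IndexEntry:
    """Split one `git ls-files --stage` line into <mode, hash, stage, name>."""
    line = line.rstrip("\n")
    meta, sep, name = line.partition("\t")
    if sep:
        # Normal case: "<mode> <hash> <stage>\t<name>"
        mode, hash_id, stage = meta.split()
    else:
        # Fallback for text where the tab was converted to spaces.
        mode, hash_id, stage, name = line.split(None, 3)
    return IndexEntry(mode, hash_id, int(stage), name)
```

Note that the name field is the whole path, slashes included; nothing in the entry marks .github as a directory in its own right.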

As long as all files are at stage zero, Git can turn the index into a tree object. The tree object contains a subtree for those entries that appear to have a directory name within them, as for example the .github/* files. Inside the index, directories don't exist: the files are just named .github/CONTRIBUTING.md and so on. Within a commit, though, they become subtrees. Note that one file is named .github/workflows/main.yml, so the .github sub-tree will contain a sub-sub-tree named workflows that will contain the main.yml file.
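
To make that conversion concrete, here is a small Python sketch (not Git's actual code) that rebuilds a nested tree from flat index names. The function name build_tree and the use of plain dicts are assumptions for illustration; real tree objects also record modes and hash IDs.

```python
def build_tree(paths):
    """Turn flat, slash-separated index names into nested dicts.

    Sub-trees are dicts; files are stored as keys with value None.
    """
    root = {}
    for path in paths:
        *dirs, filename = path.split("/")
        node = root
        for d in dirs:
            # A sub-tree comes into existence only because a file lives in it.
            node = node.setdefault(d, {})
        node[filename] = None
    return root


tree = build_tree([
    ".github/CONTRIBUTING.md",
    ".github/workflows/main.yml",
    ".gitignore",
])
```

Here tree[".github"] ends up holding CONTRIBUTING.md and a nested workflows dict containing main.yml. Because a sub-tree appears only as a side effect of some file path that passes through it, no input to this sketch can produce an empty directory.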

In a sense, then, Git stores the files in a directory-tree-like fashion. But while you work in your repository, it uses the files without regard to directories, because the index flattens the directories away. This may have been a bit of a design mistake, as it is the primary reason Git can't store an empty directory properly. (Various attempts have been made to work around this, and the only one that works correctly, using an empty submodule, is pretty clunky.)

Upvotes: 11
