Reputation: 10994
I was expecting **/
to match any directory in the git repo, but in fact it matches nothing. man gitattributes
says:
patterns that match a directory do not recursively match paths inside that directory (so using the trailing-slash path/ syntax is pointless in an attributes file; use path/** instead)
But in my case **/
didn't even match the directory itself. For example, if I have a/b.txt
in my repo, then **/
doesn't even match dir a
. I was expecting it to match dir a
but not file a/b.txt
.
If I change **/
to **
, then both the directory and its contents are matched. In the same example, both dir a
and file a/b.txt
are matched.
So how do gitattributes work on directories? If git thinks attributes on directories don't matter, why does git list them in the case of **
?
Upvotes: 10
Views: 7082
Reputation: 490078
Unlike .gitignore
, which does special-case directory handling, it appears that—with one important exception—the .gitattributes
code never special-cases a directory. This is in part because Git doesn't really store directories.[1] So each attribute is applied to some set of files. Despite this, files do reside in directories, both in real OSes and in how they are stored in commits (see footnote 1), so it might make sense for .gitattributes
to have a special case for trailing slashes, and not apply the attribute to a file.
As you've observed, though, this just doesn't happen. This is despite some code in attr.c
(in path_matches
) that checks for directories. It doesn't really matter, though, because the useful parts of Git attributes (again with one exception) apply only to files.
What I mean by the emphasized text above is this. Consider, e.g.:
*.txt text
*.jpg -text diff=jpg
version export-subst
which might be found in some .gitattributes
file. This tells Git that the *.txt
files are text and therefore any CRLF conversions would apply, while *.jpg
files are binary and they won't (and diffs will be generated with the "jpg" diff driver: see gitattributes). The version
file will have a hash ID substitution performed when doing git archive
. All of these operations occur on files. If we had:
sub/ -text
this simply does nothing at all, because the directory sub
will never be in Git's index (so operations that affect index/worktree transitions are not applicable) and will merely be stored as a tree within the Git repository (see footnote one again) with no operations or options available to you, the user. If you want files within sub
to be treated as binary, you can write:
sub/* -text
sub/**/* -text
This, admittedly, is two lines, where one would serve if Git allowed sub/
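as a pattern here. (The gitattributes documentation quoted in the question suggests the single pattern sub/** for this purpose.) So being able to name files by directory prefixes might be somewhat useful, but the lack of this ability is not a serious problem.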
The one exception here is the export-ignore
attribute. The git archive
command, which builds a tar or zip archive from a specific commit in some repository, obeys a .gitattributes
file in the snapshot being archived, and if that file says not to export some file or directory, it will not export that file, or any files contained within the directory.
Because it does seem sensible for attributes to do directory matching, it's possible that some future Git version will do it, so that you can write, e.g., assets/ -text
. This doesn't gain much over assets/* -text
though, and currently you do have to use something like the latter.
[1] When Git makes a commit, Git creates one tree object for the top-level directory of files and sub-directories, plus one more tree object for each sub-directory that has files. It does so based on the contents of Git's index, which does not store the directories themselves. The index stores only the files.
Technically, what's in the index is, for each file, a tuple of information: <mode, hash-ID, stage-number, name>. You can see this information by running git ls-files --stage
. This is augmented with cache data and flags—you can see most of that as well by adding --debug
to the git ls-files
command—but the four-tuple, with the stage number usually being zero, is the crucial part of the index. Here is a sample from an index for the Git repository for Git itself:
100644 c2f5fe385af1bbc161f6c010bdcf0048ab6671ed 0 .cirrus.yml
100644 c592dda681fecfaa6bf64fb3f539eafaf4123ed8 0 .clang-format
100644 f9d819623d832113014dd5d5366e8ee44ac9666a 0 .editorconfig
100644 b08a1416d86012134f823fe51443f498f4911909 0 .gitattributes
100644 e7b4e2f3c204c2c94c60222abbc702bd7d72de39 0 .github/CONTRIBUTING.md
100644 952c7c3a2aa11ea1087390be61eab6f7c0013599 0 .github/PULL_REQUEST_TEMPLATE.md
100644 84a5dcff7a05fb724d78826212c5fa22ba5df958 0 .github/workflows/main.yml
100644 ee509a2ad263989fcebe3c3543aa32efed1cacda 0 .gitignore
100644 cbeebdab7a5e2c6afec338c3534930f569c90f63 0 .gitmodules
100644 bde7aba756ea74c3af562874ab5c81a829e43c83 0 .mailmap
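To make the four-tuple concrete, here is a minimal Python sketch that parses listing lines like the ones above and checks that every entry is at stage zero. The names IndexEntry and parse_stage_line are made up for this illustration and are not part of Git. It assumes plain path names, and it ignores the quoting Git applies to unusual characters in names.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    """One <mode, hash-ID, stage-number, name> tuple from the index listing."""
    mode: str
    hash_id: str
    stage: int
    name: str


def parse_stage_line(line: str) -> IndexEntry:
    # Real `git ls-files --stage` output puts a tab before the name; splitting
    # on any whitespace also accepts the space-separated listing shown above.
    # maxsplit=3 keeps any spaces inside the name intact.
    mode, hash_id, stage, name = line.split(None, 3)
    return IndexEntry(mode, hash_id, int(stage), name)


sample = """\
100644 b08a1416d86012134f823fe51443f498f4911909 0 .gitattributes
100644 e7b4e2f3c204c2c94c60222abbc702bd7d72de39 0 .github/CONTRIBUTING.md
"""

entries = [parse_stage_line(line) for line in sample.splitlines()]

# True only when no entry has a nonzero stage number.
all_stage_zero = all(entry.stage == 0 for entry in entries)
```

Note that the second entry's name is simply the string .github/CONTRIBUTING.md: nothing in the tuple says that .github is a directory.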
As long as all files are at stage zero, Git can turn the index into a tree object. The tree object contains a subtree for each group of entries whose names appear to contain a directory name, for example the .github/*
files. Inside the index, directories don't exist: the files are just named .github/CONTRIBUTING.md
and so on. Within a commit, though, they become subtrees. Note that one file is named .github/workflows/main.yml
, so the .github
sub-tree will contain a sub-sub-tree named workflows
that will contain the main.yml
file.
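The flat-names-to-subtrees step can be sketched in a few lines of Python. This is only an illustration of the idea, not how Git implements it: build_tree is a hypothetical helper, nested dicts stand in for tree objects, and None stands in for a file's blob.

```python
def build_tree(paths):
    """Group flat, slash-separated index names into nested dicts."""
    root = {}
    for path in paths:
        *dirs, filename = path.split("/")
        node = root
        for d in dirs:
            # A subtree comes into existence only because a file name
            # passes through it.
            node = node.setdefault(d, {})
        node[filename] = None  # None marks a file; dicts mark subtrees
    return root


tree = build_tree([
    ".gitattributes",
    ".github/CONTRIBUTING.md",
    ".github/workflows/main.yml",
])
# tree == {
#     ".gitattributes": None,
#     ".github": {
#         "CONTRIBUTING.md": None,
#         "workflows": {"main.yml": None},
#     },
# }
```

Since the input is a list of file names only, a directory with no files in it has no way to show up in the result.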
In a sense, then, Git stores the files in a directory-tree-like fashion; but at the time you do work in your repository, it uses the files without regard to directories, because the index flattens the directories away. This may have been a bit of a design mistake, as it is the primary reason Git can't store an empty directory properly. (Various attempts have been made to work around this, and the only one that works correctly, using an empty submodule, is pretty clunky.)
Upvotes: 11