Reputation:
I've recently been reading up on .gitattributes and found repositories like https://github.com/alexkaratarakis/gitattributes, which try to maintain gitattributes for all file types. Looking through those files, though, my instinct is that this is an unmaintainable mess. You'd have to update the file every time you use a new file extension, or whenever some software introduces one, which just isn't feasible. With a team of 30+ people, a file like that is a nightmare to maintain; we can barely maintain a simple icons.svg file.
On top of that, I have been coding and using Git for many years, on many different projects, and I've never used .gitattributes. We use tools like Prettier on our project, which rewrites newlines to "lf", and we have devs on Windows; this has never caused any issues, and VS Code has never had issues with it either. Git also automatically detects binary files like PNGs and shows text diffs for files like SVGs, and I've never had to configure that.
So I ask: is it really necessary to have this file? It seems to me like signing up for a ton of completely unnecessary maintenance, when Git is smart enough to figure out what it should or shouldn't do with a file.
Upvotes: 34
Views: 28685
Reputation: 1326676
is it really necessary to have this file?
Yes, for any Git-related setting (eol, diff, merge filters, content filters, ...) that you want every collaborator on the repository to follow.
This differs from git config, which, for security reasons, remains local (because it can include sensitive information or dangerous directives).
A .gitattributes file is part of your versioned source code, and contributes to establishing a common Git standard.
For instance, I always put (as in VonC/gitcred/.gitattributes):
*.bat text eol=crlf
*.go text eol=lf
Because no matter how your IDE/editor is configured, I need CRLF for my Windows bat script to properly run, and I prefer LF for Go files, which I edit on Windows or Linux. I always considered local settings like core.autocrlf an antipattern, best left to false.
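One way to confirm that rules like the two above are picked up is to ask Git directly with git check-attr. The following is a minimal sketch, assuming a scratch repository created with mktemp; the file names build.bat and main.go are hypothetical and do not need to exist on disk.

```shell
#!/bin/sh
set -eu

# Scratch repository (hypothetical; any repository behaves the same way).
repo=$(mktemp -d)
cd "$repo"
git init -q .

# The two rules quoted in the answer.
cat > .gitattributes <<'EOF'
*.bat text eol=crlf
*.go  text eol=lf
EOF

# Ask Git which values of "text" and "eol" apply to each path.
attrs=$(git check-attr text eol -- build.bat main.go)
printf '%s\n' "$attrs"
```

On a stock Git installation this should report eol: crlf for build.bat and eol: lf for main.go, whatever core.autocrlf is set to locally.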
But a .gitattributes can declare many other Git elements:
- working-tree-encoding, used for translation files
- ident, to embed file SHA1 as in here
- filter, most notably used by Git LFS, as in here, and I used it many times before
- diff, at least to avoid diffing binary files, or defining an external diff driver (or a hunk-header pattern, xfuncname for instance): I mention them here
- merge, to select a custom merge driver such as unityyamlmerge
- whitespace, to define what diff and apply should consider whitespace errors
The .gitattributes file is not "mandatory", but a useful tool in the Git toolbox, one that can be shared safely in a project code base.
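As a rough illustration of a few of the attributes listed above, here is a sketch of a .gitattributes file and a check of how Git resolves it. The patterns and file names (logo.png, mockup.psd, strings.resx, main.c) are hypothetical choices, not taken from the answer, and the filter=lfs line only has an effect when Git LFS is installed.

```shell
#!/bin/sh
set -eu

# Scratch repository (hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q .

cat > .gitattributes <<'EOF'
# Never diff, merge, or convert these ("binary" is a macro for -diff -merge -text).
*.png  binary
# Route large assets through Git LFS (assumes git-lfs is installed).
*.psd  filter=lfs diff=lfs merge=lfs -text
# Stored as UTF-8 in the repository, checked out as UTF-16.
*.resx text working-tree-encoding=UTF-16LE-BOM eol=crlf
# What "git diff" and "git apply" should flag as whitespace errors.
*.c    whitespace=trailing-space,space-before-tab
EOF

# List every attribute Git resolves for each path.
attrs=$(git check-attr -a -- logo.png mockup.psd strings.resx main.c)
printf '%s\n' "$attrs"
```

git check-attr only reports what the file declares; it does not verify that the LFS filter or the encoding conversion is actually available on the machine.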
And you can read it even in bare repositories:
With Git 2.43 (Q4 2023), the attribute subsystem learned to honor attr.tree configuration that specifies which tree to read the .gitattributes files from.
See commit 9f9c40c, commit 2386535 (13 Oct 2023) by John Cai (john-cai).
(Merged by Junio C Hamano -- gitster -- in commit 26dd307, 30 Oct 2023)
attr: read attributes from HEAD when bare repo
Signed-off-by: John Cai
The motivation for 44451a2 (attr: teach "--attr-source=<tree>" global option to "git", 2023-05-06, Git v2.41.0-rc1 -- merge) was to make it possible to use gitattributes with bare repositories.
To make it easier to read gitattributes in bare repositories however, let's just make HEAD:.gitattributes the default.
This is in line with how mailmap works, 8c473ce ("mailmap: default mailmap.blob in bare repositories", 2012-12-13, Git v1.8.2-rc0 -- merge).
And, still with Git 2.43 (Q4 2023):
See commit 9f9c40c, commit 2386535 (13 Oct 2023) by John Cai (john-cai).
(Merged by Junio C Hamano -- gitster -- in commit 26dd307, 30 Oct 2023)
attr: add attr.tree for setting the treeish to read attributes from
Signed-off-by: John Cai
44451a2 (attr: teach "--attr-source=<tree>" global option to "git", 2023-05-06, Git v2.41.0-rc1 -- merge) provided the ability to pass in a treeish as the attr source.
In the context of serving Git repositories as bare repos like we do at GitLab however, it would be easier to point --attr-source to HEAD for all commands by setting it once.
Add a new config attr.tree that allows this.
git config now includes in its man page:
attr.tree
A reference to a tree in the repository from which to read attributes, instead of the .gitattributes file in the working tree.
In a bare repository, this defaults to HEAD:.gitattributes.
If the value does not resolve to a valid tree object, an empty tree is used instead.
When the GIT_ATTR_SOURCE environment variable or --attr-source command line option are used, this configuration variable has no effect.
However, Git 2.46 (Q3 2024), batch 3 notes:
Git 2.43 started using the tree of HEAD as the source of attributes in a bare repository, which has severe performance implications.
For now, revert the change, without ripping out a more explicit support for the attr.tree configuration variable.
See commit 51441e6 (03 May 2024) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit b077cf2, 13 May 2024)
51441e6460: stop using HEAD for attributes in bare repository by default
With 2386535 ("attr: read attributes from HEAD when bare repo", 2023-10-13, Git v2.43.0-rc0 -- merge listed in batch #22), we started to use the HEAD tree as the default attribute source in a bare repository.
One argument for such a behaviour is that it would make things like "git archive" run in bare and non-bare repositories for the same commit consistent.
This changes was merged to Git 2.43 but without an explicit mention in its release notes.
It turns out that this change destroys performance of shallowly cloning from a bare repository.
As the "server" installations are expected to be mostly bare, and "git pack-objects", which is the core of driving the other side of "git clone" and "git fetch", wants to see if a path is set not to delta with blobs from other paths via the attribute system, the change forces the server side to traverse the tree of the HEAD commit needlessly to find if each and every paths the objects it sends out has the attribute that controls the deltification.
Given that (1) most projects do not configure such an attribute, and (2) it is dubious for the server side to honor such an end-user supplied attribute anyway, this was a poor choice of the default.
To mitigate the current situation, let's revert the change that uses the tree of HEAD in a bare repository by default as the attribute source.
This will help most people who have been happy with the behaviour of Git 2.42 and before.
Two things to note:
- If you are stuck with versions of Git 2.43 or newer, that is older than the release this fix appears in, you can explicitly set the attr.tree configuration variable to point at an empty tree object, i.e.
  $ git config attr.tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
- If you like the behaviour we are reverting, you can explicitly set the attr.tree configuration variable to HEAD, i.e.
  $ git config attr.tree HEAD
The right fix for this is to optimize the code paths that allow accesses to attributes in tree objects, but that is a much more involved change and is left as a longer-term project, outside the scope of this "first step" fix.
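The hash in the first command is the ID of the empty tree, which can be recomputed instead of copied. The sketch below assumes the default SHA-1 object format and uses a scratch bare repository from mktemp as a stand-in for a server-side repository; both are assumptions for illustration.

```shell
#!/bin/sh
set -eu

# Recompute the ID of the empty tree (SHA-1 object format assumed).
empty_tree=$(git hash-object -t tree /dev/null)
printf 'empty tree: %s\n' "$empty_tree"

# Scratch bare repository standing in for a server-side one (hypothetical).
bare=$(mktemp -d)
git init -q --bare "$bare"

# On an affected release: stop reading attributes from HEAD.
git -C "$bare" config attr.tree "$empty_tree"

# On a release with the revert, to opt back in to the HEAD behaviour instead:
# git -C "$bare" config attr.tree HEAD

configured=$(git -C "$bare" config --get attr.tree)
printf 'attr.tree = %s\n' "$configured"
```

Setting the variable per repository, as here, keeps the choice local to that repository; whether to set it more broadly is a deployment decision the commit message does not address.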
Upvotes: 37
Reputation: 76754
It depends. The most common uses for .gitattributes files are line ending handling, working-tree encodings, and Git LFS. If you're using Git LFS, then it's required for those files to be handled as LFS files.
Otherwise, if all you care about is line endings, it depends on your platform. If your project is Unix-only, then it's not required. However, if your project may be used across systems, it's typically helpful to have one to indicate which files are text (that is, should be subject to line ending conversion) and which are not. Git does often guess correctly, but it only looks at the beginning of the file, and in many cases, certain file types (notably PDFs) start with a large block of ASCII-compatible text and then include binary data, and Git will need help.
If you want to include things like shell scripts or batch files, you absolutely do need a .gitattributes file because POSIX shells don't accept CR as part of a line ending and batch files must contain CRLF. An eol=lf or eol=crlf is therefore required for reproducible behaviour.
Similarly, some people on Windows have tools that have not come into modern times (where we overwhelmingly use UTF-8) and still absolutely require their data to be in little-endian UTF-16 with BOM. For those programs, typically a working-tree encoding is important so that Git will internally store them as UTF-8 text and can do things like diffs and merges on them. It is the case that most editors and tools these days handle UTF-8 and LF just fine, which is probably why you haven't really seen problems.
I do strongly recommend at least a simple * text=auto if nothing else if your project will be used on Windows, because it means that people will not accidentally commit CRLF line endings in your text files and also that people will have the line endings they prefer when working across systems. It's a simple step that can make the experience with your project a lot better.
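A small starting point along these lines might look like the sketch below: * text=auto as the baseline, with a few overrides for the cases mentioned above. The specific patterns and file names (deploy.sh, setup.bat, manual.pdf) are assumptions for illustration, not a recommended canonical list.

```shell
#!/bin/sh
set -eu

# Scratch repository (hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q .

cat > .gitattributes <<'EOF'
# Baseline: let Git detect text files and normalize their line endings.
*      text=auto
# Later lines override earlier ones for the paths they match.
*.sh   text eol=lf
*.bat  text eol=crlf
*.pdf  binary
EOF

# Show how the rules resolve for a few sample paths.
attrs=$(git check-attr text eol -- README.md deploy.sh setup.bat manual.pdf)
printf '%s\n' "$attrs"
```

Files that match only the first line get text: auto, so new extensions are covered without touching the file; only the handful of types with hard requirements need their own line.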
Upvotes: 7