user16435030


Is a .gitattributes file really necessary for git?

I've recently been reading up a bit on .gitattributes and also found places like this one, https://github.com/alexkaratarakis/gitattributes, where they try to maintain gitattributes for all file types. However, looking through those files, I instinctively think this is an unmaintainable mess. It means you'd have to update that file any time you use a new file extension, or any software brings out a new file extension, which is just impossible. When you're working with a team of 30+ people, it's a nightmare to maintain a file like that; we can barely maintain a simple icons.svg file.

Along with that, I have been coding and using git for many years, on many different projects, and I've never used .gitattributes. We use things like prettier on our project, which rewrites newlines to "lf", and we have devs on Windows, and this never gives us any issues; vscode never gives any issues with this either. Git also automatically picks up binary files like pngs and automatically shows text differences for files like svg; I've never had to configure that.

So I ask the question, is it really necessary to have this file? Because it seems to me like it's signing up for a ton of maintenance that's completely unnecessary and that git is smart enough to figure out what it should or shouldn't do with a file.

Upvotes: 34

Views: 28685

Answers (2)

VonC

Reputation: 1326676

is it really necessary to have this file?

Yes, for any Git-related setting (eol, diff, merge drivers, content filters, ...) that you want every collaborator on the repository to follow.

This differs from git config, which, for security reasons, remains local (because it can include sensitive information as well as dangerous directives).

A .gitattributes file is part of your versioned source code, and contributes to establishing a common Git standard.
For instance, I always put (as in VonC/gitcred/.gitattributes):

*.bat   text eol=crlf
*.go    text eol=lf

Because no matter how your IDE/editor is configured, I need CRLF for my Windows bat scripts to run properly, and I prefer LF for Go files, which I edit on Windows or Linux. I have always considered local settings like core.autocrlf an antipattern, best left set to false.
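To see what Git concludes from those two rules, git check-attr can be queried in a scratch repository. This is only a sketch: the temporary repository and the file names build.bat and main.go are hypothetical, and nothing here is taken from the gitcred repository itself.

```shell
# Hypothetical throwaway repository, created only for this demonstration.
repo=$(mktemp -d)
cd "$repo" && git init -q

# The two rules quoted above.
printf '%s\n' '*.bat   text eol=crlf' '*.go    text eol=lf' > .gitattributes

# check-attr matches on path names, so the files do not need to exist.
bat_eol=$(git check-attr eol -- build.bat)
go_eol=$(git check-attr eol -- main.go)
printf '%s\n' "$bat_eol" "$go_eol"
# prints:
#   build.bat: eol: crlf
#   main.go: eol: lf
```

Because the answer comes from the versioned file, every clone gets the same result regardless of each developer's core.autocrlf value.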

But a .gitattributes file can declare many other Git elements, such as the diff, merge, and filter settings mentioned above.

The .gitattributes file is not "mandatory", but a useful tool in the Git toolbox, one that can be shared safely in a project code base.
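As an illustration of those other elements, here is a sketch of a .gitattributes that goes beyond line endings. The patterns and file names are made up for the example; the attributes themselves (binary, diff, merge, export-ignore) are standard ones, and diff=markdown assumes a reasonably recent Git that ships a Markdown diff driver.

```shell
# Hypothetical throwaway repository, created only for this demonstration.
repo=$(mktemp -d)
cd "$repo" && git init -q

cat > .gitattributes <<'EOF'
# Never treat images as text: no EOL conversion, no textual diff or merge.
*.png         binary
# Skip textual diffs for generated, minified files.
*.min.js      -diff
# Use a Markdown-aware diff driver for hunk headers.
*.md          diff=markdown
# Keep lines from both sides instead of raising a conflict.
CHANGELOG.md  merge=union
# Leave CI configuration out of "git archive" output.
.ci/**        export-ignore
EOF

changelog_attrs=$(git check-attr diff merge -- CHANGELOG.md)
png_diff=$(git check-attr diff -- logo.png)
printf '%s\n' "$changelog_attrs" "$png_diff"
# prints:
#   CHANGELOG.md: diff: markdown
#   CHANGELOG.md: merge: union
#   logo.png: diff: unset
```

Note that CHANGELOG.md picks up attributes from two different lines, since each line only sets the attributes it names.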


And you can read it even in bare repositories:

With Git 2.43 (Q4 2023), the attribute subsystem learned to honor attr.tree configuration that specifies which tree to read the .gitattributes files from.

See commit 9f9c40c, commit 2386535 (13 Oct 2023) by John Cai (john-cai).
(Merged by Junio C Hamano -- gitster -- in commit 26dd307, 30 Oct 2023)

attr: read attributes from HEAD when bare repo

Signed-off-by: John Cai

The motivation for 44451a2 (attr: teach "--attr-source=<tree>" global option to "git", 2023-05-06, Git v2.41.0-rc1 -- merge) was to make it possible to use gitattributes with bare repositories.

To make it easier to read gitattributes in bare repositories however, let's just make HEAD:.gitattributes the default.
This is in line with how mailmap works, 8c473ce ("mailmap: default mailmap.blob in bare repositories", 2012-12-13, Git v1.8.2-rc0 -- merge).

And, still with Git 2.43 (Q4 2023):

See commit 9f9c40c, commit 2386535 (13 Oct 2023) by John Cai (john-cai).
(Merged by Junio C Hamano -- gitster -- in commit 26dd307, 30 Oct 2023)

attr: add attr.tree for setting the treeish to read attributes from

Signed-off-by: John Cai

44451a2 (attr: teach "--attr-source=<tree>" global option to "git", 2023-05-06, Git v2.41.0-rc1 -- merge) provided the ability to pass in a treeish as the attr source.
In the context of serving Git repositories as bare repos like we do at GitLab however, it would be easier to point --attr-source to HEAD for all commands by setting it once.

Add a new config attr.tree that allows this.

git config now includes in its man page:

attr.tree

A reference to a tree in the repository from which to read attributes, instead of the .gitattributes file in the working tree.

In a bare repository, this defaults to HEAD:.gitattributes.

If the value does not resolve to a valid tree object, an empty tree is used instead.
When the GIT_ATTR_SOURCE environment variable or --attr-source command line option are used, this configuration variable has no effect.
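The three ways of naming an attribute source can be tried side by side in a bare clone. This is a sketch under assumptions: the scratch directory, the demo identity and the build.bat path are hypothetical, attr.tree needs Git 2.43 or newer, and the other two forms need Git 2.41 or newer.

```shell
# Hypothetical scratch area: a normal repository with one commit, plus a bare clone of it.
work=$(mktemp -d)
cd "$work" && git init -q src && cd src
printf '%s\n' '*.bat text eol=crlf' > .gitattributes
git add .gitattributes
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'Add .gitattributes'
git clone -q --bare . ../bare.git && cd ../bare.git

# Config form: needs Git 2.43 or newer; older versions silently ignore the key.
via_config=$(git -c attr.tree=HEAD check-attr eol -- build.bat)
# Environment form: per the man page text above, it overrides attr.tree.
via_env=$(GIT_ATTR_SOURCE=HEAD git check-attr eol -- build.bat)
# Flag form: also overrides attr.tree; Git older than 2.41 rejects the option.
via_flag=$(git --attr-source=HEAD check-attr eol -- build.bat) || echo 'this Git has no --attr-source'
printf '%s\n' "$via_config" "$via_env" "$via_flag"
```

On a Git new enough for all three forms, each line should read `build.bat: eol: crlf`, even though the bare repository has no working tree to read a .gitattributes file from.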


However, Git 2.46 (Q3 2024), batch 3 notes:

Git 2.43 started using the tree of HEAD as the source of attributes in a bare repository, which has severe performance implications.
For now, revert the change, without ripping out a more explicit support for the attr.tree configuration variable.

See commit 51441e6 (03 May 2024) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit b077cf2, 13 May 2024)

51441e6460: stop using HEAD for attributes in bare repository by default

With 2386535 ("attr: read attributes from HEAD when bare repo", 2023-10-13, Git v2.43.0-rc0 -- merge listed in batch #22), we started to use the HEAD tree as the default attribute source in a bare repository.
One argument for such a behaviour is that it would make things like "git archive" run in bare and non-bare repositories for the same commit consistent.
This change was merged to Git 2.43 but without an explicit mention in its release notes.

It turns out that this change destroys performance of shallowly cloning from a bare repository.
As the "server" installations are expected to be mostly bare, and "git pack-objects", which is the core of driving the other side of "git clone" and "git fetch", wants to see if a path is set not to delta with blobs from other paths via the attribute system, the change forces the server side to traverse the tree of the HEAD commit needlessly to find out whether each and every path of the objects it sends out has the attribute that controls the deltification.
Given that (1) most projects do not configure such an attribute, and (2) it is dubious for the server side to honor such an end-user supplied attribute anyway, this was a poor choice of the default.

To mitigate the current situation, let's revert the change that uses the tree of HEAD in a bare repository by default as the attribute source.
This will help most people who have been happy with the behaviour of Git 2.42 and before.

Two things to note:

  • If you are stuck with a version of Git that is 2.43 or newer but older than the release this fix appears in, you can explicitly set the attr.tree configuration variable to point at an empty tree object, i.e.

    $ git config attr.tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
    
  • If you like the behaviour we are reverting, you can explicitly set the attr.tree configuration variable to HEAD, i.e.

    $ git config attr.tree HEAD
    

The right fix for this is to optimize the code paths that allow accesses to attributes in tree objects, but that is a much more involved change and is left as a longer-term project, outside the scope of this "first step" fix.
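The long hexadecimal value in the first note is not arbitrary: it is the object ID of the empty tree in a SHA-1 repository. A small sketch, assuming a throwaway repository using the default SHA-1 object format (a SHA-256 repository would give a different ID):

```shell
# Hypothetical throwaway repository, created only for this demonstration.
repo=$(mktemp -d)
cd "$repo" && git init -q

# Hash an empty input as a tree object; without -w nothing is written to the object store.
empty_tree=$(git hash-object -t tree /dev/null)
echo "$empty_tree"
# prints: 4b825dc642cb6eb9a060e54bf8d69288fbee4904

# Same effect as the first note above, without hard-coding the ID.
git config attr.tree "$empty_tree"
configured=$(git config --get attr.tree)
```

Computing the ID instead of pasting it keeps the workaround readable for whoever finds it in a server's configuration later.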

Upvotes: 37

bk2204

Reputation: 76754

It depends. The most common uses for .gitattributes files are line ending handling, working-tree encodings, and Git LFS. If you're using Git LFS, then it's required for those files to be handled as LFS files.
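For the Git LFS case, the requirement comes down to one line per tracked pattern. The sketch below writes by hand the line that `git lfs track "*.psd"` would append, so it runs without Git LFS installed; the `*.psd` pattern and the design/mockup.psd path are only examples.

```shell
# Hypothetical throwaway repository, created only for this demonstration.
repo=$(mktemp -d)
cd "$repo" && git init -q

# The line that `git lfs track "*.psd"` appends to .gitattributes.
printf '%s\n' '*.psd filter=lfs diff=lfs merge=lfs -text' > .gitattributes

lfs_attrs=$(git check-attr filter text -- design/mockup.psd)
printf '%s\n' "$lfs_attrs"
# prints:
#   design/mockup.psd: filter: lfs
#   design/mockup.psd: text: unset
```

Without that filter=lfs attribute, Git has no way to know the file should go through the LFS clean and smudge filters, which is why the file is not optional in that setup.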

Otherwise, if all you care about is line endings, it depends on your platform. If your project is Unix-only, then it's not required. However, if your project may be used across systems, it's typically helpful to have one to indicate which files are text (that is, should be subject to line ending conversion) and which are not. Git does often guess correctly, but it only looks at the beginning of the file, and in many cases, certain file types (notably PDFs) start with a large block of ASCII-compatible text and then include binary data, and Git will need help.

If you want to include things like shell scripts or batch files, you absolutely do need a .gitattributes file because POSIX shells don't accept CR as part of a line ending and batch files must contain CRLF. An eol=lf or eol=crlf is therefore required for reproducible behaviour.

Similarly, some people on Windows have tools that have not come into modern times (where we overwhelmingly use UTF-8) and still absolutely require their data to be in little-endian UTF-16 with BOM. For those programs, typically a working-tree encoding is important so that Git will internally store them as UTF-8 text and can do things like diffs and merges on them. It is the case that most editors and tools these days handle UTF-8 and LF just fine, which is probably why you haven't really seen problems.
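A working-tree encoding is declared per pattern, like any other attribute. The following sketch assumes PowerShell scripts as the files that need little-endian UTF-16 with BOM; the `*.ps1` pattern and the tools/setup.ps1 path are illustrative, and the actual conversion additionally depends on the iconv support of the local Git build.

```shell
# Hypothetical throwaway repository, created only for this demonstration.
repo=$(mktemp -d)
cd "$repo" && git init -q

# Stored as UTF-8 in the repository, checked out as UTF-16LE with BOM and CRLF.
printf '%s\n' '*.ps1 text working-tree-encoding=UTF-16LE-BOM eol=crlf' > .gitattributes

ps1_encoding=$(git check-attr working-tree-encoding -- tools/setup.ps1)
echo "$ps1_encoding"
# prints: tools/setup.ps1: working-tree-encoding: UTF-16LE-BOM
```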

If your project will be used on Windows, I strongly recommend at least a simple * text=auto, if nothing else, because it means that people will not accidentally commit CRLF line endings in your text files and also that people will have the line endings they prefer when working across systems. It's a simple step that can make the experience with your project a lot better.
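Put together, a small starting point along these lines might look as follows. It is a sketch rather than a recommendation for every project: the extensions beyond `* text=auto` are assumptions about what a repository contains, and the file names passed to check-attr are hypothetical.

```shell
# Hypothetical throwaway repository, created only for this demonstration.
repo=$(mktemp -d)
cd "$repo" && git init -q

cat > .gitattributes <<'EOF'
# Let Git detect text files and normalize them to LF in the repository.
* text=auto
# POSIX shells reject CR, batch files need CRLF.
*.sh   text eol=lf
*.bat  text eol=crlf
# PDFs can look like text at the start; state that they are not.
*.pdf  binary
EOF

readme_text=$(git check-attr text -- README.md)
script_eol=$(git check-attr eol -- deploy.sh)
pdf_text=$(git check-attr text -- manual.pdf)
printf '%s\n' "$readme_text" "$script_eol" "$pdf_text"
# prints:
#   README.md: text: auto
#   deploy.sh: eol: lf
#   manual.pdf: text: unset
```

Later lines override earlier ones for the same attribute, so the catch-all rule goes first. In a repository that already has history, running `git add --renormalize .` after adding the file is the usual way to bring tracked files in line with the new rules.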

Upvotes: 7
