jowey

Reputation: 8331

Is it mandatory to use the filter attribute for Git Large File Storage (LFS) in gitattributes?

When I put the pattern *.png binary in the .gitattributes file to handle large PNG files with Git LFS, nothing happens: LFS is ignored.
When I set the track pattern manually with git lfs track '*.png', I get the following line in the .gitattributes file:
'*.png' filter=lfs diff=lfs merge=lfs -text
This is working fine.

So was there a change in a recent update of Git or Git LFS that makes it mandatory to use the filter attribute?
Or is the pattern just wrong? I assumed it was still fine, since prominent resources like this repository use it.


Additional information:
Through research and testing I found that the diff and merge attributes are only placeholders for LFS for now, and removing them makes no difference. Removing the filter attribute, however, breaks LFS again (no error; files are just added to the repository as if there were no pattern for the file type).

This doesn't make sense to me, since the filter is enforced through the global Git config after running git lfs install (if I understand correctly). Here is the relevant part of the .gitconfig:

[filter "lfs"]
    clean = git-lfs clean -- %f
    smudge = git-lfs smudge -- %f
    process = git-lfs filter-process
    required = true

Btw. it also seems it doesn't matter if the pattern in the .gitattributes is quoted ('*.png' filter=lfs -text) or not (*.png filter=lfs -text), is this correct?


git-lfs/2.10.0 (GitHub; windows amd64; go 1.12.7; git a526ba6b)
git version 2.26.2

Tested on command line and with Sourcetree.
Repository from Bitbucket

Upvotes: 4

Views: 4947

Answers (1)

torek

Reputation: 489588

... was there a change in a recent update of Git or Git LFS that makes it mandatory to use the filter attribute?

No: it has always been mandatory [1]. Git-LFS works by using the smudge and clean filters to have Git store, as the contents in your repository, a file that contains information about how to retrieve another file, which is not stored in Git at all. This other file is stored on some server (which need not be the same as your Git server) and is retrieved from there by the smudge filter. The file stored on that other server is updated (well, augmented) with a new one by the clean filter [2]. Without the filter attribute on a path, Git never runs these filters for that file, so it is added to the repository as ordinary content.

Btw. it also seems it doesn't matter if the pattern in the .gitattributes is quoted ('*.png' filter=lfs -text) or not (*.png filter=lfs -text), is this correct?

Yes. You should only need quotes if the file name itself contains whitespace. However, the quotes must be double quotes, not single quotes: "*.png".
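Rather than guessing how a given pattern is interpreted, you can ask Git directly with git check-attr. The following is a minimal sketch, assuming a throwaway repository in a temporary directory; the helper show_filter and the path image.png are made up for illustration, and the file does not need to exist.

```shell
# Throwaway repository so nothing in a real project is touched.
repo=$(mktemp -d)
git init -q "$repo"

# Hypothetical helper: write one line to .gitattributes, then report the
# filter attribute Git assigns to image.png (the file need not exist).
show_filter() {
  printf '%s\n' "$1" > "$repo/.gitattributes"
  git -C "$repo" check-attr filter -- image.png
}

show_filter '*.png filter=lfs diff=lfs merge=lfs -text'    # unquoted
show_filter '"*.png" filter=lfs diff=lfs merge=lfs -text'  # double-quoted
show_filter "'*.png' filter=lfs diff=lfs merge=lfs -text"  # single-quoted
show_filter '*.png binary'                                  # no filter attribute
```

A result of "filter: lfs" means the LFS filter will be applied to that path; "filter: unspecified" means Git treats the file as ordinary content, which is the behavior described in the question.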

(Note that Git's handling of smudge and clean filters is a bit odd: the driver definition goes in a .gitconfig or .git/config file, and hence can be global or per-repository, but the use of the driver goes in .gitattributes and hence is always per-repository. The reason for this has to do with the security model around filter drivers.)


[1] Someone could, and maybe has, built a front end command that hides this from you, but it's still required as described above.

[2] In more detail: when you have commit H (some hash ID) checked out, Git has, in effect, not one but three "active" copies of each file:

  • One of these copies is frozen for all time, and is in the current commit, i.e., commit H. This copy—or its content, anyway; the mode and file name are stored separately—is in a special, read-only, Git-only format, and de-duplicated against identical copies that may be in other Git commits.

    Git calls these frozen-format content objects blob objects. You don't normally deal with them directly.

  • The second copy is another de-duplicated blob object—content in the frozen format—but because it's stored in Git's index, it can be replaced at any time.

  • The last copy of the file is in your work-tree and is an ordinary everyday file. It's not compressed, and it's not in some special format that only Git can read and nobody can write: it's an ordinary everyday file.

Normally, that last file is made by copying-and-decompressing the internal blob object. If you set up a smudge filter for a file, though, instead of Git just doing this decompression on its own, Git decompresses the file but then runs the content through the smudge filter. The LFS smudge filter reads the content, then calls up the LFS server and says "here's the lookup key: get me the real content". The LFS smudge filter writes the retrieved file into your work-tree.

Normally, git add file works by copying-and-compressing the given file into an internal blob object and then writing that into the index. If you set up a clean filter for the file, though, Git doesn't read the file directly: it has the clean filter read and edit the file. The LFS clean filter "edits" the file by reading the data and storing it on the LFS server, then generating a new lookup key.

Hence, when you have the LFS filters in place, the only data Git ever sees is the LFS-server-lookup key.
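To make the "lookup key" concrete, here is a rough sketch of the small text file that ends up in Git in place of the real content. It is not the actual git-lfs implementation: make_pointer is a hypothetical helper, and it assumes sha256sum from GNU coreutils is available. The three-line layout (version, oid, size) follows the Git LFS pointer file format.

```shell
# Hypothetical helper: print an LFS-style pointer for the file given as $1.
make_pointer() {
  oid=$(sha256sum "$1" | cut -d ' ' -f 1)   # SHA-256 of the real content
  size=$(wc -c < "$1" | tr -d ' ')          # size of the real content in bytes
  printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size"
}

# Stand-in for a large binary file.
big_file=$(mktemp)
printf 'hello' > "$big_file"

make_pointer "$big_file"
```

The printed text is all Git would store for the file; the smudge filter uses the oid to fetch the real bytes again on checkout.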

The choice of what smudge and clean filters to use for which files is set in .gitattributes and/or .git/info/attributes. The program to run for a given smudge or clean filter is set in a Git configuration file, using git config or git config --global or git config --system for instance.
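The split between defining a filter and using it can be tried out without Git LFS at all. The sketch below assumes a throwaway repository and a toy filter named upcase (a made-up name) whose clean command simply upper-cases the content; it stands in for git-lfs clean only to show that a driver defined in the config does nothing until .gitattributes assigns it to a path.

```shell
# Throwaway repository so nothing in a real project is touched.
repo=$(mktemp -d)
git init -q "$repo"

# Definition half: a toy filter driver in the repository's config.
# The name "upcase" is made up for this sketch.
git -C "$repo" config filter.upcase.clean 'tr a-z A-Z'

# No .gitattributes yet: the driver exists, but no path uses it.
printf 'hello\n' > "$repo/a.txt"
git -C "$repo" add a.txt
without_attr=$(git -C "$repo" cat-file -p :a.txt)

# Use half: tell Git which paths go through the filter.
printf '%s\n' '*.txt filter=upcase' > "$repo/.gitattributes"
printf 'hello\n' > "$repo/b.txt"
git -C "$repo" add b.txt
with_attr=$(git -C "$repo" cat-file -p :b.txt)

# Compare what Git actually stored in the index for each file.
echo "without attribute: $without_attr"
echo "with attribute:    $with_attr"
```

Only the file added after the attribute line exists is stored in filtered form, which mirrors what happens when the filter=lfs attribute is missing: the [filter "lfs"] section is configured, but Git has no reason to run it.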

Upvotes: 3
