Daniel Stephens

Reputation: 3219

Do I need Git LFS for local repos?

I created a Git repo that will only ever be stored locally, and I'm wondering whether I really need Git LFS for binaries. As far as I can see, the .gitattributes is properly configured as in:

*.psd binary

And yes, the files land in .git/objects/..., but they are compressed and don't take much space. So to sum it up, what are the benefits of Git LFS in a local repository if I never push/pull from/to a remote repo?

Thanks!

Upvotes: 10

Views: 9030

Answers (3)

Mark Bramnik

Reputation: 42491

To add to the excellent answer already provided by @Schwern and address OP's comment.

Here is a link to the documentation of Git LFS from Atlassian, one of the two main companies (the other is GitHub) that stand behind this extension.

The idea is that the binaries are downloaded from the "remote" repository lazily during the checkout process rather than during cloning or fetching.

Technically git lfs stores "lazily" evaluated pointers to the binaries.

This makes a lot of sense because git has a "commitment" to be able to provide you access to the state of the code base after every commit, so the following situation is possible:

  1. commit A: added large binary file a.bin (let's say a.bin is in version 1)
  2. push the changes
  3. commit B: made changes in the binary file a.bin (a.bin is in version 2 now)
  4. push the changes
  5. Now check out the SHA1 of commit A - Git has to provide you with a.bin in version 1.

This is true even if you've decided to remove a.bin and commit the removal: it should still be possible to access the file-system state as of "commit A".

So at least locally there is no point in storing version 1 if you explicitly don't need it.
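The scenario above can be reproduced in a throwaway repository. This is a minimal sketch using plain Git only (no LFS, no remote, so the push steps are skipped); the file contents, commit messages and identity settings are placeholders chosen for illustration.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"   # placeholder identity for the demo
git config user.name "Demo"

# commit A: a.bin in version 1
printf 'version 1' > a.bin
git add a.bin
git commit -q -m "commit A"
commit_a=$(git rev-parse HEAD)

# commit B: a.bin in version 2
printf 'version 2' > a.bin
git commit -q -am "commit B"

# remove the file entirely and commit the removal
git rm -q a.bin
git commit -q -m "remove a.bin"

# Git can still reproduce a.bin exactly as it was in commit A
old_content=$(git show "$commit_a:a.bin")
echo "$old_content"
```

Even though a.bin no longer exists in the working tree, version 1 is still stored in the repository, which is exactly the data that LFS would move out of the local object store.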

One more note, to directly address the question and clarify: yes you have to enable git lfs support locally, but in addition, you also have to enable git lfs support on your remote repo (I did that with Bitbucket once, I'm sure its competitors support that as well).

Upvotes: 3

Alexander Gogl

Reputation: 301

It depends on your workflow and the facilities you have access to.

Git stores each version of a file as a blob. When blobs are packed, they are diff compressed, so that only the differences between similar versions are stored. The repository size therefore increases only marginally with each change.

The situation is different if the versioned file is a binary or a file where a single change restructures the whole file. In that case, Git stores a full copy of each version, and the repository grows rapidly.

Comparison between Git and Git-LFS blob sizes

Git does a good job in diff compressing even big files. I've found that the compression of large files can be excellent (size of versioned file in .git/ after running git commit or git gc):

type                      | change         | file size | as git-lfs blob | as git blob | after git gc
--------------------------|----------------|-----------|-----------------|-------------|-------------
Vectorworks (.vwx)        | added geometry | 28.8 MB   | 28.8 MB         | 26.5 MB     | 1.8 MB
GeoPackage (.gpkg)        | added geometry | 16.9 MB   | 16.9 MB         | 3.7 MB      | 3.5 MB
Affinity Photo (.afphoto) | toggled layers | 85.8 MB   | 85.6 MB         | 85.6 MB     | 0.8 MB
FormZ (.fmz)              | added geometry | 66.3 MB   | 66.3 MB         | 66.3 MB     | 66.3 MB
Photoshop (.psd)          | toggled layers | 25.8 MB   | 25.8 MB         | 15.8 MB     | 15.4 MB
Movie (.mp4)              | trimmed        | 13.1 MB   | 13.1 MB         | 13.2 MB     | 0 MB
                          | delete a file  | -13.1 MB  | 0 MB            | 0 MB        | 0 MB

If you don't have a remote to push to, it is better to not use Git-LFS because Git-LFS versioned files seem to add no additional compression at all (see above).

Also one important lesson learnt here is that Git's diff compression method doesn't work with real binary files like .fmz. These would be the best candidates for putting under Git-LFS versioning.

For other file types that seem to be non-textual, but their structure is text-like (.vwx or .afphoto) the diff method performs well. In a single user scenario, where overall repository size and not committing speed is critical, I wouldn't put these under Git-LFS versioning because the Git blob size is significantly smaller than the LFS blob, thus saving space at the local and the remote.
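A comparison like the one above can be approximated with git count-objects. The following is only a sketch under stated assumptions: it uses about 1 MB of random bytes as a stand-in for a real design file, and the second version merely appends a few bytes, which is the best case for delta compression. Real file formats will behave differently, as the table shows.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"   # placeholder identity for the demo
git config user.name "Demo"

# Version 1: random data, so zlib alone cannot shrink it
head -c 1000000 /dev/urandom > big.dat
git add big.dat
git commit -q -m "version 1"

# Version 2: a small change to the large file
printf 'a small change' >> big.dat
git commit -q -am "version 2"

# Before gc, each version is a separate loose object
loose_kib=$(git count-objects -v | awk '/^size:/ {print $2}')

# After gc, the two versions are packed and can be stored as a delta
git gc -q
packed_kib=$(git count-objects -v | awk '/^size-pack:/ {print $2}')

echo "loose: ${loose_kib} KiB, packed after gc: ${packed_kib} KiB"
```

In this best case the packed size should be close to one copy of the file rather than two. Swapping in two real versions of a .psd or .fmz file shows how well a particular format responds.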

Benefits of Git-LFS

Git-LFS provides a solution to this problem by storing older versions of large binary files at a place outside the repository (the remote) and replacing them with pointer files. If an older version is needed, the client pulls it from the remote. Therefore, a designer who pulls the latest state from the remote will only download the latest state and the pointer files.

As a consequence, Git-LFS can only be used if you have access to a remote that is located at an LFS-enabled server. If there is no server to push the blobs to, the LFS-tracked blobs stay in the local repo, and the advantage of decreasing local storage consumption is lost.

Usually, the remote is an LFS-enabled git provider, which can be too expensive for some projects. However, there are also solutions to host a Git-LFS remote locally.
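To make the pointer file mentioned above concrete, here is a sketch that builds one by hand, following the three-line layout described in the Git LFS specification (version, oid, size). It is for illustration only: in a real repository Git-LFS generates the pointer itself, and the file name and contents below are placeholders.

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"
printf 'hello world' > a.bin   # stand-in for a large binary

# sha256sum is common on Linux, shasum -a 256 on macOS
if command -v sha256sum >/dev/null 2>&1; then
    oid=$(sha256sum a.bin | cut -d ' ' -f 1)
else
    oid=$(shasum -a 256 a.bin | cut -d ' ' -f 1)
fi
size=$(wc -c < a.bin | tr -d ' ')

# What Git would store in place of the real file contents
pointer=$(printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size")
printf '%s\n' "$pointer"
```

The pointer stays a few hundred bytes regardless of how large the real file is, which is why history made of pointers stays small.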

How to integrate Git-LFS in a local repository

Natively, Git-LFS allows transferring data through HTTPS only. Therefore, you require a separate Git-LFS server for storing the large files. However, there is no official server implementation for local hosting. But there are some unofficial ways like Git-LFS Folderstore to do that.

Git-LFS Folderstore provides a way to manage a Git-LFS remote locally. It works on a local machine and on a network drive. If you are on Mac OS X, then you can set it up by copying the lfs-folderstore executable to /usr/local/bin and then:

# Creating a remote repository on a volume (attached drive or NAS)
cd path/to/remote
mkdir origin

# create a bare git repository in origin
# (no directory argument: "git init origin --bare" here would create origin/origin)
cd origin
git init --bare

# Add remote to local repository
cd path/to/local/repository
git remote add origin <path/to/remote/origin>

# Enable Git-LFS in local repository
git lfs install

# Track filetype psd
git lfs track "*.psd"

# Configure lfs of the local repository
git config --add lfs.customtransfer.lfs-folder.path lfs-folderstore
git config --add lfs.standalonetransferagent lfs-folder
git config --add lfs.customtransfer.lfs-folder.args "/Volumes/path/to/remote/origin"

# Commit changes (git lfs track writes to .gitattributes, so stage it too)
git add .gitattributes
git commit -am "commit message"

# Push media to remote
git push origin master

Wrap the path in quotes if your remote path contains spaces.
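The plain-Git part of this setup (a bare repository on a local path used as a remote) can be tried without Git-LFS or lfs-folderstore installed. This is a sketch only: the temporary directories stand in for path/to/remote and path/to/local/repository, and design.psd is a placeholder file.

```shell
set -eu
base=$(mktemp -d)

# Bare repository standing in for the remote on the attached drive or NAS
git init -q --bare "$base/origin"

# Local working repository
git init -q "$base/local"
cd "$base/local"
git config user.email "demo@example.com"   # placeholder identity for the demo
git config user.name "Demo"

printf 'placeholder' > design.psd
git add design.psd
git commit -q -m "add design.psd"

# A filesystem path works as a remote URL
git remote add origin "$base/origin"
git push -q origin HEAD

# The remote branch now points at the same commit as the local one
remote_head=$(git ls-remote origin 'refs/heads/*' | cut -f 1)
local_head=$(git rev-parse HEAD)
echo "remote: $remote_head"
echo "local:  $local_head"
```

Pushing HEAD avoids hard-coding the branch name, which may be master or main depending on the Git configuration.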

How to cleanup the local repository

You can reduce the size of your git repository by calling the Git Garbage Collector git gc. It won't touch the Git-LFS blobs though.

Git-LFS will only remove blobs from the local repository .git/lfs/objects/ if they have been pushed to a remote AND if the commit containing the blobs is older than what Git-LFS considers recent (3 days by default). Here are the commands if you want to do it manually:

# remove lfs duplicates
# https://github.com/git-lfs/git-lfs/blob/main/docs/man/git-lfs-dedup.1.ronn
git lfs dedup

# clean old local lfs files (from commits older than 3 days)
# https://github.com/git-lfs/git-lfs/blob/main/docs/man/git-lfs-prune.1.ronn
git lfs prune

Upvotes: 20

Schwern

Reputation: 164919

To git-lfs or not to git-lfs?

git-lfs stores old versions of file contents in the cloud while keeping their history on disk. This has two main benefits.

  1. It can drastically reduce the size of the initial git clone of a repository.
  2. It can drastically reduce the size of the local repository.

Obviously number 1 doesn't apply if the repository is never shared.

If these binaries are really large, and if you change them frequently, they may begin to impact your available free disk space. If so, git-lfs can be of benefit by offloading the old copies of the binaries to cloud storage.

Fortunately, you can always retroactively apply git-lfs later using the BFG Repo-Cleaner if the local repo gets too large.

Binary or not?

"As far as I can see, the .gitattributes is properly configured as in: *.psd binary"

This is a separate issue from git-lfs.

If the file is marked as binary, Git will assume it cannot usefully diff or merge the contents. Every time you change the file Git will store a complete copy of the file. This will obviously eat up a lot more disk space.

Even if the file is "binary" (i.e. not plain text), Git may be able to store only the change if you don't mark it as binary. However, if the file is already compressed, this effectively randomizes the file contents and makes diffing impossible. Many image formats are compressed.

Alexander Gogl did some experiments in their answer and it seems Git will store the whole .psd file.
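What the binary setting from the question actually switches on or off can be inspected with git check-attr. The sketch below assumes a throwaway repository and a placeholder file name (design.psd does not need to exist). It only shows which attributes are set, not how Git ends up storing the file.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Same rule as in the question
printf '*.psd binary\n' > .gitattributes

# List every attribute that applies to a .psd path
attrs=$(git check-attr -a -- design.psd)
printf '%s\n' "$attrs"

# The delta attribute is a separate setting; query it explicitly
delta_attr=$(git check-attr delta -- design.psd)
printf '%s\n' "$delta_attr"
```

As far as the gitattributes documentation describes it, binary is a macro for -diff -merge -text, while delta compression in packs is governed by the separate delta attribute, which this rule leaves unspecified. Measuring the repository before and after git gc is therefore the more reliable way to see what a given file type costs.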

Upvotes: 9
