Reputation: 11224
I have a 10 GB repo on a Linux machine which is on NFS. The first git status takes 36 minutes and subsequent git status runs take 8 minutes, so Git seems to depend on the OS for caching files. Only the first git commands like commit and status that have to walk or pack/repack the whole repo take a very long time on a huge repo. I am not sure if you have used git status on such a large repo, but has anyone come across this issue?
I have tried git gc, git clean and git repack, but the time taken is still almost the same.
Would submodules, or any other approach like breaking the repo into smaller ones, help? If so, which is the best way to split a large repo? Is there any other way to improve the time taken by git commands on a large repo?
Upvotes: 104
Views: 93342
Reputation: 674
In our codebase, where we have somewhere in the range of 20 to 30 submodules, git status --ignore-submodules sped things up drastically for me. Do note that this will not report on the status of submodules.
See the --ignore-submodules docs for the additional options: "none", "untracked", "dirty" or "all". Using --ignore-submodules=dirty can be a good compromise: it skips checking the submodule working-tree files and only reports whether the recorded commit has changed.
To make this the default for all future commands: git config diff.ignoreSubmodules dirty
(Thanks @d2207197)
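As a quick sketch of both forms (nothing beyond what is stated above, just collected in one place):
git status --ignore-submodules=dirty    # one-off: skip submodule worktree checks
git config diff.ignoreSubmodules dirty  # repo-wide default for future commands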
Upvotes: 7
Reputation: 1328112
With Git 2.40 (Q1 2023), the advice message given by "git status"(man) when it takes a long time to enumerate untracked paths has been updated. It better illustrates all the configuration settings you can apply to get a snappier/faster git status.
See commit ecbc23e (30 Nov 2022) by Rudy Rigot (rudyrigot).
(Merged by Junio C Hamano -- gitster -- in commit f3d9bc8, 19 Dec 2022)
status: modernize git-status "slow untracked files" advice
Signed-off-by: Rudy Rigot
git status(man) can be slow when there are a large number of untracked files and directories, since Git must search the entire worktree to enumerate them. When it is too slow, Git prints advice with the elapsed search time and a suggestion to disable the search using the -uno option. This suggestion also carries a warning that might scare off some users.
However, these days, -uno isn't the only option. Git can reduce the time taken to enumerate untracked files by caching results from previous git status invocations, when the core.untrackedCache and core.fsmonitor features are enabled.
Update the git status man page to explain these configuration options, and update the advice to provide more detail about the current configuration and to refer to the updated documentation.
git status now includes in its man page:
UNTRACKED FILES AND PERFORMANCE
git status can be very slow in large worktrees if/when it needs to search for untracked files and directories. There are many configuration options available to speed this up by either avoiding the work or making use of cached results from previous Git commands. There is no single optimum set of settings right for everyone.
We'll list a summary of the relevant options to help you, but before going into the list, you may want to run git status again, because your configuration may already be caching git status results, so it could be faster on subsequent runs.
The --untracked-files=no flag or the status.showUntrackedFiles=no config (see above for both): indicate that git status should not report untracked files. This is the fastest option. git status will not list the untracked files, so you need to be careful to remember if you create any new files and manually git add them.
advice.statusUoption=false (see git config): setting this variable to false disables the warning message given when enumerating untracked files takes more than 2 seconds. In a large project, it may take longer, and the user may have already accepted the trade-off (e.g. using "-uno" may not be an acceptable option for them), in which case there is no point issuing the warning message, and disabling it may be best.
core.untrackedCache=true (see git update-index): enable the untracked cache feature and only search directories that have been modified since the previous git status command. Git remembers the set of untracked files within each directory and assumes that if a directory has not been modified, then the set of untracked files within it has not changed.
This is much faster than enumerating the contents of every directory, but still not without cost, because Git still has to search for the set of modified directories. The untracked cache is stored in the .git/index file. The reduced cost of searching for untracked files is offset slightly by the increased size of the index and the cost of keeping it up-to-date. That reduced search time is usually worth the additional size.
core.untrackedCache=true and core.fsmonitor=true or core.fsmonitor=<hook_command_pathname> (see git update-index): enable both the untracked cache and FSMonitor features and only search directories that have been modified since the previous git status command. This is faster than using just the untracked cache alone because Git can also avoid searching for modified directories. Git only has to enumerate the exact set of directories that have changed recently. While the FSMonitor feature can be enabled without the untracked cache, the benefits are greatly reduced in that case.
Note that after you turn on the untracked cache and/or FSMonitor features it may take a few git status commands for the various caches to warm up before you see improved command times. This is normal.
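If you want to try those two settings on a repository, a minimal sketch (assuming a Git recent enough to ship the built-in FSMonitor daemon; on older versions core.fsmonitor expects a hook pathname instead of a boolean):
git config core.untrackedCache true   # cache untracked-file results in the index
git config core.fsmonitor true        # built-in filesystem monitor (recent Git only)
git status                            # first run warms the caches
git status                            # later runs should be noticeably faster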
Upvotes: 5
Reputation: 177775
To be more precise, git depends on the efficiency of the lstat(2) system call, so tweaking your client’s “attribute cache timeout” might do the trick.
The manual for git-update-index — essentially a manual mode for git-status — describes what you can do to alleviate this, by using the --assume-unchanged flag to suppress its normal behavior and manually update the paths that you have changed. You might even program your editor to unset this flag every time you save a file.
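For example, marking a rarely-changing path so git status skips it (the path is just a placeholder):
git update-index --assume-unchanged third_party/huge_generated_file
# when you do edit it, clear the flag so git sees the change again:
git update-index --no-assume-unchanged third_party/huge_generated_file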
The alternative, as you suggest, is to reduce the size of your checkout (the size of the packfiles doesn’t really come into play here). The options are a sparse checkout, submodules, or Google’s repo tool.
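On a modern Git, the sparse checkout just mentioned can be set up roughly like this (directory names are hypothetical; Git 2.25+ has the sparse-checkout command, older versions need the manual core.sparseCheckout setup):
git sparse-checkout init --cone      # restrict the worktree
git sparse-checkout set dir1 dir2    # only materialize the directories you work on
git sparse-checkout list             # show what is currently included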
(There’s a mailing list thread about using Git with NFS, but it doesn’t answer many questions.)
Upvotes: 53
Reputation: 80382
As a test, try temporarily disabling realtime protection in your antivirus software. If that turns out to be the issue, consider swapping your antivirus.
Case in point: I had Webroot running, and it was taking 30 to 60 seconds to do anything with Git. Paused the realtime protection, and suddenly my original performance was back, with sub-second updates and a fast, snappy system.
I chose Webroot as it is famed for minimal impact on system performance, but in this case it was pouring metaphorical molasses into my CPU.
Upvotes: 0
Reputation: 9493
A frequent cause of slowness for big repos is the status command's ahead/behind check against the remote branch. Set this repo-level configuration to disable it:
git config status.aheadBehind false
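If you only want to skip the check occasionally rather than change the default, the same thing exists as a per-invocation flag (also mentioned further down in this thread):
git status --no-ahead-behind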
Upvotes: 1
Reputation: 1328112
The performance of git status should improve with Git 2.13 (Q2 2017).
See commit 950a234 (14 Apr 2017) by Jeff Hostetler (jeffhostetler).
(Merged by Junio C Hamano -- gitster -- in commit 8b6bba6, 24 Apr 2017)
string-list: use ALLOC_GROW macro when reallocing string_list
Use the ALLOC_GROW() macro when reallocing a string_list array rather than simply increasing it by 32. This is a performance optimization.
During status on a very large repo with many changes, a significant percentage of the total run time is spent reallocing the wt_status.changes array.
This change decreases the time in wt_status_collect_changes_worktree() from 125 seconds to 45 seconds on my very large repository.
Plus, Git 2.17 (Q2 2018) will introduce a new trace for measuring where the time is spent in the index-heavy operations.
See commit ca54d9b (27 Jan 2018) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit 090dbea, 15 Feb 2018)
trace: measure where the time is spent in the index-heavy operations
All the known heavy code blocks are measured (except object database access). This should help identify if an optimization is effective or not.
An unoptimized git status would give something like below:
0.001791141 s: read cache ...
0.004011363 s: preload index
0.000516161 s: refresh index
0.003139257 s: git command: ... 'status' '--porcelain=2'
0.006788129 s: diff-files
0.002090267 s: diff-index
0.001885735 s: initialize name hash
0.032013138 s: read directory
0.051781209 s: git command: './git' 'status'
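If you want to collect similar timings on your own repo, this output comes from Git's performance trace, which (to the best of my knowledge, not stated in the quote above) you enable via an environment variable:
GIT_TRACE_PERFORMANCE=1 git status                  # print timings to stderr
GIT_TRACE_PERFORMANCE=/tmp/git-perf.log git status  # or append them to a file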
The same Git 2.17 (Q2 2018) improves git status with:
commit f39a757, commit 3ca1897, commit fd9b544, commit d7d1b49 (09 Jan 2018) by Jeff Hostetler (jeffhostetler).
(Merged by Junio C Hamano -- gitster -- in commit 4094e47, 08 Mar 2018)
"git status" can spend a lot of cycles to compute the relation between the current branch and its upstream, which can now be disabled with the "--no-ahead-behind" option.
commit ebbed3b (25 Feb 2018) by Derrick Stolee (derrickstolee).
revision.c: reduce object database queries
In mark_parents_uninteresting(), we check for the existence of an object file to see if we should treat a commit as parsed. The result is to set the "parsed" bit on the commit.
Modify the condition to only check has_object_file() if the result would change the parsed bit.
When a local branch is different from its upstream ref, "git status" will compute ahead/behind counts. This uses paint_down_to_common() and hits mark_parents_uninteresting().
On a copy of the Linux repo with a local instance of "master" behind the remote branch "origin/master" by ~60,000 commits, we find the performance of "git status" went from 1.42 seconds to 1.32 seconds, for a relative difference of -7.0%.
Git 2.24 (Q3 2019) proposes another setting to improve git status performance:
See commit aaf633c, commit c6cc4c5, commit ad0fb65, commit 31b1de6, commit b068d9a, commit 7211b9e (13 Aug 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit f4f8dfe, 09 Sep 2019)
repo-settings: create feature.manyFiles setting
The feature.manyFiles setting is suitable for repos with many files in the working directory. By setting index.version=4 and core.untrackedCache=true, commands such as 'git status' should improve.
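To try it, the umbrella setting can be enabled per repository (a sketch of my own, not part of the quote):
git config feature.manyFiles true   # implies index.version=4 and core.untrackedCache=true
git status                          # subsequent runs should benefit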
But:
With Git 2.24 (Q4 2019), the codepath that reads the index.version configuration was broken with a recent update, which has been corrected.
See commit c11e996 (23 Oct 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 4d6fb2b, 24 Oct 2019)
repo-settings: read an int for index.version
Signed-off-by: Derrick Stolee
Several config options were combined into a repo_settings struct in ds/feature-macros, including a move of the "index.version" config setting in 7211b9e ("repo-settings: consolidate some config settings", 2019-08-13, Git v2.24.0-rc1 -- merge listed in batch #0).
Unfortunately, in what is clearly a copy-paste slip among a lot of boilerplate, the config setting is parsed with repo_config_get_bool() instead of repo_config_get_int(). This means that a setting "index.version=4" would not register correctly and would revert to the default version of 3.
I caught this while incorporating v2.24.0-rc0 into the VFS for Git codebase, where we really care that the index is in version 4.
This was not caught by the codebase because the version checks placed in t1600-index.sh did not test the "basic" scenario enough. Here, we modify the test to include these normal settings so they are not overridden by features.manyFiles or GIT_INDEX_VERSION.
While the "default" version is 3, this is demoted to version 2 in do_write_index() when not necessary.
git status will also compare SHA-1s faster with Git 2.33 (Q3 2021), which uses an optimized hashfile API in the codepath that writes the index file.
See commit f6e2cd0, commit 410334e, commit 2ca245f (18 May 2021), and commit 68142e1 (17 May 2021) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 0dd2fd1, 14 Jun 2021)
csum-file.h: increase hashfile buffer size
Signed-off-by: Derrick Stolee
The hashfile API uses a hard-coded buffer size of 8KB and has ever since it was introduced in c38138c ("git-pack-objects: write the pack files with a SHA1 csum", 2005-06-26, Git v0.99 -- merge). It performs a similar function to the hashing buffers in read-cache.c, but that code was updated from 8KB to 128KB in f279894 ("read-cache: make the index write buffer size 128K", 2021-02-18, Git v2.31.0-rc1 -- merge). The justification there was that do_write_index() improves from 1.02s to 0.72s.
Since our end goal is to have the index writing code use the hashfile API, we need to unify this buffer size to avoid a performance regression.
Since these buffers are now on the heap, we can adjust their size based on the needs of the consumer. In particular, callers to hashfd_throughput() are expecting to report progress indicators as the buffer flushes. These callers would prefer the smaller 8k buffer to avoid large delays between updates, especially for users with slower networks. When the progress indicator is not used, the larger buffer is preferable.
By adding a new trace2 region in the chunk-format API, we can see that the writing portion of 'git multi-pack-index write'(man) lowers from ~1.49s to ~1.47s on a Linux machine. These effects may be more pronounced or diminished on other filesystems.
Upvotes: 11
Reputation: 5519
Try git gc. Also, git clean may help.
The git manual states:
Runs a number of housekeeping tasks within the current repository, such as compressing file revisions (to reduce disk space and increase performance) and removing unreachable objects which may have been created from prior invocations of git add.
Users are encouraged to run this task on a regular basis within each repository to maintain good disk space utilization and good operating performance.
I always notice a difference after running git gc when git status is slow!
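A quick way to see whether gc actually did anything is to compare object counts before and after (my own suggestion, not part of the quoted manual):
git count-objects -v   # loose/packed object counts and sizes
git gc
git count-objects -v   # loose objects should now mostly be packed or pruned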
UPDATE II - Not sure how I missed this, but the OP already tried git gc and git clean. I swear that wasn't originally there, but I don't see any changes in the edits. Sorry for that!
Upvotes: 40
Reputation: 4226
OK, this would be quite hard to believe if I hadn't seen it with my own eyes... I had very BAD performance on my brand new work laptop: git status took 5 to 10 seconds to complete even for the most trivial repository.
I tried all the advice in this thread, then noticed that git log was slow as well, so I broadened my search to general slowness of a fresh Git installation and found this:
https://github.com/gitextensions/gitextensions/issues/5314#issuecomment-416081823
In a desperate move I updated my laptop's graphics driver and...
Holy Santa Claus sh*t... that did the trick!
...for me too!
So apparently the graphics card driver plays a role here... hard to understand why, but now the performance is "as expected"!
Upvotes: 2
Reputation: 851
This is a pretty old question, but I am surprised that no one has mentioned binary files, given the repository size.
You mentioned that your git repo is ~10 GB. It seems that apart from the NFS issue and other git issues (resolvable by git gc and configuration changes as outlined in other answers), git commands (git status, git diff, git add) might be slow because of a large number of binary files in the repository. Git is not good at handling binary files. You can remove unnecessary binary files using the following command (the example is for NetCDF files; back up the git repository first):
git filter-branch --force --index-filter \
'git rm --cached --ignore-unmatch *.nc' \
--prune-empty --tag-name-filter cat -- --all
Do not forget to add '*.nc' to your .gitignore file to stop git from committing those files again.
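For completeness, that last step might look like this (pattern and commit message are just examples):
echo '*.nc' >> .gitignore
git add .gitignore
git commit -m "Ignore NetCDF files"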
Upvotes: -1
Reputation: 25695
index.lock files
git status can be pathologically slow when you have leftover index.lock files.
This happens especially when you have git submodules, because then you often don't notice such leftover files.
Summary: run find .git/ -name index.lock, and delete the leftover files after checking that they are indeed not used by any currently running program.
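A minimal version of that check-and-clean (the lsof check is my own addition; adjust as needed):
find .git/ -name index.lock          # list any leftover lock files
lsof +D .git 2>/dev/null             # confirm no running process still holds files under .git
find .git/ -name index.lock -delete  # then remove the leftovers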
I found that git status was extremely slow in my repo, with git 2.19 on Ubuntu 16.04.
I dug in and found that /usr/bin/time git status in my assets git submodule took 1.7 seconds. With strace I found that git read all my big files in there with mmap. It doesn't usually do that; usually stat is enough.
I googled the problem and found the Use of index and Racy Git problem.
I tried git update-index somefile (in my case the gitignore in the submodule checkout) as shown here, but it failed with:
fatal: Unable to create '/home/niklas/src/myproject/.git/modules/assets/index.lock': File exists.
Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
This is a classic error. Usually you notice it on any git operation, but for submodules that you don't commit to often, you may not notice it for months, because it only appears when adding something to the index; the warning is not raised by a read-only git status.
After removing the index.lock file, git status became fast immediately, the mmaps disappeared, and it's now over 1000x faster.
So if your git status is unnaturally slow, check find .git/ -name index.lock and delete the leftovers.
Upvotes: 1
Reputation: 915
Something that hasn't been mentioned yet: activate the filesystem cache on Windows machines (Linux filesystems are completely different and Git was optimized for them, so this probably only helps on Windows).
git config core.fscache true
git config core.ignoreStat true
BUT: changed files then have to be added explicitly by the developer with git add; Git won't detect the changes by itself.
Upvotes: 5
Reputation: 24991
git config --global core.preloadIndex true
Did the job for me. Check the official documentation here.
Upvotes: 7
Reputation: 401
I'm also seeing this problem on a large project shared over NFS.
It took me some time to discover the -uno flag, which can be given to both git commit and git status.
What this flag does is disable looking for untracked files. This reduces the number of NFS operations significantly. The reason is that in order to discover untracked files, git has to look in all subdirectories, so if you have many subdirectories this will hurt you. By stopping git from looking for untracked files you eliminate all these NFS operations.
Combine this with the core.preloadIndex flag and you can get reasonable performance even on NFS.
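Putting the two together, a short sketch (the per-invocation flag and the repo-wide default do the same thing; pick one):
git config core.preloadIndex true         # parallelize the index lstat() calls
git status -uno                           # skip the untracked-file search for this run
git config status.showUntrackedFiles no   # or make skipping untracked files the default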
Upvotes: 40
Reputation: 2190
If your git repo makes heavy use of submodules, you can greatly speed up the performance of git status by editing the config file in the .git directory and setting ignore = dirty on any particularly large/heavy submodules. For example:
[submodule "mysubmodule"]
url = ssh://mysubmoduleURL
ignore = dirty
You'll lose the convenience of a reminder that there are unstaged changes in any of the submodules that you may have forgotten about, but you'll still retain the main convenience of knowing when the submodules are out of sync with the main repo. Plus, you can still change your working directory to the submodule itself and use git status within it as per usual to see more information. See this question for more details about what "dirty" means.
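Equivalently, you can set the same option from the command line instead of editing .git/config by hand (the submodule name matches the example above):
git config submodule.mysubmodule.ignore dirty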
Upvotes: 27