Rohan Kumar

Reputation: 5892

Does Git cache its results?

I've been using Git for a while, but I noticed this when I used it on extremely large projects like LibreOffice: the first time I run a command, it takes significantly longer than subsequent runs of the same command:

~/Documents/libo : $ time git status
On branch task
nothing to commit, working directory clean

real    0m23.052s
user    0m0.328s
sys     0m1.248s
~/Documents/libo : $ time git status
On branch task
nothing to commit, working directory clean

real    0m0.415s
user    0m0.208s
sys     0m0.156s
~/Documents/libo : $ 

My question is: does Git use some kind of caching in its internal implementation? If it does, where in the .git/* directory are those cached results stored? Or does this have nothing to do with Git, and is it dependent on the platform I'm using?

Upvotes: 3

Views: 2069

Answers (1)

kostix

Reputation: 55533

Yes and no.

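The snippet you showed hints that you're on a Unix-y platform such as a GNU/Linux- or *BSD-based OS or Mac OS. These platforms typically have good filesystem caching: the kernel keeps recently read directory entries, inode (metadata) information and file data in RAM, so the next time Git scans your working tree, most of that information is served from main memory rather than from disk. Your own timings agree: in the first run, user and sys time add up to only about 1.6 seconds of the 23 seconds of real time, so most of the rest was spent waiting, most likely for the disk, rather than computing.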
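One way to confirm that the slowdown comes from the OS cache rather than from Git is to empty the kernel caches and then time the command twice. This is a minimal sketch that assumes Linux and bash: /proc/sys/vm/drop_caches is a Linux-specific interface that only root can write, and cold_warm_status is a hypothetical helper name, not part of Git.

```shell
#!/usr/bin/env bash
# Hypothetical helper; assumes Linux and bash (for the `time` keyword).
# Times `git status` once after dropping the kernel caches and once again right after.
cold_warm_status() {
    sync    # write dirty pages out so the kernel can drop clean cached data
    if [ -w /proc/sys/vm/drop_caches ]; then
        # 3 = drop the page cache as well as cached dentries and inodes (root only)
        echo 3 > /proc/sys/vm/drop_caches
    else
        echo "cannot drop caches (not root?); timing without dropping them" >&2
    fi
    time git status    # cold: directory entries and inodes are read from disk
    time git status    # warm: the same metadata is now served from RAM
}
```

Usage: source the function, then run cd ~/Documents/libo && cold_warm_status. Without root the function still works, but the first run is only cold if nothing has touched the tree recently.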

On Windows, on the other hand, the filesystem operations Git needs are slow, so its platform port, Git for Windows, has a special feature controlled by the core.fscache configuration setting. This feature implements a dedicated in-memory cache of the so-called "stat" information (file metadata such as size and modification time) for the files in the working tree.
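With Git for Windows the setting can be switched on per repository (add --global to apply it to every repository). A minimal sketch, assuming a Git for Windows build and that you run it inside the repository; as far as I know, other Git builds accept and store the key but do not act on it.

```shell
# Assumes Git for Windows; run inside the repository.
git config core.fscache true
# Show the value Git will use (prints "true" once set):
git config --get core.fscache
```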

As far as I know, this cache lives only in memory, so it is not stored anywhere on disk.

One additional note: the so-called "index" (the special place where you stage changes for the next commit, and from which git commit cuts the next commit) is in fact a special cache, and it was called exactly that in the early days of Git. Some Git commands still support the --cached command-line option, which makes them consider only the index. To quote the git help cli manual:

NOTES ON FREQUENTLY CONFUSED OPTIONS

Many commands that can work on files in the working tree and/or in the index can take --cached and/or --index options. Sometimes people incorrectly think that, because the index was originally called cache, these two are synonyms. They are not — these two options mean very different things.

The --cached option is used to ask a command that usually works on files in the working tree to only work with the index. For example, git grep, when used without a commit to specify from which commit to look for strings in, usually works on files in the working tree, but with the --cached option, it looks for strings in the index.

The --index option is used to ask a command that usually works on files in the working tree to also affect the index. For example, git stash apply usually merges changes recorded in a stash to the working tree, but with the --index option, it also merges changes to the index as well.

git apply command can be used with --cached and --index (but not at the same time). Usually the command only affects the files in the working tree, but with --index, it patches both the files and their index entries, and with --cached, it modifies only the index entries.
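The difference between --cached and --index is easy to see in a throwaway repository. This is a sketch under stated assumptions: it needs only a POSIX shell and git, works in a fresh directory from mktemp, and never commits, so no user identity is required; notes.txt and change.patch are made-up names.

```shell
#!/bin/sh
# Sketch: a throwaway repository showing what --cached and --index change.
# Assumes only a POSIX shell and git; nothing is committed.
set -e
cd "$(mktemp -d)"
git init -q

echo "version one" > notes.txt
git add notes.txt                  # the index entry now holds "version one"
echo "version two" > notes.txt     # the working tree now holds "version two"

# git grep reads the working tree by default and the index with --cached.
git grep -q "version two" -- notes.txt && echo "grep (working tree): version two"
git grep -q --cached "version one" -- notes.txt && echo "grep --cached (index): version one"

# Save the working-tree change as a patch, then restore the file from the index.
git diff --no-color > change.patch
git checkout -- notes.txt

# --cached: patch the index entry only; the file on disk is left alone.
git apply --cached change.patch
cached_index=$(git show :notes.txt)
cached_tree=$(cat notes.txt)
echo "after --cached: index=$cached_index, file=$cached_tree"

# --index: patch both; this requires the file and its index entry to match first.
git add notes.txt                  # index back to "version one", same as the file
git apply --index change.patch
final_index=$(git show :notes.txt)
final_tree=$(cat notes.txt)
echo "after --index:  index=$final_index, file=$final_tree"
```

The script stages notes.txt again before the last step because git apply --index refuses to run when a working-tree file and its index entry differ, even if the patch would apply to each of them separately.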

The staging area being literally "the cache" stems from the fact that Git was originally envisioned as an implementation of a so-called "content-addressable filesystem", and it quickly outgrew that idea to become a full-blown VCS built around that core. The cache held the entries about to be recorded as the next filesystem snapshot (the commit), for fast access. This is still true today: the index keeps the "stat" information for the staged files so that git status can run fast, skipping the hashing of any file whose stat information shows it has not changed in the working tree compared to what is recorded in the index.
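You can watch Git use this cached stat data. The sketch below assumes a POSIX shell and git and runs in a throwaway repository: git ls-files --debug prints the stat fields stored for an entry, git diff-files (a plumbing command that does not refresh the index) flags a file whose stat data no longer matches, and git update-index --refresh re-hashes that file, finds its content unchanged and stores the new stat data. The file name a.txt is made up.

```shell
#!/bin/sh
# Sketch: watch Git compare working-tree files against the stat data in the index.
set -e
cd "$(mktemp -d)"
git init -q
echo hello > a.txt
git add a.txt

# The index entry records ctime, mtime, dev, ino, uid, gid and size for a.txt.
git ls-files --debug a.txt

# Change only the modification time (POSIX -t format); the content stays the same.
touch -m -t 200101010000 a.txt

# diff-files does not refresh the index, so a stat mismatch alone flags the file.
stale=$(git diff-files --name-only)
echo "stat-dirty: $stale"

# --refresh re-hashes the suspect file, sees identical content, stores the new stat data.
git update-index --refresh
clean=$(git diff-files --name-only)
echo "after refresh: '$clean'"
```

Porcelain commands such as git status do this refresh themselves, which is why they stay fast: only files whose stat data changed are read and hashed.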

Please see the "Git History" page on the Git SCM wiki and look for the word "cache" there: it explains the historical background of the index pretty well.

The bottom line is that several different caches are in play here: the filesystem cache maintained by the OS, Git's own cache (the index) and, when enabled, the Windows-specific stat cache of Git for Windows.

Only the index is actually "stored": with stock Git and no special configuration tweaks, it is the file named "index" in the ".git" subdirectory.
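The index file can be inspected directly. A minimal sketch, assuming a POSIX shell with head, od and awk: git rev-parse --git-path index prints where the index lives (it takes relocations such as the GIT_INDEX_FILE environment variable into account), and the file starts with a 12-byte header holding the "DIRC" signature (short for "dircache", a leftover of the history above), a format version and the number of entries, each stored as a big-endian 32-bit value. The be32 helper and the file a.txt are hypothetical.

```shell
#!/bin/sh
# Sketch: locate the index file and decode its 12-byte header.
set -e
cd "$(mktemp -d)"
git init -q
echo hello > a.txt
git add a.txt

index=$(git rev-parse --git-path index)   # ".git/index" unless relocated
echo "index file: $index"

# Hypothetical helper: read a big-endian 32-bit number at byte offset $1.
be32() { od -An -tu1 -j "$1" -N 4 "$index" | awk '{ print (($1 * 256 + $2) * 256 + $3) * 256 + $4 }'; }

sig=$(head -c 4 "$index")   # 4-byte signature
version=$(be32 4)           # format version, usually 2
entries=$(be32 8)           # number of index entries
echo "signature: $sig"
echo "version:   $version"
echo "entries:   $entries"
```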

Upvotes: 4
