Reputation: 155115
(This is not a duplicate of How does git detect that a file has been modified? because I'm asking about Windows; the referenced Q&A mentions stat and lstat, which do not apply to Windows.)
With traditional systems like SVN and TFS, the "state database" needs to be explicitly and manually informed of any changes to files in your local workspace: files are read-only by default, so you don't accidentally make a change without first telling your SVN/TFS client. Fortunately, IDE integration means that operations that add, modify, delete or rename files (i.e. "checking out") can be passed on to the client automatically. It also means that you need something like TortoiseSVN to work with files in Windows Explorer, lest your changes be ignored, and that you should regularly run an often lengthy server-to-local comparison scan to detect any changes.
But Git doesn't have this problem. On my Windows machine I can have a gigabyte-sized repo with hundreds of thousands of files, many levels deep, and yet if I make a 1-byte change to a deeply nested file, Git knows about it as soon as I run git status. This is the strange part: Git does not use any daemon processes or background tasks, and running git status does not involve any significant I/O activity that I can see. I get the results back immediately; it does not thrash my disk searching for the change I made.
Additionally, Git GUI tools, such as the Git integration in Visual Studio 2015, also have some degree of magic in them: I can make a change in Notepad or another program, and VS' Git Changes window picks it up immediately. VS could simply be using ReadDirectoryChangesW (FileSystemWatcher), but when I look at the devenv process in Process Explorer I don't see any corresponding handles. And even if it were, that wouldn't explain how git status sees the changes.
Upvotes: 1
Views: 3070
Reputation: 78653
As Briana Swift and kostix point out, it is scanning your disk. However, when looking for unstaged changes, it does not need to read every file on your disk. Instead, it can look at the metadata stored in the index to decide which files to examine more closely (that is, actually read).
If you use the git ls-files command to examine the index, you can see this metadata:
% git ls-files --debug worktree.c
worktree.c
ctime: 1463782535:0
mtime: 1463782535:0
dev: 16777220 ino: 120901250
uid: 501 gid: 20
size: 5591 flags: 0
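Those fields come straight from the binary index file, .git/index. To make that concrete, here is a minimal Python sketch that reads the entries of a version 2 or 3 index, assuming the layout described in Git's index-format documentation: a 12-byte "DIRC" header, then per entry ten 32-bit stat fields, a 20-byte SHA-1, a 16-bit flags word and a NUL-padded path. It skips the extensions and trailing checksum, does not handle version 4 (which prefix-compresses paths) or SHA-256 repositories, and parse_index and IndexEntry are hypothetical names, not part of any Git tooling. A value such as "ctime: 1463782535:0" above corresponds to the (seconds, nanoseconds) pair.

```python
import struct
from typing import NamedTuple


class IndexEntry(NamedTuple):
    path: str
    ctime: tuple[int, int]  # (seconds, nanoseconds)
    mtime: tuple[int, int]  # (seconds, nanoseconds)
    dev: int
    ino: int
    mode: int
    uid: int
    gid: int
    size: int
    sha1: str               # blob ID of the staged content


def parse_index(data: bytes) -> list[IndexEntry]:
    """Parse the entries of a version 2 or 3 Git index (sketch only)."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    if version not in (2, 3):
        raise ValueError(f"index version {version} is not handled by this sketch")

    entries = []
    pos = 12
    for _ in range(count):
        start = pos
        (ctime_s, ctime_ns, mtime_s, mtime_ns,
         dev, ino, mode, uid, gid, size) = struct.unpack(">10I", data[pos:pos + 40])
        pos += 40
        sha1 = data[pos:pos + 20].hex()
        pos += 20
        (flags,) = struct.unpack(">H", data[pos:pos + 2])
        pos += 2
        if version == 3 and flags & 0x4000:
            pos += 2  # skip the extended-flags word
        end = data.index(b"\0", pos)  # path is NUL-terminated
        path = data[pos:end].decode("utf-8")
        # Each entry is padded with 1 to 8 NUL bytes to a multiple of 8.
        pos = start + ((end - start) // 8 + 1) * 8
        entries.append(IndexEntry(path, (ctime_s, ctime_ns), (mtime_s, mtime_ns),
                                  dev, ino, mode, uid, gid, size, sha1))
    return entries
```

On a real checkout you would pass it the bytes read from .git/index and print each entry's path, mtime and size, which should line up with what git ls-files --debug reports.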
Now if you run git status, git will look at worktree.c on disk. If the timestamps and file size match, then git will assume that you have not changed this file.
If, however, the timestamps and file size do not match, then git will look more closely at the file, reading its contents to determine whether you have actually changed it.
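A minimal Python sketch of that two-step check follows. It assumes a simplified, hypothetical IndexRecord holding only the mtime, size and blob ID; real Git compares more of the recorded fields (ctime, inode, mode and so on, depending on settings such as core.trustctime and core.checkStat). The content step relies on how Git names file contents: a blob ID is the SHA-1 of "blob <size>", a NUL byte, and the file's bytes.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class IndexRecord:
    """Hypothetical, simplified view of one index entry."""
    mtime_ns: int
    size: int
    blob_sha1: str


def git_blob_sha1(content: bytes) -> str:
    # Git's object ID for file contents: SHA-1 over "blob <size>\0" + content.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def is_modified(path: str, rec: IndexRecord) -> bool:
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        return True  # a tracked file that disappeared counts as changed
    if st.st_size != rec.size:
        return True  # different length: contents must differ
    if st.st_mtime_ns == rec.mtime_ns:
        return False  # same size and timestamp: assume unchanged, no read
    # Metadata differs: read and hash the file to decide for certain, so a
    # file that was merely touched is still reported as unchanged.
    with open(path, "rb") as f:
        return git_blob_sha1(f.read()) != rec.blob_sha1
```

Only the last branch reads file contents, which is why a status run touches so little data. The sketch shares a weakness with any timestamp-based check: a same-size rewrite within the filesystem's timestamp granularity would be missed. Git guards against this "racy git" case by re-checking the contents of entries whose mtime is not older than the index file itself; the sketch does not.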
So git does "thrash" the disk, but in a much more limited way than if you did something like tf reconcile to examine your changes. (TFVC, of course, was designed to deal with very large working trees and should never touch your disk if you're using it correctly.)
And yes, Visual Studio does have some magic in it. It runs a background filesystem watcher on both your working directory and some parts of the Git repository. When it notices a change in your working directory, it recomputes the git status. It also watches for changes to branches in the Git repository, so it knows when you've switched branches and can recompute the status of your local repository relative to your remote.
Upvotes: 2
Reputation: 55453
Git runs a Windows equivalent of the POSIX-y lstat(2) call on each file recorded in the index to take a first stab at figuring out whether the file is modified. It compares the modification time and size taken from that information with the values recorded for that file in the index.
This operation is notoriously slow on NTFS (and on network-mapped drives), so Git for Windows gained a special tweak, controlled by the core.fscache configuration option, which became enabled by default some 2 or 3 Git for Windows releases ago. I don't know the exact details, but it tries to minimize the number of times Git needs to lstat(2) your files.
IIUC, the mechanism enabled by core.fscache does not use the Win32 filesystem-watching API, since Git runs no daemons or services on your system; it merely optimizes the way Git asks the filesystem layer for the stat info of the tracked files.
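One way this kind of optimization can work is to answer many stat queries from a single pass over each directory instead of issuing one lstat per file. The Python sketch below illustrates that idea with os.scandir, which on Windows enumerates a directory via FindFirstFileW/FindNextFileW and can return sizes and timestamps without a separate system call per file. It is an assumption-laden illustration of the technique, not Git for Windows' actual C implementation; DirStatCache is a hypothetical name, and the cache is never invalidated, which is only reasonable for the lifetime of a single command such as one git status run.

```python
import os
from typing import Optional


class DirStatCache:
    """Hypothetical cache: enumerate each directory once, then answer
    per-file lstat-style queries from memory."""

    def __init__(self) -> None:
        self._dirs: dict[str, dict[str, os.stat_result]] = {}
        self.directory_reads = 0  # number of directories actually enumerated

    def _listing(self, directory: str) -> dict[str, os.stat_result]:
        if directory not in self._dirs:
            self.directory_reads += 1
            entries: dict[str, os.stat_result] = {}
            with os.scandir(directory) as it:
                for entry in it:
                    # On Windows this is filled from the directory enumeration
                    # itself; on POSIX it still costs one lstat per entry.
                    entries[entry.name] = entry.stat(follow_symlinks=False)
            self._dirs[directory] = entries
        return self._dirs[directory]

    def lstat(self, path: str) -> Optional[os.stat_result]:
        """Return cached stat data for path, or None if it does not exist."""
        directory, name = os.path.split(path)
        try:
            return self._listing(directory or ".").get(name)
        except (FileNotFoundError, NotADirectoryError):
            return None
```

Checking every tracked file in a directory then costs one enumeration of that directory rather than one call per file, which is where the saving would come from on a slow filesystem.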
Upvotes: 3
Reputation: 1217
Git's implementation of git status is very lightweight.
git status compares the index (also known as the staging area, which holds what you have added with git add) against the last commit to find changes to be committed, and compares the working directory against the index to find changes you have made but not yet staged with git add. Instead of going through every file in the repository, Git relies on what the index already records: each tracked file's content ID and stat data, so only files whose recorded data no longer matches need a closer look.
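The two comparisons can be sketched in Python with plain dictionaries that map each path to a content ID. This is a simplified model under assumed inputs: head, index and worktree are hypothetical maps, and in real Git the working-tree IDs would come from the stat-then-hash check rather than from hashing everything, while ignored files, renames and merge conflicts are not modelled at all.

```python
from dataclasses import dataclass, field


@dataclass
class StatusReport:
    staged: dict[str, str] = field(default_factory=dict)    # HEAD vs index
    unstaged: dict[str, str] = field(default_factory=dict)  # index vs worktree
    untracked: list[str] = field(default_factory=list)


def classify(old: dict[str, str], new: dict[str, str]) -> dict[str, str]:
    """Label every path whose content ID differs between two snapshots."""
    changes = {}
    for path in old.keys() | new.keys():
        if path not in new:
            changes[path] = "deleted"
        elif path not in old:
            changes[path] = "new file"
        elif old[path] != new[path]:
            changes[path] = "modified"
    return changes


def status(head: dict[str, str], index: dict[str, str],
           worktree: dict[str, str]) -> StatusReport:
    report = StatusReport()
    # "Changes to be committed": what git add recorded vs the last commit.
    report.staged = classify(head, index)
    # "Changes not staged": only paths the index tracks take part here.
    tracked = {p: oid for p, oid in worktree.items() if p in index}
    report.unstaged = classify(index, tracked)
    # Everything else on disk is untracked.
    report.untracked = sorted(p for p in worktree if p not in index)
    return report
```

Note that the HEAD-versus-index comparison never looks at the working tree: both sides are IDs Git already has stored, which is part of why git status can be fast.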
git diff works similarly. I suggest looking here for more information.
Upvotes: 0