Reputation:
I've gone through the Git Internals book and mostly understand how Git structures things into blobs, trees, commits and that branches are lightweight pointers to commits.
The part I don't quite grasp is how Git reflects these changes onto the file system across branch/commit checkouts.
For example:
Consider two files, A.txt and B.txt, committed to Commit 1. In addition to the two files, a file C.txt is committed to Commit 2.
From what I understand, the object graph would be along the lines of the following:
Commit 1 points to Tree 1, which has blobs for the two initial files - BlobA and BlobB.
Commit 2 points to Tree 2, which has blobs for three files. BlobA and BlobB remain the same since their content has not changed, while BlobC will also be under Tree 2.
Now, if I'm currently at Commit 2 and checkout to Commit 1, HEAD now points to Commit 1, and we can traverse the directed graph that tells the state of the repository. Now, the file C.txt is no longer on the file system.
How does Git reflect the state of the object graph onto the file system on every checkout?
Thanks.
Upvotes: 3
Views: 409
Reputation: 489838
Most of Git's work-tree actions are actually controlled via the index. This means no graph traversal is required at all!
The index's primary role (outside of merges at least) is to act as the place in which you build up the next commit to make. This gives it the name that many people prefer to use, the staging area. In the index, the version of a file such as README.txt will start out matching the HEAD version of that same file. Both versions are stored as a blob object in the repository (the same blob, since their content matches).
The work-tree will contain a usable version of README.txt, representing the expanded version of the file. This is also smudge-filtered and CRLF-adjusted, if you have established such filtering. If you change the work-tree version, and wish to commit the changes, you must run git add README.txt: this copies the work-tree file back into the index, applying any clean filter and doing the CRLF-to-LF adjustment if you have those enabled, creating a new blob in the repository (or re-using an existing blob if the new file content matches some existing content) and storing the new hash into the index. In effect, this replaces the index copy of the file.
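To make the git add step concrete, here is a minimal Python sketch of what happens to the index. It is a toy model, not Git's code: the names blob_id, clean, and add are made up for illustration, the "clean" step only models the CRLF-to-LF adjustment, and it assumes a SHA-1 repository. The one real detail is how a blob is named: the hash covers a "blob <size>\0" header followed by the content.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Name a blob the way Git does: SHA-1 over 'blob <size>\\0' plus the data."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def clean(content: bytes) -> bytes:
    """Stand-in for the clean step; only CRLF-to-LF is modeled (assumption)."""
    return content.replace(b"\r\n", b"\n")


def add(index: dict, objects: dict, path: str, worktree_bytes: bytes) -> str:
    """Toy 'git add': store the cleaned content as a blob, record its hash."""
    data = clean(worktree_bytes)
    oid = blob_id(data)
    objects.setdefault(oid, data)  # identical content re-uses the existing blob
    index[path] = oid              # replaces the index copy of the file
    return oid
```

Adding two files with the same cleaned content leaves a single blob in objects, with both index entries pointing at it.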
So far so good, but what happens when you have some commit checked out, e.g., as the result of git checkout master, and you issue the command git checkout develop? Here, the index takes on its second role, which is to keep track of (i.e., index) the work-tree and to keep cache information about the work-tree. (This is also the source of its third name, the cache.)
Git already translated master into a commit hash when it extracted that commit, but it does so again now. Here Git uses the so-called two-tree merge mode of the git read-tree command. It also translates develop into a commit hash, so now it has two commit hashes: one for master (the current commit) and one for develop (the desired commit). After making sure that these are indeed commits,[1] Git translates them into tree hash IDs: the HEAD tree, and the desired or target tree.
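As a rough illustration of the name-to-commit-to-tree translation, here is a small Python sketch. The refs and commits dictionaries and the tree_of helper are hypothetical stand-ins for Git's ref store and object database. The only real detail relied on is that a commit object's text starts with a "tree <hash>" line.

```python
def tree_of(refs: dict, commits: dict, name: str) -> str:
    """Resolve a branch name to its commit, then to that commit's tree hash."""
    commit_id = refs[name]                # branch name -> commit hash
    body = commits[commit_id]             # raw text of the commit object
    first_line = body.split("\n", 1)[0]   # a commit begins with "tree <hash>"
    kind, tree_id = first_line.split(" ", 1)
    if kind != "tree":
        raise ValueError(f"{commit_id} does not look like a commit")
    return tree_id
```

Calling it once for master and once for develop gives the two tree hashes that the comparison works from.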
Meanwhile the index lists the hash ID for each tracked file in the work-tree. In the ideal case, for each file F in the index and/or in the HEAD commit, F will have the same hash in both HEAD and index. If so, the index copy of F is itself "clean" (matches). The work-tree copy may or may not be clean (may or may not match the index copy); the index's role as cache here helps make this last test very fast in most cases.
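The "cache" role can be sketched in Python as follows. This is a simplified model under stated assumptions: IndexEntry, record, and worktree_looks_clean are hypothetical names, and only size and modification time are cached here, whereas the real index records more stat fields and falls back to comparing content when the cached data does not settle the question.

```python
import os
from dataclasses import dataclass


@dataclass
class IndexEntry:
    blob_id: str    # hash of the staged content
    size: int       # file size seen when the entry was last refreshed
    mtime_ns: int   # modification time seen at that moment


def record(path: str, blob_id: str) -> IndexEntry:
    """Snapshot the stat data for a file that was just added or checked out."""
    st = os.stat(path)
    return IndexEntry(blob_id, st.st_size, st.st_mtime_ns)


def worktree_looks_clean(entry: IndexEntry, path: str) -> bool:
    """Cheap test: unchanged stat data means the file need not be re-read."""
    st = os.stat(path)
    return st.st_size == entry.size and st.st_mtime_ns == entry.mtime_ns
```

The point of the design is that a single stat call per file usually replaces reading and re-hashing the whole file.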
For each file F that exists in both HEAD and the target tree, either the target hash for F matches the HEAD hash for F, or it doesn't. If the hashes match, there is no need to touch the index and work-tree entry at all, so Git doesn't. Otherwise, including for files that exist in HEAD but not in the target tree, either the index and work-tree copies of F are clean, or they're not. If the file is clean, it's safe to remove both copies (if the file is not in the target) or replace both with the version from the target tree (if the file is in the target).
In short, it's only where HEAD and target trees don't match that Git needs to change anything in the work-tree to achieve the checkout. If the part that does not match is that file xyz.txt is in HEAD but not in the target, the goal becomes to remove xyz.txt, but this is only allowed if it is "clean", unless of course you add --force to your git checkout. If the part that does not match is that the file is in neither HEAD nor the index, but is in the target, the goal becomes to create xyz.txt with the target's content, but this is only allowed if the file does not exist, or if the file is listed in an ignore directive.[2]
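The rules above can be condensed into a short Python sketch. It is a toy model, not the real read-tree logic: plan_checkout is a hypothetical function, each tree, the index, and the work-tree are flattened into path-to-hash dictionaries, and cases not described above (for example, a file staged in the index but absent from HEAD) are simply reported as conflicts.

```python
def plan_checkout(head, target, index, worktree, ignored=frozenset(), force=False):
    """Decide, per path, what a checkout from head to target would do.

    Paths whose hash is the same in head and target get no entry at all:
    Git leaves their index and work-tree copies untouched.
    """
    actions = {}
    for path in sorted(set(head) | set(target)):
        h, t = head.get(path), target.get(path)
        if h == t:
            continue  # trees agree on this path: nothing to do
        if h is not None:
            # In HEAD: only safe to touch if index and work-tree match HEAD.
            clean = index.get(path) == h and worktree.get(path) == h
            if clean or force:
                actions[path] = "remove" if t is None else "replace"
            else:
                actions[path] = "conflict"
        else:
            # Only in the target: must not clobber an existing file,
            # unless that file is ignored (or --force was given).
            absent = path not in index and path not in worktree
            clobber_ok = path not in index and path in ignored
            actions[path] = "create" if (absent or clobber_ok or force) else "conflict"
    return actions


# The question's example: Commit 2 has A, B, C; Commit 1 has only A and B.
commit2 = {"A.txt": "hashA", "B.txt": "hashB", "C.txt": "hashC"}
commit1 = {"A.txt": "hashA", "B.txt": "hashB"}
print(plan_checkout(commit2, commit1, dict(commit2), dict(commit2)))
# prints {'C.txt': 'remove'}
```

A.txt and B.txt never appear in the plan because their hashes are the same in both trees, which is why no graph traversal or file comparison is needed for them.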
[1] Branch names are required to identify commit hashes at all times. (Tag names are permitted to identify other types of objects.) So in theory there is no need to check this. Whether Git really does check, depends on the code path.
[2] This last part is the source of some serious pain at times. Git really should (but does not) distinguish between "ignore this file because it's easy to re-create" and "ignore this file because it should not be committed, but never clobber it because it contains something precious like user configuration data."
Upvotes: 2