Reputation:
Found in the git 2.32.0 help, section gitrevisions, cit.:
<rev>^{<type>}, e.g. v0.99.8^{commit}
A suffix ^ followed by an object type name enclosed in brace pair means dereference the object at <rev> recursively until an object of type <type> is found or the object cannot be dereferenced anymore ...
Q. Does the above mean that, if <rev> is itself an object that refers to an underlying object, the recursive dereferencing does not start until the object that <rev> refers to has been reached ("means dereference the object at <rev> recursively")? I mean, the object at <rev> is not <rev> itself if <rev> is an object too.
Upvotes: 0
Views: 111
Reputation: 489333
To answer this correctly, we have to start out by defining the term dereference. The idea here is stolen from C-like languages, which have types that are pointer types. But these languages have generalized schemes in which there are base types upon which pointer types are constructed. Git does not behave quite the same way.
Instead, Git has just four object types. This means we can enumerate all four. Before we start, though, let's note that each object has a hash ID as well, which is a cryptographic checksum of the object's content, including its header. The header gives the type of the object, and the size in bytes of the data that make up the object.
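As a rough illustration of that checksum, here is a minimal Python sketch. It assumes a SHA-1 repository and the usual "type, space, decimal size, NUL byte" header layout; the function name git_object_id is made up for this example.

```python
import hashlib


def git_object_id(obj_type: str, data: bytes) -> str:
    """Hash an object the way a SHA-1 Git repository does (sketch).

    Assumption: the header is "<type> <size>" followed by a NUL byte,
    and the hash covers the header plus the raw object data.
    """
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# The empty blob and the empty tree have well-known IDs.
print(git_object_id("blob", b""))
print(git_object_id("tree", b""))
```

Because the type is part of the header, an empty blob and an empty tree get different hash IDs even though both have zero bytes of data.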
One type of Git object is a blob object. A blob object holds raw file data: uninterpreted data bytes. The length comes from the header, so that the data themselves are always a multiple of 8 bits in length.
Tree objects contain formatted data. The internal format of a tree object is not secret, but was somewhat poorly planned, so it's perhaps best to use the expanded / pretty-printed form that git cat-file -p will show. This avoids having to deal with upcoming issues when Git switches to SHA-256 instead of SHA-1.
A tree object consists of some number of data triples: a mode, a path component, and a hash ID. When git cat-file -p encodes a triple into human readable form, it adds a human-readable object type string based on the stored mode. This implies the type of the corresponding hash ID (see below for more about this, though it's not relevant to the answer to your question as asked).
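To make the triples concrete, here is a small Python sketch that walks the raw bytes of a tree object and renders them roughly the way the pretty-printed form looks. It assumes the SHA-1 layout (octal mode, space, name, NUL, 20 raw hash bytes), which is exactly the part that changes with SHA-256; parse_tree, entry_type and pretty are names invented for this example.

```python
def parse_tree(raw: bytes, hash_len: int = 20):
    """Split raw tree data into (mode, name, hex hash ID) triples.

    Assumption: each entry is "<octal mode> <name>\\0" followed by
    hash_len raw hash bytes (20 for SHA-1).
    """
    entries = []
    pos = 0
    while pos < len(raw):
        space = raw.index(b" ", pos)
        nul = raw.index(b"\0", space)
        mode = raw[pos:space].decode("ascii")
        name = raw[space + 1:nul].decode("utf-8", "surrogateescape")
        oid = raw[nul + 1:nul + 1 + hash_len].hex()
        entries.append((mode, name, oid))
        pos = nul + 1 + hash_len
    return entries


def entry_type(mode: str) -> str:
    """Derive the object type string from the stored mode."""
    if mode == "40000":
        return "tree"      # subdirectory
    if mode == "160000":
        return "commit"    # gitlink (submodule)
    return "blob"          # regular file, executable, or symlink


def pretty(entries):
    """Render triples as "mode type hash<TAB>name" lines."""
    return [
        f"{int(mode, 8):06o} {entry_type(mode)} {oid}\t{name}"
        for mode, name, oid in entries
    ]


# Hand-built tree data: an empty file and an empty subdirectory.
RAW = (
    b"100644 README\0"
    + bytes.fromhex("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
    + b"40000 src\0"
    + bytes.fromhex("4b825dc642cb6eb9a060e54bf8d69288fbee4904")
)
for line in pretty(parse_tree(RAW)):
    print(line)
```

Note that the type string is not stored anywhere in the tree: it is computed from the mode at display time.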
Commit objects contain formatted data. The internal format of a commit object is UTF-8 text, consisting of some set of headers, followed by a blank line, followed by the commit message. The commit message itself should (as in, git commit recommends that users do this) consist of a short subject line, then a blank line, then a body, but Git does not enforce this at all: the only enforced part of the format is that Git's own headers come before the first blank line.
A commit object may also carry externally supplied data, such as a GPG-generated digital signature. For signed commits, Git stores the signature in a gpgsig header rather than in the message text itself.
The Git headers come from a restricted set and this matters for the answer to your question as asked. In particular, every commit must have exactly one tree header line. This tree header provides the hash ID of the top level tree object that stores the snapshot for that particular commit.
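Here is a minimal Python sketch of that layout: headers up to the first blank line, then the message, with a check for the single tree header. The sample commit text is hand-written for illustration (the author line is made up; the tree hash is the well-known empty tree), and parse_commit and tree_id are hypothetical helper names, not Git functions.

```python
def parse_commit(text: str):
    """Split commit object text into (headers, message)."""
    head, _, message = text.partition("\n\n")
    headers = []
    for line in head.split("\n"):
        if line.startswith(" ") and headers:
            # Continuation of the previous header (multi-line values).
            key, value = headers[-1]
            headers[-1] = (key, value + "\n" + line[1:])
        else:
            key, _, value = line.partition(" ")
            headers.append((key, value))
    return headers, message


def tree_id(headers) -> str:
    """Return the hash ID from the single tree header."""
    trees = [value for key, value in headers if key == "tree"]
    if len(trees) != 1:
        raise ValueError("a commit must have exactly one tree header")
    return trees[0]


# Illustrative commit text; the identity and timestamps are invented.
SAMPLE = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "author A U Thor <author@example.com> 1700000000 +0000\n"
    "committer A U Thor <author@example.com> 1700000000 +0000\n"
    "\n"
    "Subject line\n"
    "\n"
    "Body text.\n"
)
headers, message = parse_commit(SAMPLE)
print(tree_id(headers))
```

The blank line between the subject and the body lands in the message part, since only the first blank line separates headers from message.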
The last type of object is the tag object, also called an annotated tag object. Tag objects are used by annotated tags; in fact, the meaning of the phrase annotated tag is that we have a tag reference (a ref that begins with refs/tags/) whose stored hash ID is that of a tag object.[1]
Tag objects are formatted in a way similar to commits: they have header lines followed by a blank line followed, optionally, by tag message data supplied by the user (and/or by external digital signature providers such as GPG, again).
The header of a tag object must contain exactly one line that gives the hash ID of the target of the tag object. It should also contain a tag header that repeats the tag, though very old tags in Git might not have this.
What all this means for us is this:
Given a tag object, we can follow it to its target object. This could be another tag object, or a commit or tree or blob.
Given a commit object, we can follow it to its (single) tree object.
Given a tree object and a pathname, we could find another object, but without the pathname part, we must stop, as there could be any number of sub-objects within the tree object.
Given a blob object, we must stop, because the bytes of a blob object are not subject to interpretation (at this level anyway).
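Hence, the syntax hash^{type} (with a non-empty type part) means:
Find the object named by hash. If the object is not found, error: object not found.
Otherwise, follow it as described above (tag to target, commit to tree) until an object of the given type is reached or nothing further can be followed.
Using an empty type part means:
Find the object named by hash. If not found, error.
Otherwise, for as long as the object is a tag object, follow it to its target.
These steps result in "peeling off" all annotated tags: if an annotated tag points to a second annotated tag that points to a third annotated tag that points to a blob object, the final result is the blob object. If an annotated tag points to a second annotated tag that points to a commit, the final result is the commit.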
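The peeling rules can be sketched as a toy in-memory model in Python. This is not Git's implementation, just an illustration under simple assumptions: objects live in a dict keyed by made-up IDs, and each tag or commit records the single ID it points to (a tag's target, a commit's tree). Obj and peel are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Obj:
    type: str                        # "blob", "tree", "commit", or "tag"
    points_to: Optional[str] = None  # tag: target ID; commit: tree ID


def peel(store: Dict[str, Obj], oid: str, want: str = "") -> str:
    """Model hash^{type}; an empty want models hash^{}."""
    if oid not in store:
        raise KeyError(f"object not found: {oid}")
    obj = store[oid]
    if want == "":
        # ^{}: peel tags only, stop at the first non-tag object.
        while obj.type == "tag":
            oid = obj.points_to
            obj = store[oid]
        return oid
    # ^{type}: follow tags and commits until the wanted type appears.
    while obj.type != want:
        if obj.type not in ("tag", "commit"):
            raise ValueError(f"cannot peel {obj.type} to {want}")
        oid = obj.points_to
        obj = store[oid]
    return oid


# Toy repository: tag t2 -> tag t1 -> commit c -> tree tr; tag tb -> blob b.
store = {
    "t2": Obj("tag", "t1"),
    "t1": Obj("tag", "c"),
    "c": Obj("commit", "tr"),
    "tr": Obj("tree"),
    "b": Obj("blob"),
    "tb": Obj("tag", "b"),
}
print(peel(store, "t2"))            # peels both tags
print(peel(store, "t2", "tree"))    # peels tags, then the commit
print(peel(store, "tb"))            # a tag can end at a blob
```

In this model the starting object is checked first, so peeling only moves on when the object named by hash is not already of the requested type.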
That's it: that's all there is to this syntax. The ^{} syntax peels tags: that's all it really does. The ^{type} syntax peels tags or commits to find a commit or tree or blob, unless the type itself is tag, and then asserts that the resulting object has the given type. The practical uses of these are mainly:
The last of these three uses the ^{} syntax. The ^{tree} and ^{blob} suffixes do work, but have very few practical uses (though you can probably imagine one or two if you think hard enough).
[1] Remember that every ref stores exactly one hash ID. Git then opens and reads the corresponding internal object. Some ref types are constrained: refs/heads/ refs (i.e., branch names) must contain the hash ID of a commit object; otherwise the repository is damaged. Tag names may contain the hash ID of any type of object, but it's typical for them to contain either a commit hash ID directly, or an annotated-tag-object hash ID.
The valid modes are limited. No other modes should exist, though very old Git repositories sometimes contain files with mode 100664 (implying group-writable on a Linux file system, but not actually meaning that any more: it's converted to 100644 internally). All Linux mode permissions would theoretically be allowed, but as far as I know, the existing Git repositories that are "grandfathered" (as in git fsck does not reject their mode values) only actually use 100664.
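For reference, a small Python sketch of how such a mode check could look. The table lists the tree entry modes Git normally writes, taken from general Git knowledge rather than from the text above, and normalize_mode is a hypothetical helper that only mirrors the 100664 to 100644 conversion described here.

```python
# Tree entry modes as stored (octal strings) and the object type each
# one implies. Assumption: this is the usual set Git writes.
VALID_MODES = {
    "40000": "tree",     # subdirectory
    "100644": "blob",    # regular file
    "100755": "blob",    # executable file
    "120000": "blob",    # symbolic link (target path stored as data)
    "160000": "commit",  # gitlink, i.e. a submodule commit
}


def normalize_mode(mode: str) -> str:
    """Map the grandfathered 100664 to 100644; reject unknown modes."""
    if mode == "100664":
        return "100644"
    if mode not in VALID_MODES:
        raise ValueError(f"unexpected tree entry mode: {mode}")
    return mode


print(normalize_mode("100664"))
print(VALID_MODES[normalize_mode("160000")])
```

A stricter or looser policy is possible; this sketch simply treats everything outside the table, apart from 100664, as an error.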
Upvotes: 1