
revision specification <rev>^{<type>}

Found in the Git 2.32.0 help, section gitrevisions, quoted:

<rev>^{<type>}, e.g. v0.99.8^{commit}

A suffix ^ followed by an object type name enclosed in brace pair means dereference the object at <rev> recursively until an object of type <type> is found or the object cannot be dereferenced anymore ...

Q. Does the above mean that, if <rev> is itself an object that refers to an underlying object, the recursive dereferencing does not start until the object that <rev> refers to has been reached ("means dereference the object at <rev> recursively")? I mean, the object at <rev> is not <rev> itself if <rev> is an object too.

Upvotes: 0

Answers (1)

torek

Reputation: 489333

To answer this correctly, we have to start out by defining the term dereference. The idea here is stolen from C-like languages, which have types that are pointer types. But these languages have generalized schemes in which there are base types upon which pointer types are constructed. Git does not behave quite the same way.

Instead, Git has just four object types. This means we can enumerate all four. Before we start, though, let's note that each object has a hash ID as well, which is a cryptographic checksum of the object's content, including its header. The header gives the type of the object, and the size in bytes of the data that make up the object.
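As a rough illustration of how the hash ID covers both header and content, here is a minimal sketch. It assumes a SHA-1 repository (a SHA-256 repository would use a different hash function), and the function name git_object_id is made up for this example.

```python
import hashlib

def git_object_id(obj_type: str, data: bytes) -> str:
    """Return the SHA-1 object ID for the given type and content.

    Assumes a SHA-1 repository; the name git_object_id is hypothetical.
    """
    # The header is "<type> <size in bytes>" followed by a single NUL byte.
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\x00"
    # The hash is computed over the header plus the raw content.
    return hashlib.sha1(header + data).hexdigest()

# The empty blob has a well-known ID in SHA-1 repositories.
empty_blob_id = git_object_id("blob", b"")
print(empty_blob_id)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Running git hash-object on the same bytes should print the same ID, which is one way to check the sketch against a real Git installation.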

One type of Git object is a blob object. A blob object holds raw file data: uninterpreted data bytes. The length comes from the header, so that the data themselves are always a multiple of 8 bits in length.
Tree objects contain formatted data. The internal format of a tree object is not secret, but was somewhat poorly planned, so it's perhaps best to use the expanded / pretty-printed form that git cat-file -p will show. This avoids having to deal with upcoming issues when Git switches to SHA-256 instead of SHA-1.

A tree object consists of some number of data triples: a mode, a path component, and a hash ID. When git cat-file -p encodes a triple into human readable form, it adds a human-readable object type string based on the stored mode. This implies the type of the corresponding hash ID (see below for more about this, though it's not relevant to the answer to your question as asked).
Commit objects contain formatted data. The internal format of a commit object is UTF-8 text, consisting of some set of headers, followed by a blank line, followed by the commit message. The commit message itself should—as in, git commit recommends that users do this—consist itself of a short subject line, then a blank line, then a body, but Git does not enforce this at all: the only enforced part of the format is that Git's own headers come before the first blank line.

A commit object may also carry externally-supplied data, such as a GPG-generated digital signature. This is how signed commits work, for instance, although Git records a commit's signature in a gpgsig header rather than in the message text itself.

The Git headers come from a restricted set and this matters for the answer to your question as asked. In particular, every commit must have exactly one tree header line. This tree header provides the hash ID of the top level tree object that stores the snapshot for that particular commit.
The last type of object is the tag object, also called an annotated tag object. Tag objects are used by annotated tags; in fact, the meaning of the phrase annotated tag is that we have a tag reference—a ref that begins with refs/tags/—whose stored hash ID is that of a tag object.¹

Tag objects are formatted in a way similar to commits: they have header lines followed by a blank line followed, optionally, by tag message data supplied by the user (and/or by external digital signature providers such as GPG, again).

The header of a tag object must contain exactly one object line that gives the hash ID of the target of the tag object, along with a type line that gives the target's type. It should also contain a tag header that repeats the tag name, though very old tags in Git might not have this.
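Since commits and tags share the "headers, blank line, message" layout, a small sketch may help show how the target hash ID is found. This is not Git's own parser: the function split_object and the sample tag text (including its placeholder hash and tagger) are invented for illustration.

```python
def split_object(text: str):
    """Split commit or tag text into (headers, message).

    Hypothetical helper: headers end at the first blank line.
    """
    head, _, message = text.partition("\n\n")
    headers = []
    for line in head.split("\n"):
        if line.startswith(" ") and headers:
            # A line starting with a space continues the previous header.
            key, value = headers[-1]
            headers[-1] = (key, value + "\n" + line[1:])
        else:
            key, _, value = line.partition(" ")
            headers.append((key, value))
    return headers, message

# Made-up tag object text; the hash ID is a placeholder, not a real object.
SAMPLE_TAG = (
    "object 1111111111111111111111111111111111111111\n"
    "type commit\n"
    "tag v1.0\n"
    "tagger A U Thor <author@example.com> 1600000000 +0000\n"
    "\n"
    "Release 1.0\n"
)

headers, message = split_object(SAMPLE_TAG)
# A tag has exactly one "object" header; a commit has exactly one "tree" header.
targets = [value for key, value in headers if key == "object"]
print(targets[0])
```

The same function applied to commit text would let you pick out the single tree header in the same way.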

What all this means for us is this:

Given a tag object, we can follow it to its target object. This could be another tag object, or a commit or tree or blob.
Given a commit object, we can follow it to its (single) tree object.
Given a tree object and a pathname, we could find another object, but without the pathname part, we must stop, as there could be any number of sub-objects within the tree object.
Given a blob object, we must stop, because the bytes of a blob object are not subject to interpretation (at this level anyway).

Hence, the syntax:

hash^{type}

(with a non-empty type part) means:

Find the object referred-to by the given hash. If the object is not found, error: object not found.
If it has the desired type, stop; this is the result.
If it has a type that allows a "follow" step, follow it and start at the top of this sequence of instructions again.
Otherwise, error: the object has the wrong type.
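The steps above can be sketched over a toy in-memory object store. This is only a model of the described procedure, not Git's implementation: the Obj class, the PeelError exception, and the short IDs such as "t2" are all invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Obj:
    type: str                     # "blob", "tree", "commit" or "tag"
    target: Optional[str] = None  # tag -> its target, commit -> its tree

class PeelError(Exception):
    """Raised when the object is missing or has the wrong type."""

def peel_to_type(store: Dict[str, Obj], hash_id: str, want: str) -> str:
    """Model of hash^{type} for a non-empty type (hypothetical helper)."""
    while True:
        obj = store.get(hash_id)
        if obj is None:
            raise PeelError(f"object not found: {hash_id}")
        if obj.type == want:
            return hash_id            # desired type: this is the result
        if obj.type in ("tag", "commit"):
            hash_id = obj.target      # a "follow" step is possible
            continue
        raise PeelError(f"{hash_id} is a {obj.type}, not a {want}")

# Toy store with made-up IDs: tag -> tag -> commit -> tree, plus a blob.
STORE = {
    "t2": Obj("tag", "t1"),
    "t1": Obj("tag", "c1"),
    "c1": Obj("commit", "tr1"),
    "tr1": Obj("tree"),
    "b1": Obj("blob"),
}

print(peel_to_type(STORE, "t2", "commit"))  # c1
print(peel_to_type(STORE, "t2", "tree"))    # tr1
```

Note that the check for the desired type comes before any follow step, so in this model asking for a tag when given a tag returns that tag itself.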

Using an empty type part means:

Find the object referred-to by the given hash. If not found, error.
If the type of this object is tag, find its target and start at the top of this sequence.
Otherwise, stop: we have the object.

These three steps result in "peeling off" all annotated tags: if an annotated tag points to a second annotated tag that points to a third annotated tag that points to a blob object, the final result is the blob object. If an annotated tag points to a second annotated tag that points to a commit, the final result is the commit.
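The two examples in the previous paragraph can be reproduced with a small model of the empty-type form. As before, this is a sketch under assumptions: the store is a plain dictionary of (type, target) pairs and every ID in it is made up.

```python
def peel_tags(store, hash_id):
    """Model of hash^{}: follow tag objects until a non-tag is reached."""
    while True:
        if hash_id not in store:
            raise KeyError(f"object not found: {hash_id}")
        obj_type, target = store[hash_id]
        if obj_type != "tag":
            return hash_id   # not a tag: stop, this is the result
        hash_id = target     # a tag: move on to its target

# Made-up store: each entry maps an ID to (type, target-or-None).
TAG_STORE = {
    "tagA": ("tag", "tagB"),
    "tagB": ("tag", "tagC"),
    "tagC": ("tag", "blob1"),
    "blob1": ("blob", None),
    "tagX": ("tag", "tagY"),
    "tagY": ("tag", "commit1"),
    "commit1": ("commit", "tree1"),
    "tree1": ("tree", None),
}

print(peel_tags(TAG_STORE, "tagA"))  # blob1
print(peel_tags(TAG_STORE, "tagX"))  # commit1
```

Unlike the typed form, this version never follows a commit to its tree: it stops at the first object that is not a tag.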

That's it—that's all there is to this syntax. The ^{} syntax peels tags: that's all it really does. The ^{type} syntax peels tags or commits to find a commit or tree or blob, unless the type itself is tag, and then asserts that the resulting object has the given type. The practical uses of these are mainly:

to assert that something that must be a commit is a commit, or
to assert that something that must be a tag is a tag, or
to move past all tags if the user provides a tag.

The last of these three uses the ^{} syntax. The ^{tree} and ^{blob} suffixes do work, but have very few practical uses (though you can probably imagine one or two if you think hard enough).
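In scripts, these assertions are commonly made through git rev-parse --verify. The sketch below wraps that command from Python; the helper names peel_spec and resolve are invented here, and the wrapper assumes a git executable on the PATH and a repository at the given path.

```python
import subprocess
from typing import Optional

def peel_spec(rev: str, obj_type: str = "") -> str:
    """Build the rev^{type} string; an empty type gives rev^{}."""
    return f"{rev}^{{{obj_type}}}"

def resolve(rev: str, obj_type: str = "", repo: str = ".") -> Optional[str]:
    """Return the peeled hash ID, or None if Git rejects the expression.

    Hypothetical wrapper around `git rev-parse --verify --quiet`.
    """
    result = subprocess.run(
        ["git", "-C", repo, "rev-parse", "--verify", "--quiet",
         peel_spec(rev, obj_type)],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # missing object, or it could not be peeled to obj_type
    return result.stdout.strip()

print(peel_spec("v0.99.8", "commit"))  # v0.99.8^{commit}
```

For instance, resolve("HEAD", "commit") should return a commit hash ID inside a repository with at least one commit, and resolve(user_input, "commit") returning None is a reasonable signal that the user's input did not name a commit.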

¹Remember that every ref stores exactly one hash ID. Git then opens and reads the corresponding internal object. Some ref types are constrained: refs/heads/ refs—i.e., branch names—must contain the hash ID of a commit object; otherwise the repository is damaged. Tag names may contain the hash ID of any type of object, but it's typical for them to contain either a commit hash ID directly, or an annotated-tag-object hash ID.

Additional notes

The valid modes are limited:

100644 is a read/write file that is not executable. The corresponding object must be a blob object.
100755 is a read/write/executable file. The corresponding object must be a blob object.
040000 is a tree object (a sub-tree stored within a tree). The corresponding object must be a tree object.
120000 is a symbolic link: a blob whose bytes are to be used to create a symbolic link. The corresponding object must be a blob object.
160000 is a gitlink: the hash ID of a commit we expect to find in some other Git repository, used for handling submodules. Because the object may, e.g., be missing, this Git repository is not considered corrupt just because the target object does not exist or has the wrong type. However, if the target object doesn't exist or has the wrong type, this submodule hash can't be used.

No other modes should exist, though very old Git repositories sometimes contain files with mode 100664 (implying group-writable on a Linux file system, but not actually meaning that any more: it's converted to 100644 internally). All Linux mode permissions would theoretically be allowed, but as far as I know, the existing Git repositories that are "grandfathered" (as in git fsck does not reject their mode values) only actually use 100664.
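The mode list can be expressed as a lookup table, which is roughly what the human-readable type string in git cat-file -p output is derived from. This is a sketch only: the names MODE_TO_TYPE and expected_type are invented, and the handling of 100664 simply follows the description above.

```python
# Mode strings as printed by `git cat-file -p` (six octal digits).
MODE_TO_TYPE = {
    "100644": "blob",    # regular, non-executable file
    "100755": "blob",    # executable file
    "040000": "tree",    # sub-tree
    "120000": "blob",    # symbolic link; the blob holds the link target
    "160000": "commit",  # gitlink, used for submodules
}

def expected_type(mode: str) -> str:
    """Return the object type implied by a tree entry's mode."""
    # Raw tree objects store the sub-tree mode without its leading zero.
    mode = mode.zfill(6)
    # Grandfathered group-writable mode, treated as 100644 per the text above.
    if mode == "100664":
        mode = "100644"
    try:
        return MODE_TO_TYPE[mode]
    except KeyError:
        raise ValueError(f"unexpected tree entry mode: {mode}") from None

print(expected_type("40000"))   # tree
print(expected_type("160000"))  # commit
```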

Upvotes: 1
