Radek Anuszewski

Reputation: 1910

Git: is it possible to add a tag and have it recorded as a commit, exactly as in Mercurial?

When I was using Mercurial, a tag was recorded as a commit; at least in TortoiseHg, adding a tag created a commit. When I tried Git I was disappointed: it didn't create a commit for the tag, and moreover, when I commit from IntelliJ IDEA I have to check a checkbox to commit tags to the repository. Is it possible to add a tag in Git and have it recorded as a commit, exactly as in Mercurial? Thank you very much for every answer.

Upvotes: 0

Views: 312

Answers (1)

torek

Reputation: 489083

As Don Branson noted in a comment, this seems like an XY problem. But tags in Git and Mercurial have a fundamental difference, because a Mercurial tag is an entry in a file, while a Git tag is a name that resides outside the repository space (though an annotated tag includes a repository object as well).

In particular, Mercurial stores a tag as an entry in a file named .hgtags. Each entry is a line giving the hash ID of the tagged commit followed by the tag's name. In order to get the updated .hgtags file into the repository, Mercurial must make a new commit once it adds the entry to the .hgtags file.

This has one fundamental (albeit small) drawback: extracting the tagged commit results in a .hgtags file that lacks the tag, so that the tag vanishes. To compensate for this, modern Mercurial will find multiple versions of .hgtags, including one from a commit later than the current commit, so as to locate the correct commit given a tag name.
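
The vanishing-tag effect can be seen in a minimal Python sketch of a .hgtags lookup. It assumes the simple line format of a 40-character hex node ID followed by the tag name, with a later line overriding an earlier one for the same name. The function name parse_hgtags and the sample hashes are invented for the example, and real Mercurial does more than this (such as consulting more than one version of the file, as described above).

```python
def parse_hgtags(text):
    """Map tag names to node IDs from .hgtags-style text.

    Assumed format: one "<40-hex node> <tag name>" entry per line;
    a later entry for the same name overrides an earlier one.
    """
    tags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        node, name = line.split(" ", 1)
        tags[name] = node
    return tags


# Hypothetical file contents. The commit that v2.0 names still carries
# the older file; only the later tagging commit has the v2.0 entry.
file_at_tagged_commit = "a" * 40 + " v1.0\n"
file_at_tagging_commit = file_at_tagged_commit + "b" * 40 + " v2.0\n"

print("v2.0" in parse_hgtags(file_at_tagged_commit))   # False
print("v2.0" in parse_hgtags(file_at_tagging_commit))  # True
```

Reading only the checked-out version of the file is what makes the tag disappear, which is why the lookup has to consider other versions as well.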

Git, by contrast, stores a tag as a lightweight name of the form refs/tags/name pointing directly to a commit, or as a lightweight name of the same form pointing to a repository object of type tag, with the repository object holding the hash ID of the tag's target. This means that creating a tag does not disturb existing repository content; there is no repository file containing the tags.

The normal design goal for tags is that tags never change, and never go away either: they act as an append-only storage pool of <name, hash-id> pairs (perhaps with additional data). If tags are actually used this way, the Git and Mercurial methods behave identically except for the quirk that Mercurial requires an extra commit that Git does not.

In practice, people make mistakes: they move tags accidentally, or even on purpose in an attempt to cover up an earlier accident. Hence Mercurial's method has one fundamental audit-trail advantage over Git tags, too: we can, at any commit, see what the tags were as of that commit, even if someone has changed them since then. Git's method has the advantage that we can't see this, and hence cannot become confused by it: there's no way for tag v2.1 to name both commit a93fc12... and e07c1c4..., depending on which commit we have checked out at the moment. (Note that since commits provide only a partial order, not a total order, we may not be able to use "commit order" to choose a definitive winner in the case of ambiguity in Mercurial.)

All that said, if you wish to create a new commit in Git so as to have a unique commit hash to tag, simply create a new commit.[1] While Git will by default reject the attempt to commit the same contents as the current commit, git commit has a flag to force such a commit: --allow-empty. (This flag is somewhat misnamed: the new commit is not empty; it is the change-set produced by comparing the new commit to its parent that is empty.) Having made a new commit object, you will have a new, unique hash ID.
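
As a rough sketch of that approach, the following Python helper shells out to git to make an empty commit and then tags it. The function names, the repository path, and the tag name are placeholders chosen for illustration, and the code assumes git is installed and that repo_dir is an existing repository with a user identity configured.

```python
import subprocess


def empty_commit_command(message):
    """Build the git command that forces a commit with no changes."""
    return ["git", "commit", "--allow-empty", "-m", message]


def make_tag_point(repo_dir, message, tag_name):
    """Create an empty commit in repo_dir, tag it, and return its hash ID."""
    subprocess.run(empty_commit_command(message), cwd=repo_dir, check=True)
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    commit_id = result.stdout.strip()
    # A lightweight tag: just a name pointing directly at the new commit.
    subprocess.run(["git", "tag", tag_name, commit_id], cwd=repo_dir, check=True)
    return commit_id
```

Calling make_tag_point(".", "marker commit for v2.1", "v2.1") from inside a repository would then leave a new commit whose only purpose is to carry the tag.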

It may be better, though, to use the fact that an annotated tag has the lightweight name pointing to a new annotated tag object. That new tag object, just like a new commit, is guaranteed to be unique (see footnote 1 again). You can use the tag object's hash ID the same way you would use a commit ID, as long as the rest of your system is prepared to deal with a tag object and indirect through it to find the commit.
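
To make the indirection concrete, here is a small Python sketch that builds the body of an annotated tag object, computes its hash ID the way Git hashes objects in a SHA-1 repository (SHA-1 over a "type size" header, a NUL byte, and the body), and then reads the target commit back out of the tag. The helper names, the tagger identity, and the padded commit hash are made up for the example, and nothing here touches a real repository.

```python
import hashlib


def git_object_id(obj_type, body):
    """Hash an object as Git does in a SHA-1 repository."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def make_tag_body(target, name, tagger, timestamp, message):
    """Build the body of an annotated tag that points at a commit."""
    return (
        f"object {target}\n"
        f"type commit\n"
        f"tag {name}\n"
        f"tagger {tagger} {timestamp} +0000\n"
        f"\n"
        f"{message}\n"
    ).encode()


def peel(tag_body):
    """Return the hash ID stored on the tag object's 'object' line."""
    first_line = tag_body.decode().split("\n", 1)[0]
    key, target = first_line.split(" ", 1)
    if key != "object":
        raise ValueError("not a tag object body")
    return target


commit_id = "e07c1c4" + "0" * 33  # placeholder commit hash, 40 hex digits
body = make_tag_body(commit_id, "v2.1", "A Tagger <tagger@example.com>",
                     1500000000, "release 2.1")
tag_id = git_object_id("tag", body)

print(tag_id != commit_id)      # the tag object has its own hash ID
print(peel(body) == commit_id)  # indirecting through it finds the commit
```

A system that stores tag_id instead of a commit ID needs the equivalent of peel wherever it expects a commit.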


[1] Note that every commit gets a time stamp (or more precisely, two time stamps, but by default a new commit has them both set to the same value). As long as we presume that time is monotonically increasing, this means that even if a new commit is otherwise a duplicate of an existing commit (i.e., it has the same tree, the same author and committer, the same parent, and the same log message as some existing commit), it will have a different time stamp and hence be a different commit.

When we make commits by some automated machine process, however, we can make many of them per second, and even if time is well-behaved, the granularity of the time stamps is one second. What this means is that in some cases—particularly, when using --allow-empty and making a commit with the same parent as some previous commit, e.g., by having our machine-process change branch names as it makes commits—it's possible to make a "new" commit with exactly the same data as the old commit, so that the purportedly-new commit has the same hash ID as well. In this case, our assumption that each commit hash ID is unique falls apart: it's unique to the commit's internal data, but since the commit's internal data matches a previous commit's internal data, the two commits are the same in a fundamental sense, so they get the same commit hash ID, when we were expecting them to get different hash IDs.

This means that when we write code to make commits, we must examine our assumptions carefully. The same can hold for machine-made annotated tags: we can have the machine make many of these per second, so we should be careful to be sure that their internal data differs, and not just rely on a monotonically increasing time-stamp.

Upvotes: 5
