b0ti

Reputation: 2329

Tracking revision numbers with git

Subversion has a revision id that is incremented after each commit. We included it in the version number of each release, which has the format X.Y.Z, where X is the major version, Y is the minor version and Z is the revision number.

In our issue tracker we would just reference subversion revision numbers (or reference the issue number in the commit message) and it was easy to determine whether a particular version already contained the fix or not.

With git, commits are identified by a hash. Since a hash cannot be used as a revision number, we use the commit count instead, which gives us an equivalent incrementing number for generating the version number during the build.

The problem is that a bug report from a user normally includes the version number, and it is really hard to look up whether the bug has been fixed in a more recent version or is still unresolved, because with git all we see is a commit hash.

One solution would be to maintain a translation table that lists each commit hash and maps it to a revision number but this makes life much harder.

Can you recommend any best practices for this problem?

Upvotes: 3

Views: 1870

Answers (4)

torek

Reputation: 489083

As I said in a comment, the problem here boils down to linearizing. If you want a simple incrementing count to specify some particular commit, you must have a single source point that makes this simple incrementing count.

In SVN, there is an obvious place to do this: all commits are stored on a master central server. In order to make a new commit, you call up the central server and say: make a new commit. This either succeeds—and can get a simple, incrementing number—or it fails and there is no commit.

In Git, there is no designated central server. Each developer makes his or her own commits. Commits are exchanged between peers. The globally unique identifier for any given commit is its hash: Git guarantees that no two commits ever have the same hash.[1]

The lack of a single central counting point destroys the usefulness of making your own simple revision count, as different repositories can and will have the same number of commits without containing the same set of commits. I may have 17 commits, of which 2 are different from your 17 commits, so that if we combine our two repositories, we both wind up with 19 commits. (If I combine yours with mine, I get 19 commits—two new ones I get from you, plus the 15 we already shared—while you still have 17: you must still pick up the two commits I have that you lack.)
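The arithmetic above can be checked with plain sets. This is only a sketch: the commit names (`s0`, `m1`, `y1`, and so on) are made-up stand-ins for real hashes, and a Python set is a simplification of what a repository actually stores.

```python
# Hypothetical commit IDs standing in for real Git hashes.
shared = {f"s{i}" for i in range(15)}   # the 15 commits both repositories have
mine = shared | {"m1", "m2"}            # my 17 commits
yours = shared | {"y1", "y2"}           # your 17 commits

same_count = len(mine) == len(yours)    # True: both repositories hold 17 commits
same_commits = mine == yours            # False: the contents differ

combined = mine | yours                 # my repository after fetching from you
print(len(mine), len(yours), len(combined))  # 17 17 19
```

The same count therefore names two different repository states, which is why the count alone cannot identify a commit.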

You can, however, use your idea: simply designate a central counting point:

"One solution would be to maintain a translation table that lists each commit hash and maps it to a revision number but this makes life much harder."

It's not that much harder if you already have a central server. For instance, if any release build is done on the "release-build" system, and the release-build system has a Git repository, you simply designate its repository as the central counting point.

It maintains the table. The count could be the number of commits in its repository.[2] But that's more than we need: The count can simply be the number of entries in the table; there is no need to count non-built releases. In any case, the translation from "count" to "hash", or vice versa, is done by looking up or adding the appropriate entry in the table.
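A minimal sketch of such a table, assuming it lives in memory on the release-build system. The class name `BuildTable`, its methods, and the hashes in the usage lines are all hypothetical; a real setup would persist the table and guard it against concurrent builds.

```python
class BuildTable:
    """Maps release-build numbers to commit hashes and back (illustrative only)."""

    def __init__(self):
        self._by_number = []   # index i holds the hash of build number i + 1
        self._by_hash = {}     # hash -> build number

    def number_for(self, commit_hash):
        """Return the build number for a hash, assigning the next one if it is new."""
        if commit_hash not in self._by_hash:
            self._by_number.append(commit_hash)
            # The count is simply the number of table entries.
            self._by_hash[commit_hash] = len(self._by_number)
        return self._by_hash[commit_hash]

    def hash_for(self, number):
        """Return the commit hash that was built as release `number`."""
        if not 1 <= number <= len(self._by_number):
            raise KeyError(f"no build numbered {number}")
        return self._by_number[number - 1]


table = BuildTable()
first = table.number_for("0a1b2c3")    # hypothetical hash, gets number 1
second = table.number_for("4d5e6f7")   # hypothetical hash, gets number 2
again = table.number_for("0a1b2c3")    # already in the table, still 1
```

Because numbers are only handed out when a build is registered, the count never goes down and never depends on how many commits the repository holds.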

The value of this simplified count is dubious at best. Look at real software releases, which are usually tagged with a "dotted version": Git version 2.8.4, Git version 2.9.0, Git version 2.10.1; Python 2.7.12, Python 3.4.5, and so on. How does 7.3.12 compare to 7.4.0? Is it strictly "less than", or not? With Git, when you build releases, you can tag them with dotted versions like this. The tag can be distributed using Git's built-in mechanisms, and everyone can look up v7.3.12 locally and find the commit. If you do not have the tag, you probably do not have the version: you must git fetch, perhaps with --tags, from someone who does.

The tags are, in effect, a distributed version of this central mapping table. Instead of counting the tags, though, we simply use their names, which have the form vX or vX.Y or whatever.

These tags can be extended with git describe, which lets you say "this many commits distant from this fixed tag, plus a unique verifier/locator in case distributed builds make the relative count break." See Sébastien Dawans' answer.


[1] This "guarantee" is kept via a simple mechanism: if two commits do have the same hash, Git simply refuses to believe that the second one exists. It won't accept it, it won't store it into the repository, and the existing hash "wins". The chance of this happening for any given pair of objects is vanishingly small: one out of 2^N, where N is the number of bits in the hash. Since Git uses SHA-1, which is 160 bits, that's 2^-160.

Due to the so-called birthday paradox or birthday problem, the probability rises rapidly with the number of objects. However, we start from such a small base that we can have trillions of objects, perhaps as many as 1.7 quadrillion or so, before the chance even rises to the same level as the chance of undetected storage-media corruption. (The names here use the "short scale"; see https://en.wikipedia.org/wiki/Quadrillion.)
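The figures in this footnote can be reproduced with the usual birthday-bound approximation, p ≈ n(n-1)/2 divided by 2^N. This is a back-of-the-envelope sketch: the function name is made up, and the approximation only holds while the resulting probability is tiny.

```python
def collision_probability(num_objects, hash_bits=160):
    """Approximate chance that any two of num_objects share a hash.

    Uses the birthday bound: (number of pairs) / 2**hash_bits.
    Only meaningful while the result is far below 1.
    """
    pairs = num_objects * (num_objects - 1) // 2
    return pairs / 2 ** hash_bits


one_pair = collision_probability(2)                    # exactly 2**-160
trillion = collision_probability(10 ** 12)             # a trillion objects
quadrillions = collision_probability(17 * 10 ** 14)    # 1.7 quadrillion objects

print(one_pair, trillion, quadrillions)
```

With 1.7 quadrillion objects the result is on the order of 10^-18, while a trillion objects gives a value several orders of magnitude smaller still.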

[2] If you do use this approach (counting the number of commits in its repository), you must make sure you never drop any commits, or the count would go down and hence not act like an ascending function. This is one reason a count of table entries might be better; or you could use a separate counter that you never reset, with an atomic fetch-and-increment when choosing the next number.

Upvotes: 0

Sébastien Dawans

Reputation: 4626

I handle this in a very simple way using git describe. It conveniently packages 3 important pieces of information:

  1. The hash
  2. The latest Tag
  3. The number of commits since the latest tag, in case we are on an untagged commit.

Furthermore, in most projects I have a standard way of tagging releases: vXXX.YYY.ZZZ. I use the output of git describe everywhere I need an exact reference to a commit. For example, one of my projects is at:

v1.1.9-19-g3024adf

I usually run a pre-compilation script that injects this into compiler symbols so that it is included in the binary. Having a standard way of naming my tags ensures I get an upper bound on the length of the git describe output, which is important for me because I need to squeeze it into whatever protocol I use in my embedded systems.
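As an illustration of how the three pieces can be pulled back out of such a string, here is a small parser sketch. The names `Described` and `parse_describe` are invented for this example, and it handles only two shapes: `<tag>-<count>-g<hash>` and a bare tag, which is what git describe prints when the commit itself is tagged. Other variants, such as a `-dirty` suffix or the output of `--always`, are not covered.

```python
import re
from typing import NamedTuple, Optional


class Described(NamedTuple):
    tag: str
    commits_since_tag: int
    short_hash: Optional[str]


# <tag>-<number of commits since tag>-g<abbreviated hash>
_DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<hash>[0-9a-f]+)$")


def parse_describe(text):
    """Split git-describe style output into tag, distance and short hash."""
    text = text.strip()
    match = _DESCRIBE_RE.match(text)
    if match is None:
        # Assumed to be an exact tag: zero commits since it, no hash suffix.
        return Described(text, 0, None)
    return Described(match["tag"], int(match["count"]), match["hash"])


info = parse_describe("v1.1.9-19-g3024adf")
print(info.tag, info.commits_since_tag, info.short_hash)  # v1.1.9 19 3024adf
```

A bug report carrying this string therefore tells you the release it was built from, how far past that release the build was, and the exact commit.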

Upvotes: 1

Marcus Müller

Reputation: 36402

So, there's a conceptual problem: git emphasizes merging different branches (SVN makes that possible too, but with a lot more manual work).

So let's assume:

     /--> B1 --> B2 --> … --> B18-\
A -->                              +--> D
     \--> C1 --> C2 --------------/

What version number should D have? Is it version(A) + 19 (upper path) or version(A) + 3 (lower path)? Or do you count the merge as revision (+1 count)?

So, even in your SVN days, the monotonically increasing revision number was basically just a convention, and if you could tell from that number whether a fix was included, you probably didn't really work on branches other than trunk.

That single-branch scheme makes no sense for modern team development, or with a system that lets you build features without having to fumble with your bugfixes in another branch. Since declaring one branch as the "versioned" branch is only a convention, it's usual to simply have a "master" branch (which is the default branch in git), into which all feature branches are merged as soon as they work, and from which new feature branches are forked whenever someone feels like working on a new feature. Then you'd just git tag commits on your master branch whenever something significant happens, such as a new release. Typical tag names look like release_001_002_001. Yes, it's manual compared to the automatic revision counting in SVN, but unlike that count, it's actually useful for your code management: finding out whether a certain bugfix commit came before or after another commit is simply a question of git log.

You can actually just count the commits between A and D. Then, version(D) would be version(A) + 18 + 2 + 1. That's relatively doable; you'd run:

git log A..D --pretty=oneline | wc -l
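To see why that count comes out as 18 + 2 + 1, the graph from the diagram can be modelled as a parent map. This is a sketch of the idea behind the A..D range (commits reachable from D but not from A), not of how git implements it; the `ancestors` helper and the dictionary layout are assumptions made for the example.

```python
def ancestors(parents, commit):
    """Return the set holding `commit` and every commit reachable from it."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


# The graph from the diagram: A forks into B1..B18 and C1..C2, merged in D.
parents = {"A": [], "B1": ["A"], "C1": ["A"], "C2": ["C1"]}
for i in range(2, 19):
    parents[f"B{i}"] = [f"B{i - 1}"]
parents["D"] = ["B18", "C2"]

# Commits reachable from D, excluding those reachable from A.
count = len(ancestors(parents, "D") - ancestors(parents, "A"))
print(count)  # 21
```

The result counts every commit on both paths plus the merge itself, so it matches neither the "upper path" nor the "lower path" number.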

Again, I doubt the usefulness of that.

Upvotes: 0

s.m.

Reputation: 8043

Don't use the commit count. Simply include the first few characters of the hash in lieu of the old version number. You don't need to include the whole string; the first five or six characters will be enough.

Version numbers don't make sense in a distributed context because the history is eminently not linear. What is commit 10 for you might be an entirely different commit on someone else's clone.

Upvotes: 0
