Élodie Petit

Reputation: 5914

Comparing the changes in a tag with a branch

I get the latest tag by running git tag and sorting the output. Now I need to check whether the changes in that tag are in an "X" branch.

I'm not looking for a magic command that can do this in a single step. I'm just trying to understand the overall process. For example, should I create a patch and then compare the patch to the target branch (if so, how?), etc.

Upvotes: 1

Views: 820

Answers (1)

torek

Reputation: 487755

TL;DR

There's nothing special to do here. Just use the tag names in the same way you use the branch names.

Long

First, a brief note: Git doesn't store changes. Git stores commits, and each commit is a two-part entity holding:

  • a full snapshot of every file, frozen for all time; and
  • metadata giving information about this commit (also frozen for all time).

The snapshot is stored in a special format, with files compressed and (important internally and for disk space reasons) de-duplicated. The metadata include things like the raw hash ID(s) of previous commit(s), and this is what sets up a Git repository so that the commits are the history.

That said, a branch name like main or production or whatever is simply a way to remember one (1) hash ID. That remembered hash ID is—by definition—the latest commit "on" the branch. Earlier commits that are "on" that same branch are those that are reachable by using the hash ID(s) stored in the metadata to step back one hop, to the parent commit(s), and then from all the parents, to step back one hop, to the grandparent commit(s), and then from the grandparents to step back one hop yet again, and so forth.

The result of all this hop-following is that in a simple chain of commits:

A <-B ... <-G <-H   <--main

the name main selects the entire chain of commits: all eight in this tiny example repository. It means the last of these eight, but it selects the rest of them too, as long as the command you run follows ancestry (as git log and git rev-list do).

No commit contains changes. Each commit has a full snapshot of every file. But, if we have Git compare adjacent commits—such as G vs H—we'll see changes. Git can quickly eliminate exactly-identical files (because they've already been de-duplicated and Git knows that they're identical without even looking at their content), and hence need only compare the changed files, in a game of Spot the Difference, to show what changed. Or, with --name-only or --name-status, git diff hash-of-G hash-of-H will show us just the names (and maybe status code letters) for the differing files.
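The snapshot-versus-diff point can be tried in a throwaway repository. This is only a sketch, assuming git and a POSIX shell are available; the file names and commit messages are invented for the demo, with the two commits standing in for G and H.

```shell
# Throwaway repository; all file names and messages are invented for the demo.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

# Commit "G": two files.
echo one > kept.txt
echo two > edited.txt
git add . && git commit -q -m "G"
g=$(git rev-parse HEAD)

# Commit "H": one file edited, one added, kept.txt untouched.
echo changed > edited.txt
echo new > added.txt
git add . && git commit -q -m "H"
h=$(git rev-parse HEAD)

# The snapshot in H still lists every file, including the untouched one...
tracked=$(git ls-tree -r --name-only "$h")
# ...but comparing the two snapshots reports only the files that differ.
changed=$(git diff --name-status "$g" "$h")

echo "files in snapshot H:"; printf '%s\n' "$tracked"
echo "files that differ between G and H:"; printf '%s\n' "$changed"
```

Here kept.txt appears in the snapshot listing but not in the diff, which is the sense in which commits hold snapshots while diffs are computed on demand.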

When we have multiple branch names, there may be shared (common) commits / history, and unshared commits:

          I--J   <-- br1
         /
...--G--H
         \
          K--L   <-- br2

Here commits up through and including H are on both branches, and commits I-J and K-L are each on only one branch. Once we switch to branch br1 and run git merge br2 successfully, however, we get:

          I--J
         /    \
...--G--H      M   <-- br1
         \    /
          K--L   <-- br2

The newly added merge commit M has two parents instead of the usual one, so now commits K-L are on both branches. (Commits I-J are still only on br1, however.) The set of commits contained within a branch changes dynamically over time, with ordinary single-parent commits adding one commit, and merge commits adding any number of commits simultaneously.
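To watch a merge add several commits to a branch at once, the drawing can be rebuilt in a scratch repository. This is a sketch under assumptions: it uses empty commits (so there is nothing to conflict), starts the history at H rather than at an earlier commit, and counts reachable commits with git rev-list --count.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/br1      # name the initial branch br1

git commit -q --allow-empty -m "H"
git branch br2                            # br1 and br2 both point at H
git commit -q --allow-empty -m "I"
git commit -q --allow-empty -m "J"
git checkout -q br2
git commit -q --allow-empty -m "K"
git commit -q --allow-empty -m "L"
git checkout -q br1

before=$(git rev-list --count br1)        # reachable from br1: H, I, J
git merge -q -m "M" br2                   # creates the two-parent commit M
after=$(git rev-list --count br1)         # now also K, L and M itself
br2_count=$(git rev-list --count br2)     # br2 is unchanged: H, K, L

echo "br1: $before commits before the merge, $after after; br2: $br2_count"
```

One new commit (M) was created, yet the number of commits reachable from br1 grew by three, because M's second parent brings K and L along.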

Tags are the same as branches

A tag name, in Git, has the same internal representation as a branch name. It is just stored in a different namespace, under refs/tags/ instead of refs/heads/. That is, each tag name holds one hash ID. This hash ID can be that of a commit (Git calls this a lightweight tag), or it can be that of a different kind of internal Git object called a tag object or annotated tag object. These are "annotated tags": they still point to commits, normally, but they go through one layer of indirection. The annotated tag object holds extra metadata, where you can store things like release notes (not really the right place for them) or digital signatures. A signature tells us that someone is vouching for this particular commit, with a cryptographically secure (we hope) method of verifying that. This is in contrast to regular commit hash IDs, which are theoretically pretty breakable now, though there are practical considerations that make this hard for Git commits.

In other words, a tag name finds a commit (usually: you can tag tree or blob objects, or even other annotated tag objects, but there are few practical uses for this and nobody really does it except as part of the Git self-test suite). So in that sense, it's exactly the same as a branch name. It just lives in a different namespace.
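The two kinds of tag can be told apart with git cat-file -t, which prints the type of the object a name resolves to. A small sketch, assuming a scratch repository; the tag names light and annotated are made up:

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "initial"

git tag light                               # lightweight: the name holds a commit hash
git tag -a -m "release notes" annotated     # annotated: the name holds a tag object

light_type=$(git cat-file -t light)
annotated_type=$(git cat-file -t annotated)

# Following the indirection ("peeling") gets from the tag object to the commit.
peeled=$(git rev-parse 'annotated^{commit}')
head=$(git rev-parse HEAD)

echo "light -> $light_type, annotated -> $annotated_type"
echo "annotated peels to $peeled (HEAD is $head)"
```

Both names end up selecting the same commit; the annotated one just takes one extra hop to get there.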

Tags are unlike branch names

There is a key difference between a tag name and a branch name, which we can express in two parts here:

  • Tag names are not supposed to "move", ever. If tag v1.2 stores hash ID H, it should store that same hash ID forever. But branch names are supposed to move and will even move automatically.

  • Checking out a tag (with git checkout or git switch --detach) puts Git into detached HEAD mode. In this mode, new commits you make have no permanent way to find them: as soon as you switch to some branch name, the new commits you made get "lost", unless you choose to save the current hash ID by creating a new branch or tag name.
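The detached-HEAD behavior can be seen in a scratch repository. This is a sketch, not a recommended workflow; the branch name rescued is hypothetical, and git symbolic-ref -q HEAD is used here only as a way to ask whether HEAD is attached to a branch name.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "first"
git tag v1.2

git checkout -q v1.2                  # checking out a tag detaches HEAD
if git symbolic-ref -q HEAD >/dev/null; then state=attached; else state=detached; fi

git commit -q --allow-empty -m "stray work"
stray=$(git rev-parse HEAD)
git checkout -q main                  # leave the stray commit behind

# No branch or tag name reaches the stray commit now.
found_before=$(git branch --contains "$stray"; git tag --contains "$stray")

# Saving the hash ID under a new branch name makes it findable again.
git branch rescued "$stray"
found_after=$(git branch --contains "$stray")

echo "HEAD was $state; names before: '$found_before'; names after: '$found_after'"
```

Note that the tag v1.2 never moved: the new commit was added after it, not to it.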

In other words, branches are for growing—adding new commits—while tags are for permanent record-keeping. There's one other big difference, having to do with git fetch: tag names are, by default, copied from other Git repositories, so that when you git fetch origin you get any of their new tags exactly as is. If they have new or updated branch names, however, your git fetch creates or updates your remote-tracking names, not your branch names.

Hence our two key differences are:

  • branch names move and tags don't;
  • tags are global (across clones) and branch names aren't.

Of course, we also run git branch to fiddle with branch names, and git tag to fiddle with tag names. This keeps them separate in "ordinary user" territory. However, the lower-level ("plumbing") commands in Git generally don't make this kind of distinction: we just have to spell out the names in full, such as refs/heads/main to mean branch main and refs/tags/v2.3 to mean tag v2.3.
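One place where the full spelling matters is when a branch and a tag share a short name. The sketch below assumes a scratch repository and deliberately creates both under the made-up name same, then resolves each through its full ref name:

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

git commit -q --allow-empty -m "first"
git tag same                          # refs/tags/same -> first commit
first=$(git rev-parse HEAD)

git commit -q --allow-empty -m "second"
git branch same                       # refs/heads/same -> second commit
second=$(git rev-parse HEAD)

# The full names are unambiguous, whatever the short name would resolve to.
tag_hash=$(git rev-parse --verify refs/tags/same)
branch_hash=$(git rev-parse --verify refs/heads/same)

echo "refs/tags/same  = $tag_hash"
echo "refs/heads/same = $branch_hash"
```

Sharing a name like this is best avoided in real repositories; it is done here only to show that the two namespaces are independent.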

You're looking for --contains, or technically, is-ancestor

The git branch and git tag front ends both support --contains options, so if you run, e.g.:

git show HEAD~5

and see some commit that you think is important and should be in the release branch, or in release v2.3, you can run:

git branch --contains HEAD~5

or:

git tag --contains HEAD~5

to see if branch name release, or tag name v2.3, shows up here.
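As a concrete run-through, the sketch below builds a tiny history and asks both front ends about one commit. The branch and tag names follow the example above (release, v2.3); the older tag v1.0 and the commit messages are invented.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/main

git commit -q --allow-empty -m "first"
git tag v1.0                          # older tag, made before the fix
git commit -q --allow-empty -m "important fix"
fix=$(git rev-parse HEAD)
git branch release                    # release points at the fix
git tag v2.3                          # and so does v2.3
git commit -q --allow-empty -m "later work"

branches=$(git branch --contains "$fix")
tags=$(git tag --contains "$fix")

echo "branches containing the fix:"; printf '%s\n' "$branches"
echo "tags containing the fix:";     printf '%s\n' "$tags"
```

v1.0 is absent from the tag list because it points at a commit made before the fix, so the fix is not in its history.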

These operations really work by using the "is ancestor" test. Remember how, after we made merge M on br1, all the commits in the drawing were contained in branch br1. That's because, by starting at M and working backwards, we "reach" commits J and L, then I and K, then H, then G, and so on. This means that each of these commits is an ancestor of M. (For practical purposes Git also considers a commit its own ancestor, which seems a bit vegetative, but, well, never mind 😀). Git has a plumbing command for testing for ancestry:

git merge-base --is-ancestor <hash1> <hash2>

This tests whether the first given hash ID hash1 is an ancestor of hash2. The result is a shell exit code: 0 (true) means yes, is ancestor and 1 (treated as false by the shell) means no, not ancestor.

Hence, in shell script, a simple:

if git merge-base --is-ancestor "$hash1" "$hash2"; then
    ... code ...
fi

is a simple way to tell if one commit is an ancestor of another. If so, the second hash ID suffices to find the first hash ID, as long as we're doing commit selection with history (as git log or git rev-list would). If $hash2 is derived by finding the hash ID of a branch name, that means commit $hash1 is contained in that branch. If $hash2 is derived by finding the hash ID of a tag name, that means commit $hash1 is in the history of that tag. (See note 1.)

You can, at any point, use the plumbing command git rev-parse to turn a name, like main or refs/heads/main or v2.3 or refs/tags/v2.3, into a raw hash ID. There's a minor stumbling block with tag names here since they may point not to a commit but rather to an annotated tag, so you can use a syntax described in the gitrevisions documentation, where you add ^{commit} to the name:

git rev-parse refs/tags/v2.3^{commit}

for example.

In scripts, it's often wise to use this separate git rev-parse command, as this allows you to catch errors, such as a typo in a name. In more casual use, however, most Git commands accept names—with optional suffixes—where hash IDs would appear. That's true for git merge-base too:

if git merge-base --is-ancestor "$hash" v2.3^{commit}; then
    echo "commit $hash is in the history of tag v2.3"
fi

For one-off commands, such as one-liner scripts you type in at a bash prompt, this kind of sloppiness is usually fine.
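Putting the pieces together for the original question (is the latest tag's commit in branch X?), one possible sketch follows. It assumes "latest" means the highest version number, picked with git tag --sort=-version:refname; the helper tag_in_branch and the scratch history are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
git symbolic-ref HEAD refs/heads/main

git commit -q --allow-empty -m "A"
git tag v1.0
git branch X                                  # X stops at the v1.0 commit
git commit -q --allow-empty -m "B"
git tag -a -m "second release" v1.1           # annotated, so peeling is needed

# Hypothetical helper: is the commit that tag $1 points to reachable from branch $2?
tag_in_branch() {
    git merge-base --is-ancestor "$1^{commit}" "refs/heads/$2"
}

# Assumption: the "latest" tag is the one with the highest version number.
latest=$(git tag --sort=-version:refname | head -n 1)

if tag_in_branch "$latest" X; then before=yes; else before=no; fi
git branch -f X main                          # bring X up to date with main
if tag_in_branch "$latest" X; then after=yes; else after=no; fi

echo "latest tag: $latest; in X before update: $before; after: $after"
```

In a real script it may be worth checking the exit status more carefully: git merge-base --is-ancestor exits with 1 for "not an ancestor" and with some other non-zero status on errors such as a misspelled name, and a plain if treats both the same way.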


Note 1: This points up another way humans tend to use tags and branch names differently. While the names simply select one particular commit, a human asking about a tag is usually asking about just that one commit, while one asking about a branch name is often asking about all history ending at the selected commit. But not always! Silly humans. 😀

Upvotes: 2
