gsf

Reputation: 7232

How can I filter git log on multiple tags?

After I commit to master, I run several CI jobs that tag the commit once they finish successfully. Say I have two CI jobs, unit tests and integration tests, that both create tags on the commit: unittest-signoff/{number} and integrationtest-signoff/{number}, where {number} is an auto-incrementing number that ensures uniqueness.

If I execute git log -n1 --tags="unittest-signoff", I get the most recent commit that is already signed off by the unit tests. The same goes for git log -n1 --tags="integrationtest-signoff".

My question is: what command will give me the most recent commit that has both tags at the same time?

Upvotes: 2

Views: 986

Answers (4)

jingx

Reputation: 4014

This is what I came up with to do this in pure bash:

comm -12 <(git log --no-walk --tags=unittest-signoff --format="format:%H %ct"|sort) <(git log --no-walk --tags=integrationtest-signoff --format="format:%H %ct"|sort) | sort -k 2 -r | head -1 | cut -d ' ' -f 1

To break it down:

git log --no-walk --tags=unittest-signoff --format="format:%H %ct"

prints the hash of every tagged commit together with its commit timestamp. The output goes through sort because comm needs sorted input. After comm finds the hashes common to the two tag sets, they are sorted again by timestamp, and finally head and cut pick the hash of the most recent commit present in both tag sets.

[Edit] --no-walk is still needed; without it git log would also list the ancestors of the tagged commits.
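To try the pipeline above without touching a real repository, here is a minimal sketch that builds a throwaway repository and runs the same intersection. The commit messages, dates, and the signed_off helper are made up for the demo, and temporary files stand in for the <( ) process substitution so that the script also runs under plain sh.

```shell
#!/bin/sh
# Throwaway repository; every name and date below is made up for the demo.
work=$(mktemp -d) && cd "$work" && git init -q
lists=$(mktemp -d)
i=0
for msg in first second third; do
    i=$((i + 1))
    date="$((1577836800 + i * 86400)) +0000"   # distinct commit times
    GIT_AUTHOR_DATE="$date" GIT_COMMITTER_DATE="$date" \
        git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$msg"
done

# "first" and "second" passed both suites; "third" only has the unit-test tag.
git tag unittest-signoff/1 HEAD~2
git tag integrationtest-signoff/1 HEAD~2
git tag unittest-signoff/2 HEAD~1
git tag integrationtest-signoff/2 HEAD~1
git tag unittest-signoff/3 HEAD

# Print "<hash> <commit time>" for each commit tagged under the given prefix.
signed_off() {
    git log --no-walk --tags="$1" --format='format:%H %ct' | sort
}
signed_off unittest-signoff > "$lists/unit"
signed_off integrationtest-signoff > "$lists/integration"

# Keep the lines present in both lists, newest first, and take the first hash.
latest=$(comm -12 "$lists/unit" "$lists/integration" \
    | sort -k 2 -r | head -n 1 | cut -d ' ' -f 1)
git log -1 --format='%h %s' "$latest"
```

In this setup the newest commit carrying both tags is the one with the message "second", since "third" has not been signed off by the integration tests yet.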

Upvotes: 0

Marco Luzzara

Reputation: 6036

I expected git to have something ready for that, but I did not find it. Nonetheless, I still believe I missed it somehow, maybe using some glob pattern I am not aware of? This is an alternative way to do it:

git rev-list --all | while read cmt
do
    cmt_tags=""
    while read tag
    do
        cmt_tags+="$(echo "$tag" | awk -F'/' '{ print $1; }') "
    done <<< "$(git tag --points-at "$cmt" "integrationtest-signoff/*" "unittest-signoff/*")"
    
    test "$cmt_tags" = "integrationtest-signoff unittest-signoff " && echo "$cmt" && break
done

Assuming you are tagging the master branch, I loop over the commits until I find one with both tags. git tag --points-at returns the tags in alphabetical order (this is what I observed), but I do not want the part after the /, so I keep only the first token with awk. Because of the patterns passed to git tag, only tags matching them are returned. In the end I just compare the cmt_tags string with the expected one and break as soon as a commit matches. Note that the comparison expects exactly one tag of each kind on the commit.

Not very elegant but simple enough to solve your problem, I would say.
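As a variation on the loop above (not the exact script from this answer), the sketch below asks git tag once per tag family and only checks that each result is non-empty. That way a commit that was tagged twice by the same CI job still matches. The throwaway repository, the dates, and the latest_with_both helper are assumptions made up for the demo.

```shell
#!/bin/sh
# Throwaway repository; every name and date below is made up for the demo.
work=$(mktemp -d) && cd "$work" && git init -q
i=0
for msg in first second third; do
    i=$((i + 1))
    date="$((1577836800 + i * 86400)) +0000"   # distinct commit times
    GIT_AUTHOR_DATE="$date" GIT_COMMITTER_DATE="$date" \
        git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$msg"
done

git tag unittest-signoff/1 HEAD~1
git tag unittest-signoff/2 HEAD~1          # a re-run tagged the same commit again
git tag integrationtest-signoff/1 HEAD~1
git tag unittest-signoff/3 HEAD            # integration tests not finished yet

# Walk the commits newest first and stop at the first one that has
# at least one tag from each family.
latest_with_both() {
    git rev-list --all | while read -r commit; do
        unit=$(git tag -l --points-at "$commit" 'unittest-signoff/*')
        integration=$(git tag -l --points-at "$commit" 'integrationtest-signoff/*')
        if [ -n "$unit" ] && [ -n "$integration" ]; then
            echo "$commit"
            break
        fi
    done
}
latest_with_both
```

The trade-off is two git tag calls per commit instead of one, which is slower on long histories but avoids depending on the order in which the tags are listed.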


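@torek proposed an interesting performance enhancement using git for-each-ref. The first line of the previous script could be replaced with:

git for-each-ref --format="%(committerdate:unix)|%(objectname)" --sort=-committerdate "refs/tags/integrationtest-signoff/*" "refs/tags/unittest-signoff/*" | sort -u -r | awk -F '|' '{ print $2; }' | while read cmt

I use the :unix date format because sort -u -r compares the lines as text, and only a numeric timestamp sorts chronologically that way. Now, instead of looping over every commit, I only loop over the tagged commits, so performance depends on how many commits carry an integrationtest-signoff or unittest-signoff tag. This assumes lightweight tags: for annotated tags, %(objectname) names the tag object rather than the commit, and %(committerdate) is empty.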

Upvotes: 1

jthill

Reputation: 60275

git log lets you choose which decorations to look at, so this should serve:

git log --no-walk --tags  --pretty=%H\ %d --decorate-refs=refs/tags/*-signoff \
| grep integrationtest-signoff | grep -m1 unittest-signoff

Upvotes: 3

torek

Reputation: 488183

[Edit: I may have misread the question. See Marco Luzzara's answer for an approach to the other interpretation.]

Consider the --no-walk flag to git log, e.g., git log --no-walk tag1 tag2.

Wait, I was using -n 1, isn't that the same thing?

No. There is a world of difference between --no-walk and -n 1. The -n argument to git log tells it to quit entirely after printing some number of revisions. With -n 1, git log quits as soon as it shows one particular commit.

The way git log works is the key here. When you run:

git log [options] starting-point-1 starting-point-2 starting-point-3

the git log command inserts the three selected starting point commits into a queue (specifically a priority queue, though we won't worry about the priority part here). Try running git rev-parse on a name (branch name, remote-tracking name, or tag name for instance):

$ git rev-parse origin/maint
48bf2fa8bad054d66bd79c6ba903c89c704201f7
$ git rev-parse v2.23.0
cb715685942260375e1eb8153b0768a376e4ece7

These hash IDs—the second one is actually a tag hash ID, rather than a commit hash ID, but git log knows what to do with it—can act as "starting points" for git log. Or, given no starting point, git log uses git rev-parse HEAD, or the equivalent, to find the commit hash ID to insert into this queue, so that the queue has just one commit in it. If you give git log one commit specifier, that's the one commit that goes into the queue.

Once the queue is primed—by your command line starting points, or by git log using HEAD—the real work begins.

The real work of git log, which is run as a loop, over and over

At this point git log starts by taking one commit out of the queue. If there was just the one commit in the queue, the queue is now empty. If the queue was already empty, git log quits now, as there's nothing to take out.

Having taken a commit out of the queue, git log now fishes the commit out of the big database that Git keeps all the commits in. It examines the commit. If you gave git log options, these options may decide whether or not to print the commit. If you gave no options, git log is supposed to print the commit now.

If git log is supposed to print the commit, git log prints the commit now. If there's a -n limit, this decrements the remaining count, and when it's gone to zero, git log quits immediately. Without -n, or if the count is big enough, we keep going.

In any case git log now has the option to put the commit's parent commit(s) into the queue. This option is the default. An ordinary commit has exactly one parent, so for most commits, this puts the one parent into the queue. A merge commit has two or more parents—usually just two—so for merge commits, this puts all the parents into the queue.

This completes the real work. We now go back to the loop, so as to continue working with the queue.
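The loop described above can be sketched as a small shell function. This is an illustrative re-implementation, not git's actual code: the walk and enqueue helpers, their arguments, and the throwaway repository are all assumptions made up for the demo, and the "priority" is simply the commit timestamp.

```shell
#!/bin/sh
# Illustrative re-implementation of the loop described above, NOT git's real code.
# Usage: walk <walk|no-walk> <limit> <start>...   (limit 0 means "no -n given")

enqueue() {   # put a commit into the queue unless it was queued before
    id=$(git rev-parse --verify -q "$1^{commit}") || return
    if ! grep -qxF "$id" "$seen"; then
        echo "$id" >> "$seen"
        git show -s --format='%ct %H' "$id" >> "$queue"
    fi
}

walk() {
    mode=$1 limit=$2 printed=0
    shift 2
    queue=$(mktemp)   # one "<commit time> <hash>" line per waiting commit
    seen=$(mktemp)    # every hash that has ever been queued
    for start in "$@"; do enqueue "$start"; done

    while [ -s "$queue" ]; do
        # Take the newest commit out of the queue (the "priority" part).
        sort -rn -o "$queue" "$queue"
        commit=$(head -n 1 "$queue" | cut -d ' ' -f 2)
        tail -n +2 "$queue" > "$queue.rest" && mv "$queue.rest" "$queue"

        echo "$commit"                     # "print" the commit
        printed=$((printed + 1))
        # A -n limit stops the whole loop once enough commits were printed.
        [ "$limit" -gt 0 ] && [ "$printed" -ge "$limit" ] && break

        if [ "$mode" = walk ]; then        # default: queue the parent(s)
            for parent in $(git show -s --format=%P "$commit"); do
                enqueue "$parent"
            done
        fi
    done
    rm -f "$queue" "$seen"
}

# Throwaway linear history of three commits with increasing commit times.
work=$(mktemp -d) && cd "$work" && git init -q
for i in 1 2 3; do
    date="$((1577836800 + i)) +0000"
    GIT_AUTHOR_DATE="$date" GIT_COMMITTER_DATE="$date" \
        git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "commit $i"
done

walk walk 0 HEAD              # compare with: git rev-list HEAD
walk no-walk 0 HEAD HEAD~2    # compare with: git rev-list --no-walk HEAD HEAD~2
walk walk 2 HEAD HEAD~2       # compare with: git rev-list -n 2 HEAD HEAD~2
```

On this simple linear history each call should list the same hashes, in the same order, as the git rev-list command named in its comment. Real git handles many more cases (filters, history simplification, ties between timestamps) that this sketch ignores.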

In most cases this produces just what you're used to seeing

Suppose we have a nice simple linear string of commits, ending at the current commit—i.e., HEAD—on the current branch, like this:

... <-F <-G <-H   <-- main (HEAD)

Running git log with no arguments has Git figure out which commit is the current one, which is commit H. The queue has one commit in it.

The log program now extracts the one commit from the queue, which goes empty. That's commit H. It prints the contents of commit H and puts H's parent, G, into the queue. The queue now has one commit in it.

The log program now extracts the one commit from the queue, which goes empty. That's commit G this time, so git log prints the contents of G and puts G's parent, F, into the queue.

This repeats for F, which leads back to another commit, which git log prints, and so on—all the way down the line to the very first commit, which has no parents. At that point git log runs out of queue and stops.

The --no-walk option, part 1

With --no-walk, we instruct git log, in its "deal with the commit taken off the queue" step, to put no parents into the queue. If we use this with our bog-standard git log with the current branch being main and the current commit being commit H, what happens is straightforward:

  • git log puts HEAD, i.e., H, into the queue;
  • git log pops H off the queue;
  • git log prints commit H and puts nothing into the queue;
  • and the queue is now empty and git log quits.

Comparing to the -n 1 option, part 1

With -n 1 and one starting point and no restrictions on what gets printed:

  • git log puts HEAD, i.e., H, into the queue;
  • git log pops H off the queue;
  • git log prints commit H and puts its parent G into the queue, but has printed one commit, so it quits.

The output here is the same.

Compare with, e.g., git log --no-walk HEAD HEAD~2

Here we've given git log two commits to put in the queue: HEAD or H, and HEAD~2 or F.

  • git log pops one of the commits—probably H—off the queue;
  • git log prints this one commit, but adds no parents;
  • git log pops the remaining commit—probably F—off the queue;
  • git log prints this commit, but again adds no parents;

and the queue is now empty, so git log has printed exactly these two commits and quits.

Would using -n 2 work?

Try it with HEAD HEAD~2. We start with H and F in the queue. Let's further assume that the queue order is such that the newest commit is always printed first (this is the default). So:

  • git log pops H off the queue, and prints it, and puts G on the queue;
  • git log pops G off the queue—it's the newest, compared to F—and prints it;

and those are the two commits it is allowed to print, so it now exits. It never printed commit F at all!
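The difference is easy to reproduce. The sketch below builds a throwaway three-commit history whose commit messages are F, G and H (names and dates are made up for the demo) and runs both variants against it.

```shell
#!/bin/sh
# Throwaway linear history F <- G <- H; names and dates are made up for the demo.
work=$(mktemp -d) && cd "$work" && git init -q
t=1577836800
for name in F G H; do
    t=$((t + 60))
    GIT_AUTHOR_DATE="$t +0000" GIT_COMMITTER_DATE="$t +0000" \
        git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$name"
done

# -n 2 walks from the newest starting point and stops after two commits.
limited=$(git log --format=%s -n 2 HEAD HEAD~2)
# --no-walk shows only the commits named on the command line.
no_walk=$(git log --format=%s --no-walk HEAD HEAD~2)

echo "-n 2:     " $limited    # H G
echo "--no-walk:" $no_walk    # H F
```

With -n 2 the commit F never appears, because G is queued as H's parent and is newer than F.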

Conclusion: --no-walk is the flag for this purpose

If you want git log to print only the commits you specify on the command line, well, that's exactly what --no-walk is for. Use it for its designed purpose, and you're done.

Upvotes: 0
