kylawl

Reputation: 320

Git pre-push hook, enumerating all un-pushed commits

I want to run a pre-push job on all un-pushed local commits.

git rev-list BRANCH --not --remotes=origin works great for all cases except when the remote you're pushing to is empty. When that is the case, that command will return nothing.

Is it safe to assume that if the remote SHA argument is all zeros and git rev-list BRANCH --not --remotes=origin returns nothing, then the commits to enumerate are exactly those listed by git rev-list BRANCH?

Is there a better way to get the information I'm after that works in all cases?

Upvotes: 1

Views: 1254

Answers (1)

torek

Reputation: 488193

It's not completely clear to me precisely what you intend to accomplish, but any time you run git push:

  • your git calls up their git (on the remote) and finds out what it has;
  • you tell your git—often implicitly—what branch names (and/or other references) it should look at on your side, and what branch names it should try to push to on "their" side, using "refspecs" (pairs of names with a colon between them).

That is, you may run:

git push origin mybranch:master

or:

git push origin branch1:branch1 branch2:branch2 branch3:newname

or even:

git push origin 'refs/heads/*:refs/heads/*'

You might also run:

git push origin refs/tags/v1.2:refs/tags/v1.2

or (with --tags) include a pair of refs/tags/* rather like the refs/heads/* line.

In other words, you may not be just pushing a branch (you might push several), or you might not be pushing a branch at all, but rather a tag, or you might be pushing branches and tags. (For that matter, there are also "notes". Notes live in refs/notes/, which is a somewhat-new name space that is usually not transferred, but note the word "usually".)

In a pre-push hook, you're supposed to read multiple lines from standard input. There will be one line for every ref-name you're proposing to create, delete, or update on the remote.

On each line, you get (as the documentation notes) the local ref-name [1], the local SHA-1, the remote ref-name, and the remote SHA-1, all in that order. You can tell whether you've asked your git to create or delete the remote ref-name by examining the two SHA-1s. At most one of these will be 40 0s. For a normal update, neither one will be all-zero.
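As a minimal sketch of that input format, a hook might read and classify each line roughly as follows. The helper names classify_update and read_push_lines are hypothetical, and the all-zero test deliberately accepts any run of 0s rather than exactly 40, on the assumption that the hash length may differ between repositories.

```shell
#!/bin/sh
# Sketch only: classify_update and read_push_lines are hypothetical helpers.

# Print "create", "delete", or "update" for one (local ID, remote ID) pair.
classify_update() {
    local_sha=$1
    remote_sha=$2
    # An ID made up only of 0s means "no object on that side".
    case $local_sha in
        *[!0]*) ;;                    # has a non-zero character: a real ID
        *) echo delete; return ;;     # nothing local: the remote ref is deleted
    esac
    case $remote_sha in
        *[!0]*) echo update ;;        # both sides real: ordinary update
        *) echo create ;;             # nothing remote yet: the ref is new there
    esac
}

# Git supplies one line per ref on standard input:
#   <local ref> <local ID> <remote ref> <remote ID>
read_push_lines() {
    while read -r local_ref local_sha remote_ref remote_sha; do
        echo "$(classify_update "$local_sha" "$remote_sha") $remote_ref"
    done
}
```

In an actual hook, the last line of the script would simply call read_push_lines, since git feeds the hook its lines on standard input.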

There may be no new commits, or even no new objects at all [2], involved in the supplied ref-name update. For instance, when creating a new tag pointing to an existing commit, there is nothing else to do: you just ask the remote "please create this new tag, pointing to existing commit 1234567890123456789012345678901234567890" or whatever. Similarly, if you're simply removing some commit history (with a forced push), this too has no new commits: you're just asking the remote "please change branch to point to this new ID".

To find out what new objects (if any) would be sent, you should not look at your own names, as these may be out of date. Instead, you should do the same thing as git does: concentrate on the SHA-1 IDs.

There is a bit of a problem here though. Let's say, for instance, that you are asking the remote to update ref-name refs/heads/branch from 1234567... to 9abcdef..., so that the remote SHA-1 is 1234567... and the local SHA-1 is 9abcdef.... This may be—indeed, usually is—a "forward" motion:

... <- 1234567... <- 5555555... <- 9abcdef...   <-- refs/heads/branch

(where the numbers here are SHA-1 IDs of actual commit objects, and you are simply asking the remote to move its refs/heads/branch forward two commits). However, it's possible that the remote already has commits 5555555... and 9abcdef..., just not on that branch:

... <- 1234567...   <-- branch
                  \
                    5555555... <- 9abcdef...  <-- develop

In this case, while you're updating their branch by moving it forward two commits, those are two commits that were already somewhere in the repository (in fact, on branch develop).

Nonetheless, those are two commits that were not on branch before, and will be afterward, if the push succeeds (your pre-push hook can stop it, but so can the remote: it can run its own hooks and decide to reject your push).

To enumerate those two commits, simply use git rev-list with the raw SHA-1 values, as in this sample hook I found on github.
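A sketch of that enumeration, under the assumption that the hook has already split one input line into the local and remote IDs. The helper name push_range is hypothetical; it only builds the revision argument, so that the decision is easy to see apart from the git call itself.

```shell
#!/bin/sh
# Sketch only: push_range is a hypothetical helper. It prints the revision
# argument to hand to `git rev-list` for one line of pre-push input, or
# prints nothing when the push deletes the remote ref.
push_range() {
    local_sha=$1
    remote_sha=$2
    case $local_sha in
        *[!0]*) ;;                    # a real local ID: something may be sent
        *) return 0 ;;                # all zeros: deletion, nothing to list
    esac
    case $remote_sha in
        *[!0]*) echo "$remote_sha..$local_sha" ;;   # update: commits new to that ref
        *) echo "$local_sha" ;;                     # create: all reachable commits
    esac
}
```

A hook could then run something like range=$(push_range "$local_sha" "$remote_sha") and, if the result is non-empty, git rev-list "$range". For the example above this yields 1234567.....9abcdef... in abbreviated form, i.e. the two commits. Note that the two-dot form needs the remote's commit to exist in your own repository; if it does not, git rev-list will complain about a bad revision.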

If you are asking how you can avoid enumerating those two commits, the answer there is that there is no 100% reliable method. You can get fairly close by running git fetch [3] before you run git push. This will allow you to find all the ref-names the remote is willing to export to you, and what their SHA-1 values are. Any commit object findable by their ref-names is necessarily in the remote repository.

Here, git rev-list ... --not --remotes=origin is indeed mostly [4] the right thing: after running git fetch to get your copy of their references, you can use the raw SHA-1 to find reachable commits, and also use all of those copies to exclude commits reachable from any remote branch. The flaw here is not just the one in footnote four (tags), but also the fact that no matter how fast your fetch-then-push sequence is, the references you copy may be out of date by the time your push runs. You can make this window very small, but not (with just git alone) eliminate it.
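The fetch-then-exclude idea might be sketched like this. The helper name list_unpushed is hypothetical, and whether to fetch inside the hook or just before running git push is a design choice this sketch does not settle; either way the race described above remains.

```shell
#!/bin/sh
# Sketch only: list_unpushed is a hypothetical helper. It refreshes your
# copies of the remote's branch refs, then lists commits reachable from the
# given ID but not from any of those remote-tracking refs.
list_unpushed() {
    remote_name=$1
    local_sha=$2
    # Update refs/remotes/<remote_name>/*; stop if the remote is unreachable.
    git fetch --quiet "$remote_name" || return 1
    # The raw ID selects commits; --not --remotes=... excludes what the
    # remote's branches already reach. Tags are not considered here.
    git rev-list "$local_sha" --not --remotes="$remote_name"
}
```

For example, list_unpushed origin "$local_sha" would be called once per input line with the local SHA-1 from that line (skipping deletions, where it is all zeros).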


[1] There is a caveat here, also noted in the documentation: the local SHA-1 may not have a name. This is obviously the case when you're asking the remote to delete a reference, since you request this with git push :ref-to-delete: there's no name on the left-hand side of the refspec. However, it's also true if you push by raw SHA-1 or a relative reference, as in gitrevisions. In general this is not that big a deal since the local ref-name, if any, has no effect on the remote: all the action is due to the two SHA-1s and the remote ref-name.

[2] Remember, git push pushes all needed objects, not just commits: a commit points to a tree, so if there's a new commit there is probably a new tree; trees point to more trees and to blobs, so there may be additional trees and blobs; and an annotated tag is its own object type. All of these can be transferred during a push.

[3] You can use git ls-remote to obtain the current ref-name mappings, but if your local repository lacks the corresponding objects, you cannot link them up with your own repository history to find precisely which objects they have that you don't. The only way to find out what they have is to use git fetch, which brings over not just the IDs to which those refs point but also the objects themselves, so that you can build the commit graph.

[4] This, of course, totally omits tags.

Commits on the remote may be reachable through tags. If you bring over their tag name space, however, you (and git) generally do so by copying all those tags into your name space. These tags are not labeled as to their origin, so there is no way to tell if tag v1.2 is your tag, or their tag, or both. If you exclude commits reachable by tags, you may exclude too many commits.

To properly distinguish the remote's tags from your own, or any other remote's, you need to (re)invent "remote tags".

Upvotes: 7
