Git pull as rebase except when local tags exist

Question

I have configured git to rebase whenever I do git pull. However, if I have set a tag on HEAD and there are remote changes, those remote changes are pulled and my local changes are replayed as new commits on the branch timeline, but my tag remains on the old commit. The old commit then becomes a leaf node of its own timeline, which gives a confusing view of the history.

Is it possible (by a simple script or, preferably, by a git command) to make 'git pull' rebase ONLY when no tags exist on HEAD (or, even better, no tags on the unpushed commits leading to HEAD)?

PS: It might be wise on my end to just never set a tag on something I haven't already pushed, as that would solve the problem of a messy timeline, but that is not the solution I had in mind.

torek · Accepted Answer

There's nothing built in, but if you are willing to use git fetch instead of git pull, it's easy to construct:

1. save the current upstream head
2. fetch from the remote
3. if revs are brought in and/or removed by the fetch, test and maybe rebase (else stop, there's nothing to do).
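As a rough outline of the three steps above, here is a minimal sketch. It is a simplification, not the method spelled out in the rest of this answer: it holds the pre-fetch upstream hash in a shell variable rather than in a saved ref, and the function name fetch_and_compare is made up for illustration. It assumes the current branch has an upstream configured.

```shell
# fetch_and_compare: report whether `git fetch` moved the current
# branch's upstream. (Hypothetical helper name; assumes an upstream is set.)
fetch_and_compare() {
    # step 1: remember where the upstream is now
    old=$(git rev-parse --verify -q '@{u}') || return 2
    # step 2: fetch from the remote
    git fetch -q || return 2
    # step 3: see whether the upstream moved
    new=$(git rev-parse --verify -q '@{u}') || return 2
    if [ "$old" = "$new" ]; then
        echo "upstream unchanged"
    else
        echo "upstream changed: $old -> $new"
    fi
}
```

A saved ref, as described next, has the advantage of surviving past the end of the script, which matters if a later step has to be resumed by hand.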

Step 1 is just (assuming¹ the current branch maps to origin/branch in terms of updates):

git update-ref refs/save/origin/branch refs/remotes/origin/branch

You don't actually have to name this refs/save/origin/branch, but obviously having a whole name-space of saved upstreams allows future flexibility. However, it's much simpler (does not require mapping to upstream branch name) to use a fixed name, e.g., ORIG_UPSTREAM (just use that in place of the spelled-out refs/save/ name).

Step 3 requires listing (or at least counting / testing-non-empty) commits that were added or removed. To get a list of revisions, we need git rev-list. We can easily see what's been added and removed with the DAG subset operations:

git rev-list refs/save/origin/branch..origin/branch  # these were added
git rev-list origin/branch..refs/save/origin/branch  # these were removed

We don't need the actual commit IDs here, so we can add --count and just get the two counts. If the sum (or either one) is nonzero, the upstream has changed and you can potentially rebase. (If the upstream has removed revisions you may not want to rebase without some extra care, but I'll ignore that here.)
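A minimal sketch of that counting test follows. The function name upstream_changed is hypothetical; it takes the saved ref and the current upstream ref as arguments and succeeds only when at least one commit was added or removed.

```shell
# upstream_changed OLD NEW
# Prints the two counts and exits 0 if either is nonzero.
# (Hypothetical helper; OLD is the saved ref, NEW the post-fetch upstream.)
upstream_changed() {
    old=$1 new=$2
    added=$(git rev-list --count "$old..$new") || return 2
    removed=$(git rev-list --count "$new..$old") || return 2
    echo "added=$added removed=$removed"
    [ $((added + removed)) -gt 0 ]
}
```

It might be called as `upstream_changed refs/save/origin/branch origin/branch && echo "may need a rebase"`.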

Now the test-and-maybe-rebase sequence goes like this:

1. for each commit you have that could be rebased, see if it's tagged
2. if none are tagged, do the rebase (else merge?).

Here, for step 1, we really do need the list of revisions, but that's easy enough to obtain:

git rev-list refs/save/origin/branch..HEAD

These are the commits you have now that the upstream did not have before the fetch. (You can use origin/branch..HEAD to get the commits you have now that the upstream no longer has, but this may include commits deliberately removed upstream, that you still have copies of. As always you can omit the word HEAD here; I'm using it for emphasis, as it were.)

Now you simply need to see if any of these commit-IDs are the targets of your tags, which you can test using git for-each-ref refs/tags to iterate over your own tags. We must take care to resolve tag references to commits before comparing if you use annotated tags, since you will see the annotated tag object ID here.

This might look something like this (untested):

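# Two temp files: commits to be rebased, and commits that tags point to.
TF1=$(mktemp -t rbcheck.XXXXXX) || exit 1
TF2=$(mktemp -t rbcheck.XXXXXX) || { rm -f "$TF1"; exit 1; }
trap 'rm -f "$TF1" "$TF2"' 0 1 2 3 15

git rev-list refs/save/origin/branch.. | sort > "$TF1"
git for-each-ref refs/tags | while read -r sha1 objtype tagname; do
    # annotated tags list the tag object's ID; peel it to the commit
    [ "$objtype" = tag ] && sha1=$(git rev-parse "$sha1^{commit}")
    echo "$sha1"
done | sort > "$TF2"
# comm -12 prints only the lines common to both (sorted) files
if [ "$(comm -12 "$TF1" "$TF2" | wc -l)" -gt 0 ]; then
    echo there are some tags in the to-be-rebased commits
else
    echo there are no tags, it is safe to rebase
fi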

(Sorting both lists of commit IDs and comparing them with comm might be overkill if the lists are generally tiny, but it gives you O(n log n) behavior if not. If you're not familiar with comm, it takes two sorted files and finds the lines unique to each and common to both, printing them in three columns; using -12 suppresses all but the column of common items.)
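If the list of candidate commits is short, a simpler variant is to ask git directly which tags point at each commit, using git tag --points-at. This is a sketch of an alternative, not the approach described above; the function name list_tagged is made up for illustration. It runs one git tag call per commit, so it trades speed for brevity.

```shell
# list_tagged RANGE
# Prints "<commit> <tag>" for every tag pointing at a commit in RANGE.
# (Hypothetical helper; runs one `git tag` per commit in the range.)
list_tagged() {
    git rev-list "$1" | while read -r commit; do
        git tag --points-at "$commit" | while read -r tag; do
            echo "$commit $tag"
        done
    done
}
```

An empty result means nothing in the range is tagged, for example: `[ -z "$(list_tagged refs/save/origin/branch..HEAD)" ] && echo "safe to rebase"`.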

Ultimately, though, I suspect that what you really want is this (harder):

1. for each commit you have that could be rebased, see if it's tagged; if none are tagged, just rebase (steps 3-4 will be a no-op)
2. do the rebase and see if it succeeds (if not, stop and get help from the user; the remaining steps must then be resumed manually)
3. pair up each original commit with its rebased equivalent (note: this step is hard in full generality; see below)
4. for each such commit that was tagged, move the tag to the rebased equivalent.

To implement step 3 here, we need to use git rev-list to obtain the list of "pre-rebase" and "post-rebase" commits. If these lists are the same length, we are in good shape. If the post-rebase list is shorter, some commits were omitted, presumably due to being redundant with the upstream. (If the post-rebase list is longer something has gone wrong: this should not happen, at least not unless you did an interactive rebase and split some commit(s).)

If a commit has vanished, the mapping from "pre" to "post" rebase is no longer 1-to-1 and you must decide how to handle a tag whose "new" commit does not exist. (This part is up to you: you can remap to an ancestor or child, or consider it an error.) In addition, it's not always obvious which commit(s) were omitted. To figure this out in full generality, you can try repeating the rebase process one commit at a time, via git cherry-pick; or you could compare commit message texts, if those are sufficiently unique (this would be much faster).
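For the easy case only, where the two lists have the same length, the pairing can be done by position. The sketch below rests on several assumptions: the history being rebased is linear (no merges), the caller recorded the old tip before rebasing, and only lightweight tags are moved (annotated tags are skipped, since git tag -f would replace them with lightweight ones). The function name move_tags is hypothetical.

```shell
# move_tags UPSTREAM OLD_TIP
# After a successful rebase onto UPSTREAM, move lightweight tags from the
# pre-rebase commits (UPSTREAM..OLD_TIP) to the commits at the same
# position in UPSTREAM..HEAD. Refuses to act if the counts differ.
move_tags() {
    upstream=$1 old_tip=$2
    old_list=$(mktemp "${TMPDIR:-/tmp}/retag.XXXXXX") || return 1
    new_list=$(mktemp "${TMPDIR:-/tmp}/retag.XXXXXX") ||
        { rm -f "$old_list"; return 1; }
    git rev-list --reverse "$upstream..$old_tip" > "$old_list"
    git rev-list --reverse "$upstream..HEAD" > "$new_list"
    if [ "$(wc -l < "$old_list")" -ne "$(wc -l < "$new_list")" ]; then
        echo "commit counts differ; not moving any tags" >&2
        rm -f "$old_list" "$new_list"
        return 1
    fi
    # Line N of each file is the Nth commit, oldest first.
    paste -d ' ' "$old_list" "$new_list" | while read -r old new; do
        git tag --points-at "$old" | while read -r tag; do
            if [ "$(git cat-file -t "refs/tags/$tag")" != commit ]; then
                echo "skipping annotated tag $tag" >&2
                continue
            fi
            git tag -f "$tag" "$new" >/dev/null
            echo "moved $tag: $old -> $new"
        done
    done
    rm -f "$old_list" "$new_list"
}
```

A caller would save `old_tip=$(git rev-parse HEAD)`, run `git rebase origin/branch`, and on success call `move_tags origin/branch "$old_tip"`.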

I'll leave implementing the fancier method to you (or someone else). :-)

¹If your git is not too ancient, it's actually quite easy to get the upstream name of the current branch:

upstream=$(git rev-parse --symbolic-full-name @{u}) || exit 1

This should generally have the form refs/remotes/remote/branch. Note that rev-parse will exit with an error if there is no upstream, hence the || exit 1 here. To turn this into a name like refs/save/, first make sure it starts with refs/remotes/, then replace that:

case $upstream in
refs/remotes/*) save=refs/save/${upstream#refs/remotes/};;
*) echo "upstream name '$upstream' does not start with refs/remotes/" >&2; exit 1;;
esac

for instance.
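Putting the footnote together with step 1, a sketch of a helper that works out the save name and records the current upstream under it might look like this. The function name save_upstream is made up for illustration, and it reports errors by returning nonzero rather than exiting.

```shell
# save_upstream
# Records the current branch's upstream under refs/save/... and prints
# the name of the saved ref. (Hypothetical helper name.)
save_upstream() {
    upstream=$(git rev-parse --symbolic-full-name '@{u}') || return 1
    case $upstream in
    refs/remotes/*) save=refs/save/${upstream#refs/remotes/};;
    *)  echo "upstream '$upstream' does not start with refs/remotes/" >&2
        return 1;;
    esac
    git update-ref "$save" "$upstream" || return 1
    echo "$save"
}
```

Running `save=$(save_upstream) || exit 1` before git fetch leaves "$save" usable wherever the examples above spell out refs/save/origin/branch.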
