Phuc

Reputation: 743

How does `git rebase` skip a commit whose change is already in upstream?

The git-rebase documentation says:

If the upstream branch already contains a change you have made (e.g., because you mailed a patch which was applied upstream), then that commit will be skipped.

But how does Git do that?

Assume commit X is the parent of commit Y, and diffXY is the output of git diff X Y. Say I have the following commits:

o---o---o        <- master
 \
  o---o---o---o  <- test <- HEAD

If I run git rebase master, I guess Git skips the commits that master already has by skipping any commit Y in test whose diffXY is already present in master.

I've run some examples, and they behaved the way I guessed.

This is just my guess; am I right?

Also, does Git do this skipping before it reapplies test's commits onto master?

Upvotes: 16

Views: 32960

Answers (2)

VonC

Reputation: 1324977

The first versions of git rebase (1.4.4, Oct. 2006) used git format-patch --ignore-if-in-upstream:

This will examine all patches reachable from <since> but not from <until> and compare them with the patches being generated, and any patch that matches is ignored.

So it was comparing patch IDs; see commit 9c6efa3 for the implementation:

    if (ignore_if_in_upstream &&
            !get_patch_id(commit, &patch_id_opts, sha1) &&
            lookup_object(sha1))
        continue;

A "patch ID" is nothing but a sum of SHA-1 of the file diffs associated with a patch, with whitespace and line numbers ignored.
As such, it's "reasonably stable", but at the same time also reasonably unique, i.e., two patches that have the same "patch ID" are almost guaranteed to be the same thing.
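As a rough illustration, the sketch below builds a throwaway repository, cherry-picks one commit from test onto master, and compares the two with git patch-id. The repository layout, file names, and commit messages (base, A, C) are made up for the demo; only the git commands themselves are real.

```shell
# Throwaway repository; every name below is an illustrative assumption.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

echo base > file.txt && git add file.txt && git commit -q -m "base"
git branch -M master                      # make sure the branch is called master

git checkout -q -b test
echo "change A" > a.txt && git add a.txt && git commit -q -m "A"

git checkout -q master
echo "upstream work" > c.txt && git add c.txt && git commit -q -m "C"
git cherry-pick test >/dev/null 2>&1      # creates A' on master

# git patch-id reads a patch on stdin and prints "<patch-id> <commit-id>".
id_test=$(git show test | git patch-id | cut -d' ' -f1)
id_master=$(git show master | git patch-id | cut -d' ' -f1)

echo "A  on test:   commit $(git rev-parse --short test)  patch ID $id_test"
echo "A' on master: commit $(git rev-parse --short master)  patch ID $id_master"
```

The two commit hashes differ (different parent, different committer date), while the two patch IDs are identical. That match is what lets rebase treat A as already applied.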

That was later delegated to git-rebase--am (Git 1.7.6, Feb. 2011).

Then commit b6266dc (Git 2.1.0, Jul. 2014) switched from --ignore-if-in-upstream to --cherry-pick:

When using git format-patch --ignore-if-in-upstream we are only allowed to give a single revision range.
In the next commit we will want to add an additional exclusion revision in order to handle fork points correctly, so convert git-rebase--am to use a symmetric difference with --cherry-pick --right-only.

(Further improved in Git 2.18)

That does not change the "skip identical commit" mechanism.
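To preview which commits survive that filter, a similar revision query can be run by hand. This is only an approximation of what rebase does internally (the exact invocation varies between Git versions), and the demo repository, file names, and commit messages are assumptions.

```shell
# Throwaway repository; every name below is an illustrative assumption.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

echo base > file.txt && git add file.txt && git commit -q -m "base"
git branch -M master

git checkout -q -b test
echo "change A" > a.txt && git add a.txt && git commit -q -m "A"
echo "change B" > b.txt && git add b.txt && git commit -q -m "B"

git checkout -q master
echo "upstream work" > c.txt && git add c.txt && git commit -q -m "C"
git cherry-pick test~1 >/dev/null 2>&1    # A' on master, same change as A

# Commits on the test side of master...test, minus those whose change
# also appears on the master side (compared by patch ID).
kept=$(git log --format=%s --reverse --no-merges \
       --cherry-pick --right-only master...test)
echo "would be reapplied: $kept"
```

Only B is listed: A is dropped from the list because A' on master introduces the same change.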


As explained above, git rebase by default skips changes that are equivalent to commits already in the history the branch is rebased onto.

Git 2.34 makes this more visible: it now prints a message when this happens, so that users are aware of skipped commits, and tells them how to make rebase keep duplicated changes.

See commit 767a4ca (30 Aug 2021) by Josh Steadmon (steadmon).
(Merged by Junio C Hamano -- gitster -- in commit 6c083b7, 10 Sep 2021)

sequencer: advise if skipping cherry-picked commit

Signed-off-by: Josh Steadmon

Silently skipping commits when rebasing with --no-reapply-cherry-picks (currently the default behavior) can cause user confusion.
Issue warnings when this happens, as well as advice on how to preserve the skipped commits.

These warnings and advice are displayed only when using the (default) "merge" rebase backend.

Update the git-rebase docs to mention the warnings and advice.

git config now includes in its man page:

skippedCherryPicks

Shown when git rebase skips a commit that has already been cherry-picked onto the upstream branch.

git rebase now includes in its man page:

will be skipped and warnings will be issued (if the merge backend is used).
For example, running git rebase master on the following history (in which A' and A introduce the same set of changes, but have different committer information):

git rebase now includes in its man page:

When using the merge backend, warnings will be issued for each dropped commit (unless --quiet is given).
Advice will also be issued unless advice.skippedCherryPicks is set to false (see git config).

So you will now see:

skipped previously applied commit xxx
use --reapply-cherry-picks to include skipped commits
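The whole behavior can be reproduced end to end in a scratch repository. The sketch below is a minimal demo under assumed names (files, commit messages A, B, C); the warning and hint lines only appear on Git versions that include the change described above.

```shell
# Throwaway repository; every name below is an illustrative assumption.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

echo base > file.txt && git add file.txt && git commit -q -m "base"
git branch -M master

git checkout -q -b test
echo "change A" > a.txt && git add a.txt && git commit -q -m "A"
echo "change B" > b.txt && git add b.txt && git commit -q -m "B"

git checkout -q master
echo "upstream work" > c.txt && git add c.txt && git commit -q -m "C"
git cherry-pick test~1 >/dev/null 2>&1    # A' on master, same change as A

git checkout -q test
before=$(git rev-list --count master..test)   # A and B
git rebase master                             # A is skipped, B is replayed
after=$(git rev-list --count master..test)    # only the copy of B remains

echo "commits ahead of master: before=$before after=$after"
git log --format=%s master..test
```

Setting advice.skippedCherryPicks to false with git config silences the hint, as the man page excerpt above describes.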

Upvotes: 20

torek

Reputation: 488463

VonC's answer gives the history. The mechanism is what Git calls a patch ID. Git's patch ID concept is documented (albeit a bit lightly) in the git patch-id manual page, summarizing it this way:

... you can use this thing to look for likely duplicate commits.

This is what git rev-list --cherry-mark (with the symmetric difference ... notation) and git format-patch --ignore-if-in-upstream (with a simple exclusion .. operation) do to detect duplicate commits. If an upstream commit has the same patch ID as a commit that is a candidate for copying (their hashes are different by definition), Git assumes the change has already been copied, so there is no need to copy it again.
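The marking can be seen directly with git log. This is a small sketch in a scratch repository whose file names and commit messages are assumptions; it uses --right-only so that only the test side of master...test is shown.

```shell
# Throwaway repository; every name below is an illustrative assumption.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

echo base > file.txt && git add file.txt && git commit -q -m "base"
git branch -M master

git checkout -q -b test
echo "change A" > a.txt && git add a.txt && git commit -q -m "A"
echo "change B" > b.txt && git add b.txt && git commit -q -m "B"

git checkout -q master
echo "upstream work" > c.txt && git add c.txt && git commit -q -m "C"
git cherry-pick test~1 >/dev/null 2>&1    # A' on master, same change as A

# "=" marks a commit with a patch-ID-equivalent on the other side,
# "+" marks a commit without one.
marks=$(git log --oneline --no-decorate --cherry-mark --right-only master...test)
echo "$marks"
```

Here A is prefixed with = (it has a duplicate on master) and B with +, so B is the only commit a rebase onto master would need to copy.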

You also asked:

Also, does Git do this skipping before it reapplies test's commits onto master?

Yes: the list of commits to be copied is generated first—during which the patch-ID-equivalent commits are discarded, along with all merge commits unless you are using the -p or -r options—and then the rebase process begins.

(If you use a non-interactive git rebase that uses git am, the rebase process still uses git format-patch output as input to git am. Otherwise the commit hashes to be copied are stored in a file, or in the sequencer, which may or may not store them in a file, and then the commits are cherry-picked, either by running git cherry-pick or directly by the sequencer. The details depend on your particular Git vintage.)

Upvotes: 9
