Reputation: 743
The git-rebase documentation says:
If the upstream branch already contains a change you have made (e.g., because you mailed a patch which was applied upstream), then that commit will be skipped.
But how does Git do that?
Assume any commit X is the parent of commit Y, and diffXY is the result of the git diff X Y command. I have the following commits:
o---o---o <- master
 \
  o---o---o---o <- test <- HEAD
If I run git rebase master, I guess Git skips the commits that master already has by skipping any commit Y in test whose diffXY already exists in master.
I've run some examples and they behaved the way I guessed.
This is just my guess. Am I right?
Also, does Git do this skipping before it reapplies test's commits onto master?
Upvotes: 16
Views: 32960
Reputation: 1324977
The first versions of git rebase (1.4.4, Oct. 2006) used git format-patch --ignore-if-in-upstream:
This will examine all patches reachable from <since> but not from <until> and compare them with the patches being generated, and any patch that matches is ignored.
So it was looking at the patch ids: See commit 9c6efa3 for the implementation.
    if (ignore_if_in_upstream &&
        !get_patch_id(commit, &patch_id_opts, sha1) &&
        lookup_object(sha1))
            continue;
A "patch ID" is nothing but a sum of SHA-1 of the file diffs associated with a patch, with whitespace and line numbers ignored.
As such, it's "reasonably stable", but at the same time also reasonably unique, i.e., two patches that have the same "patch ID" are almost guaranteed to be the same thing.
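To make the idea concrete, here is a minimal Python sketch of a patch-ID-like hash. It is not Git's actual algorithm (git patch-id handles many more cases); the function name simple_patch_id and the sample diffs are made up for illustration. It only shows why two patches that differ in line numbers and blob hashes can still get the same ID.

```python
import hashlib
import re


def simple_patch_id(diff_text: str) -> str:
    """Hash a unified diff while ignoring whitespace and line numbers.

    Illustrative approximation only; this is NOT Git's real patch-id algorithm.
    """
    digest = hashlib.sha1()
    for line in diff_text.splitlines():
        # Skip metadata that varies between otherwise identical patches.
        if line.startswith(("diff --git", "index ")):
            continue
        # Hunk headers carry the line numbers, so drop them entirely.
        if line.startswith("@@"):
            continue
        # Remove all whitespace so spacing differences do not matter.
        normalized = re.sub(r"\s+", "", line)
        if normalized:
            digest.update(normalized.encode("utf-8"))
            digest.update(b"\n")
    return digest.hexdigest()


DIFF_A = """\
diff --git a/hello.txt b/hello.txt
index 1111111..2222222 100644
--- a/hello.txt
+++ b/hello.txt
@@ -1,3 +1,3 @@
 first
-old line
+new line
 last
"""

# Same change, but at different line numbers and with different blob hashes.
DIFF_B = DIFF_A.replace("@@ -1,3 +1,3 @@", "@@ -10,3 +12,3 @@").replace(
    "index 1111111..2222222", "index 3333333..4444444"
)

# A genuinely different change.
DIFF_C = DIFF_A.replace("+new line", "+another line")

print(simple_patch_id(DIFF_A) == simple_patch_id(DIFF_B))  # True
print(simple_patch_id(DIFF_A) == simple_patch_id(DIFF_C))  # False
```

For real work, pipe a diff into git patch-id rather than reimplementing it.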
That was later delegated to git-rebase--am (Git 1.7.6, Feb. 2011).
And commit b6266dc (Git 2.1.0, Jul. 2014) used --cherry-pick instead of --ignore-if-in-upstream:
When using git format-patch --ignore-if-in-upstream we are only allowed to give a single revision range.
In the next commit we will want to add an additional exclusion revision in order to handle fork points correctly, so convert git-rebase--am to use a symmetric difference with --cherry-pick --right-only.
(Further improved in Git 2.18)
That does not change the "skip identical commit" mechanism.
As explained above, git rebase by default skips changes that are equivalent to commits already in the history the branch is rebased onto.
With Git 2.34, this is now clearer: Git prints messages when this happens so that users are aware of skipped commits, and tells them how to make rebase keep duplicated changes.
See commit 767a4ca (30 Aug 2021) by Josh Steadmon (steadmon).
(Merged by Junio C Hamano -- gitster -- in commit 6c083b7, 10 Sep 2021)
sequencer: advise if skipping cherry-picked commit
Signed-off-by: Josh Steadmon
Silently skipping commits when rebasing with --no-reapply-cherry-picks (currently the default behavior) can cause user confusion.
Issue warnings when this happens, as well as advice on how to preserve the skipped commits.
These warnings and advice are displayed only when using the (default) "merge" rebase backend.
Update the git-rebase docs to mention the warnings and advice.
git config now includes in its man page:
skippedCherryPicks: Shown when git rebase skips a commit that has already been cherry-picked onto the upstream branch.
git rebase now includes in its man page:
... will be skipped and warnings will be issued (if the merge backend is used).
For example, running git rebase master on the following history (in which A' and A introduce the same set of changes, but have different committer information):
git rebase now includes in its man page:
When using the merge backend, warnings will be issued for each dropped commit (unless --quiet is given).
Advice will also be issued unless advice.skippedCherryPicks is set to false (see git config).
So you will now see:
skipped previously applied commit xxx
use --reapply-cherry-picks to include skipped commits
Upvotes: 20
Reputation: 488463
VonC's answer gives the history. The mechanism is what Git calls a patch ID. Git's patch ID concept is documented (albeit a bit lightly) in the git patch-id manual page, summarizing it this way:
... you can use this thing to look for likely duplicate commits.
This is what git rev-list --cherry-mark (with the symmetric difference ... notation) and git format-patch --ignore-if-in-upstream (with a simple exclusion .. operation) do to detect duplicate commits. If some other commit, whose hash is by definition different from that of the commit that might be copied, has the same patch ID as the commit to be copied, Git assumes that the change has already been copied and that there is no need to copy it again.
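As a rough illustration of that detection step, here is a small Python sketch. It models each branch as a plain list of commit names and uses a made-up patch_ids table; the function name cherry_pick_right_only is only borrowed from the rev-list flags, and none of this is Git's real code. The idea: take the symmetric difference of the two histories, then drop every commit on the branch side whose patch ID also appears on the upstream side.

```python
from typing import Dict, List


def cherry_pick_right_only(
    left: List[str], right: List[str], patch_ids: Dict[str, str]
) -> List[str]:
    """Return commits found only in `right` that have no patch-ID twin in `left`.

    `left` plays the role of upstream and `right` the branch being rebased.
    """
    left_set, right_set = set(left), set(right)
    # Symmetric difference, kept as two ordered halves.
    left_only = [c for c in left if c not in right_set]
    right_only = [c for c in right if c not in left_set]
    # Patch IDs of everything that exists only upstream.
    upstream_ids = {patch_ids[c] for c in left_only}
    # Keep the branch commits whose change is not already upstream.
    return [c for c in right_only if patch_ids[c] not in upstream_ids]


# Hypothetical history: "a" on test was also applied to master as "a_prime".
master = ["base", "m1", "a_prime"]
test = ["base", "a", "t1", "t2"]
patch_ids = {
    "base": "p0",
    "m1": "p1",
    "a_prime": "pA",  # same change as "a", different commit hash
    "a": "pA",
    "t1": "p2",
    "t2": "p3",
}

result = cherry_pick_right_only(master, test, patch_ids)
print(result)  # ['t1', 't2']
```

Commit "a" is dropped because its patch ID matches "a_prime", even though the two commits have different names (hashes).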
You also asked:
Plus, does Git do this skipping task before Git do the reapplying test's commits onto the master?
Yes: the list of commits to be copied is generated first (during which the patch-ID-equivalent commits are discarded, along with all merge commits unless you are using the -p or -r options), and then the rebase process begins.
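The ordering can be sketched like this in Python. This is a toy model under stated assumptions, not Git's sequencer: the Commit class, plan_rebase, run_rebase and the log strings are all invented for illustration, and it models only the default case (no -p or -r). The point is that the to-do list is fully built, with duplicates and merges already dropped, before any commit is applied.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Commit:
    name: str
    patch_id: str
    parent_count: int = 1  # 2 or more means a merge commit


def plan_rebase(
    branch_commits: List[Commit], upstream_patch_ids: Set[str]
) -> Tuple[List[Commit], List[Tuple[str, str]]]:
    """Phase 1: decide what to copy, before anything is applied."""
    todo: List[Commit] = []
    skipped: List[Tuple[str, str]] = []
    for commit in branch_commits:
        if commit.parent_count > 1:
            skipped.append((commit.name, "merge commit"))
        elif commit.patch_id in upstream_patch_ids:
            skipped.append((commit.name, "already applied upstream"))
        else:
            todo.append(commit)
    return todo, skipped


def run_rebase(
    branch_commits: List[Commit], upstream_patch_ids: Set[str]
) -> List[str]:
    """Phase 2: apply the planned commits in order (simulated with a log)."""
    todo, skipped = plan_rebase(branch_commits, upstream_patch_ids)
    log = [f"skip {name}: {reason}" for name, reason in skipped]
    log.extend(f"pick {commit.name}" for commit in todo)
    return log


# Hypothetical branch: "a" is already upstream, "m" is a merge.
branch = [
    Commit("a", "pA"),
    Commit("t1", "p2"),
    Commit("m", "pM", parent_count=2),
    Commit("t2", "p3"),
]

log = run_rebase(branch, upstream_patch_ids={"p1", "pA"})
for entry in log:
    print(entry)
# skip a: already applied upstream
# skip m: merge commit
# pick t1
# pick t2
```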
(If you use a non-interactive git rebase that uses git am, the rebase process still uses git format-patch output as input to git am. Otherwise the commit hashes to be copied are stored in a file, or in the sequencer, which may or may not store them in a file, and then the commits are cherry-picked, either by running git cherry-pick or directly by the sequencer. The details depend on your particular Git vintage.)
Upvotes: 9