Reputation: 1260
Explanations of squashing usually say the process incorporates the changes of a commit into previous commit(s), resulting in a single commit.
However, I am very confused about what this is actually supposed to mean, because commits do not represent deltas but complete versions of the project.
Let's say I have four commits
A <-- B <-- C <-- D
and I check out D and interactively rebase on A, squashing B, C, and D into a single commit BCD.
The result is:
A <-- BCD
My question is how the tree above is different from
A <-- D
because no matter what examples I tried, the working directory of BCD always looked just like D. I would be grateful for an example where A <-- BCD and A <-- D differ.
CLARIFICATION
It seems my question caused some confusion, so here is an alternative wording:
If I opened commit D in the .git/objects folder with an editor, changed the parent pointer from C to A, and then deleted commits B and C from the .git/objects folder, I would get
A <-- D
Question: is the tree above identical to A <-- BCD, the tree I get by squashing B, C and D in an interactive rebase?
(And if the trees are identical, would I have arrived at the same result by picking "drop" for B and C in the interactive rebase instead of squashing?)
Upvotes: 0
Views: 133
Reputation: 489203
As matt said in his answer, the trick here is that most of the commands we use for managing the commits wind up turning the snapshots into change-sets. When they deal with commits as change-sets they really do have to do this sort of replaying, even though the smart way (as noted below) wouldn't.
Remember that every commit has a unique hash ID, which will only ever refer to that particular commit. No part of any commit can ever be changed, including the back-pointer to its parent: if we try to change any part of any commit, we end up with a new, different commit, with a new, unique hash ID. So, given the original sequence:
A <-B <-C <-D <--branchname
we're going to end up with:
  B <-C <-D   [abandoned]
 /
A <-E   <--branchname
no matter what else happens. We can have E be the equivalent of the squash of BCD, or we can have E be different in some way: perhaps we retain D's tree but use a different commit message than any of B, C, or D; and of course E points directly back to A, which is unlike C and D.
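To make this concrete, and to address the CLARIFICATION's thought experiment, here is a minimal sketch in a throwaway repository. It assumes a POSIX shell with Git on the PATH; the one-file-per-commit history and the commit messages are made up for illustration. Rather than hand-editing files under .git/objects (loose objects there are zlib-compressed, so a text editor would not get far), it uses git commit-tree to write a new commit object that reuses D's tree but names A as its parent.

```shell
#!/bin/sh
# Sketch: rebuild the question's history A <- B <- C <- D in a scratch repo,
# then write a commit E that reuses D's snapshot but points straight at A.
set -eu
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical history: each commit adds one file named after itself.
for name in A B C D; do
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "$name"
done
A=$(git rev-parse HEAD~3)
D=$(git rev-parse HEAD)

# Same tree as D, but parent A and a new message: necessarily a new commit.
E=$(git commit-tree "$D^{tree}" -p "$A" -m "E: B, C and D squashed")

echo "tree of D: $(git rev-parse "$D^{tree}")"
echo "tree of E: $(git rev-parse "$E^{tree}")"
echo "commit D:  $D"
echo "commit E:  $E"
```

The two tree lines print the same hash while the two commit lines differ: the parent pointer is part of what gets hashed, so "D with its parent changed to A" can only ever exist as a new commit.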
Using git rebase -i and replacing the last two picks with squashes uses a somewhat inefficient method of arriving at exactly that situation: it builds up, as a temporary commit that gets shoved aside, a combined BC commit with a combined message, and then readies (but doesn't quite commit yet) a combined BCD commit, whose tree will match that of commit D, along with a combined message. It then invokes your editor on the combined message.
If we replace the two picks with fixups, Git still builds the combined BC commit but uses B's message, then makes the combined BCD commit, again using B's message, as our E commit.
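The fixup variant can be reproduced without opening an editor. This sketch (assumptions: POSIX shell, Git on the PATH, a made-up one-file-per-commit history) sets GIT_SEQUENCE_EDITOR to a sed command that rewrites the second and later pick lines of the todo list to fixup, then checks that the single resulting commit has A as its parent, D's tree, and B's message. The sed -i.bak form is chosen because both GNU and BSD sed accept it.

```shell
#!/bin/sh
# Sketch: squash B, C and D with a non-interactive "git rebase -i".
set -eu
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for name in A B C D; do          # hypothetical history A <- B <- C <- D
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "$name"
done
A=$(git rev-parse HEAD~3)
D=$(git rev-parse HEAD)

# The todo list starts "pick B", "pick C", "pick D"; turn lines 2..end into
# "fixup" (comment lines start with "#", so the substitution skips them).
# GIT_EDITOR=true would accept the combined message as-is if "squash" were used.
GIT_EDITOR=true GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick/fixup/'" \
  git rebase -q -i "$A"

git log --format='%h %s (parents: %p)'   # two commits remain: A and the new one
```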
The efficient way to handle this would be to make a single E commit that uses D's tree and a message. Git could be taught to do this, but that's a special case that just falls out of the easier add-one-at-a-time method. (It's possible that since the rewrite of rebase into C, Git actually has been taught to do this; I have not looked into the inner workings lately. The last time I did, the shell-script-based rebase definitely made separate commits.)
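One way to get that single-commit result by hand, with no intermediate BC commit, is to move the branch back to A with git reset --soft (which leaves the index holding D's snapshot) and then commit once. A minimal sketch under the same assumptions (POSIX shell, Git on the PATH, a made-up history); it says nothing about what rebase does internally.

```shell
#!/bin/sh
# Sketch: make one commit E with D's tree and parent A, with no intermediate commits.
set -eu
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for name in A B C D; do          # hypothetical history A <- B <- C <- D
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "$name"
done
A=$(git rev-parse HEAD~3)
D=$(git rev-parse HEAD)

# Move the current branch (and HEAD) to A; --soft leaves index and work tree as in D.
git reset -q --soft "$A"
# The index still holds D's snapshot, so this single commit is the squash result.
git commit -q -m "E: B, C and D in one commit"

git log --format='%h %s (parents: %p)'
```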
You can also run git merge --squash, which ends up doing this more cleverly, but to do that you need to assign a branch name to point to commit A:
A   <-- branch1 (HEAD)
 \
  B--C--D   <-- branch2
With branch1 checked out as shown, running git merge --squash branch2 && git commit will produce:
A---------E   <-- branch1 (HEAD)
 \
  B--C--D   <-- branch2
without the excess compute work that the rebase method might use. But computer time is usually pretty cheap, and this requires more human time to set up the multiple branch names. (You need the && git commit because --squash always turns on --no-commit.)
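Here is a minimal sketch of that squash-merge setup, assuming a POSIX shell, Git on the PATH and a made-up one-file-per-commit history; branch1 and branch2 are the names from the diagrams above. It checks that the new commit has D's tree and a single parent, A.

```shell
#!/bin/sh
# Sketch: squash-merge branch2 (B--C--D) into branch1 (at A).
set -eu
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for name in A B C D; do          # hypothetical history A <- B <- C <- D
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "$name"
done
A=$(git rev-parse HEAD~3)
D=$(git rev-parse HEAD)

git branch branch2 "$D"          # branch2 -> D
git checkout -q -b branch1 "$A"  # branch1 -> A, checked out
# --squash implies --no-commit: only the index and work tree are updated...
git merge -q --squash branch2
# ...so the commit has to be made separately.
git commit -q -m "E: squash-merge of branch2"

git log --format='%h %s (parents: %p)' branch1
```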
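Compare the squash-merge result to a regular merge (here it would have to be git merge --no-ff, since otherwise Git would simply fast-forward branch1 to D):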
A---------E   <-- branch1 (HEAD)
 \       /
  B--C--D   <-- branch2
The difference is that a regular merge records two parents for new commit E, and a squash-merge doesn't. The lack of extra compute work occurs because there is no commit after A on branch1; Git realizes that moving from A to D is a fast-forward operation, i.e., that it means just use D's snapshot. Had we started with:
A--E--F   <-- branch1 (HEAD)
 \
  B--C--D   <-- branch2
there would have been real work to do, for either squash-merge or real merge, and Git would have done that work to produce a new merge (or non-merge) commit G:
A--E--F---G   <-- branch1 (HEAD)
 \       ?
  B--C--D   <-- branch2
(where ? means that there's an arrow back from G to D for the real-merge case, but not for the squash-merge case).
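The diverged case can be checked the same way. In this sketch (assumptions: POSIX shell, Git on the PATH, a made-up history where every commit adds its own file, so nothing conflicts; the helper commit and the branch names squashed and merged are hypothetical), both kinds of G end up with the same tree, but only the regular merge records D as a second parent.

```shell
#!/bin/sh
# Sketch: squash-merge vs. regular merge when both branches have new commits.
set -eu
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

commit() {                       # hypothetical helper: add "$1.txt" and commit it
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit A
git branch branch1               # branch1 -> A
git checkout -q -b branch2       # branch2 grows B--C--D
commit B; commit C; commit D
git checkout -q branch1          # branch1 grows E--F
commit E; commit F
D=$(git rev-parse branch2)
F=$(git rev-parse branch1)

git checkout -q -b squashed branch1
git merge -q --squash branch2
git commit -q -m "G (squash-merge)"

git checkout -q -b merged branch1
git merge -q --no-edit -m "G (regular merge)" branch2

echo "squash-merge G parents:  $(git log -1 --format=%P squashed)"
echo "regular merge G parents: $(git log -1 --format=%P merged)"
```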
Upvotes: 4
Reputation: 535617
"commits do not represent deltas but complete versions of the project"
Yes, but diffs (patches) do represent deltas. Squashing, in effect, replays the diff from A to D directly onto A and commits the result.
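That description can be acted out literally: take the diff from A to D, apply it on top of A, and commit. This is only an illustration under stated assumptions (POSIX shell, Git on the PATH, a made-up history); current versions of git rebase replay commits with Git's merge machinery by default rather than with git apply.

```shell
#!/bin/sh
# Sketch: "replay the diff A..D onto A" by hand and compare the result with D.
set -eu
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for name in A B C D; do          # hypothetical history A <- B <- C <- D
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "$name"
done
A=$(git rev-parse HEAD~3)
D=$(git rev-parse HEAD)

git checkout -q --detach "$A"                      # start from A's snapshot
git diff --no-color "$A" "$D" | git apply --index  # apply the A->D delta
git commit -q -m "BCD"

git ls-tree --name-only HEAD     # same file list as D
```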
Merge, cherry-pick, and interactive rebase all involve diffs. Even though Git does not store diffs, it uses them all the time.
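This diff-based view is also what answers the question's last point about "drop". In the sketch below (assumptions: POSIX shell, Git on the PATH, a made-up history in which each commit adds one file; the branch names squashed and dropped are hypothetical), the squashed branch ends up with D's tree, but the dropped branch does not: picking only D replays D's own change (the C-to-D diff) onto A, which leaves out whatever B and C introduced.

```shell
#!/bin/sh
# Sketch: squashing B, C, D vs. dropping B and C, both via "git rebase -i".
set -eu
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for name in A B C D; do          # hypothetical history A <- B <- C <- D
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "$name"
done
A=$(git rev-parse HEAD~3)
D=$(git rev-parse HEAD)
git branch squashed              # both start at D
git branch dropped

git checkout -q squashed         # keep B's pick, turn C and D into fixups
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick/fixup/'" git rebase -q -i "$A"

git checkout -q dropped          # drop B and C, keep only D's pick
GIT_SEQUENCE_EDITOR="sed -i.bak -e '1,2s/^pick/drop/'" git rebase -q -i "$A"

echo "squashed: $(git ls-tree --name-only squashed | tr '\n' ' ')"
echo "dropped:  $(git ls-tree --name-only dropped | tr '\n' ' ')"
```

With this history the squashed branch lists A.txt through D.txt, while the dropped branch lists only A.txt and D.txt.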
"because no matter what examples I tried the working directory of BCD always looked like just D"
Yes, and that’s exactly what’s desired and intended. The idea of squashing is to keep the same outcome but change the underlying commit history, e.g. two commits instead of four.
Upvotes: 0