Reputation: 39336
Some Git commands (e.g. checkout) appear to assume that a commit is a snapshot or state of the working tree.
Other commands (e.g. rebase) appear to assume that a commit is a change: a kind of operator that can be applied to working trees.
When learning Git, this is very confusing.
So what is an appropriate mental model for a Git commit?
Upvotes: 23
Views: 5678
Reputation: 39336
If looking at these answers does not help resolve your confusion about git commits, this is because my original question was not formulated well: It asked "What is a git commit?" instead of asking what I really meant to learn, "How should I think about git commits?".
As a result, the answers use different perspectives. So which of them are correct?
This answer is correct for the updated version of the question.
It talks about how you need to apply different mental models for what is a git commit, depending on which git command you are currently thinking about.
If you want to understand how to use git, you will definitely need to have this understanding.
This answer is appropriate for the original version of the question and less so for the updated (and intended) version.
It talks about the technical representation of commits.
If you only want to understand how to use git, this knowledge may or may not be helpful for you:
If you are not keen on learning internals, the dualities answer is fine initially, but be aware that in order to become a Git power user, you will need to learn about the internals eventually; they shine through frequently in the git documentation and many other git explanations.
Upvotes: 0
Reputation: 39336
Short answer: both.
Medium answer: It depends.
Long answer: Git is a bit like quantum phenomena: Neither of the two views alone can explain all observations. Read on.
Internally, Git will use both representations, depending (conceptually) on which one it deems more efficient in terms of storage space and execution time for a given commit at a certain time. The snapshot representation is the primary one.
From the user's point of view, however, it depends on what you do:
Indeed, some commands only make any sense at all when you think about commits as snapshots of the working tree. This is most pronounced for checkout, but is also true for stash and at least halfway for fetch and reset.
For other commands, madness is the likely result when you try to think of commits in this manner. Those other commands clearly treat commits as changes: show and diff; apply, cherry-pick, and pull; rebase; and merge and cherry-pick.
There is a side-effect of duality 1 that can shock Git newbies accustomed to other versioning systems: Git appears to not even commit itself to its commits.
Huh?
Assume you have created a branch X containing what you like to think of as your commits A and B. But master has progressed a little, so you rebase X onto master.
When you think of A and B as changes, but of master as a snapshot (hey, both commit models occur in a single operation!), this is not a problem: Just apply the changes A and B to the snapshot master.
This thinking is so natural that you will barely notice that Git has now rewritten your commits A and B: They now have different snapshot content and hence a different SHA-1 ID.
In Git, the conceptual commit that you think of as a developer is not a fixed-for-all-times kind of thing, but rather some fluid object that changes as a result of working with your repository.
In contrast, if you think of all three (A, B, and master) as snapshots or of all three as changes, your brain will hurt and you will get nowhere.
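To make the rebase story concrete, here is a toy model in Python. It is only a sketch under loud assumptions: commits are plain dicts, the ID is a SHA-1 over a JSON dump (not Git's real object format), and the names make_commit, as_change and replay are made up for this illustration. There is no conflict handling at all.

```python
import hashlib
import json

def commit_id(snapshot, parent):
    """Toy ID: hash of snapshot plus parent ID (NOT Git's real object format)."""
    payload = json.dumps({"snapshot": snapshot, "parent": parent}, sort_keys=True)
    return hashlib.sha1(payload.encode()).hexdigest()

def make_commit(store, snapshot, parent=None):
    """Store a commit as a full snapshot plus a parent pointer; return its ID."""
    cid = commit_id(snapshot, parent)
    store[cid] = {"snapshot": dict(snapshot), "parent": parent}
    return cid

def as_change(store, cid):
    """View a commit as a change: what differs from its parent's snapshot."""
    new = store[cid]["snapshot"]
    parent = store[cid]["parent"]
    old = store[parent]["snapshot"] if parent else {}
    changed = {p: c for p, c in new.items() if old.get(p) != c}
    deleted = [p for p in old if p not in new]
    return changed, deleted

def replay(store, cid, new_base):
    """Apply the change view of `cid` on top of the snapshot of `new_base`."""
    changed, deleted = as_change(store, cid)
    snapshot = dict(store[new_base]["snapshot"])
    snapshot.update(changed)
    for path in deleted:
        snapshot.pop(path, None)
    return make_commit(store, snapshot, new_base)

store = {}
root = make_commit(store, {"README": "v1"})
a = make_commit(store, {"README": "v1", "x.txt": "A"}, root)
b = make_commit(store, {"README": "v1", "x.txt": "A", "y.txt": "B"}, a)
master = make_commit(store, {"README": "v2"}, root)

# "Rebase" branch X (commits a, b) onto master: changes applied to a snapshot.
a2 = replay(store, a, master)
b2 = replay(store, b, a2)
```

After the replay, a2 and b2 carry the same changes as a and b, but their snapshots (and therefore their IDs) differ, which is the "rewritten commits" effect described above.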
The above is a much-simplified description of what happens in Git reality.
And don't get confused by the fact that the Pro Git book's very first characterization of Git (in section "Git Basics") is "Snapshots, Not Differences".
Git is complicated after all.
Upvotes: 12
Reputation: 78690
The answers here are too long.
Commands such as cherry-pick compute differences between snapshots.
Upvotes: 2
Reputation: 1324447
While it could be construed as both, the GitHub Engineering team is clear (Dec. 2020):
Commits are snapshots, not diffs
Derrick Stolee starts with
Object ID
The most important part to know about Git objects is that Git references each by its object ID (OID for short), providing a unique name for the object.
We will use the git rev-parse <ref> command to discover these OIDs.
Each object is essentially a plain-text file and we can examine its contents using the git cat-file -p <oid> command.
Blobs (file content)
To discover the OID for a file at your current revision, run git rev-parse HEAD:<path>.
Then, use git cat-file -p <oid> to find its contents.
Trees (directory listings)
Note that blobs contain file contents, but not the file names!
The names come from Git’s representation of directories: trees.
A tree is an ordered list of path entries, paired with object types, file modes, and the OID for the object at that path.
Subdirectories are also represented as trees, so trees can point to other trees!
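The point that blobs hold content but not names can be checked with a few lines of Python. This is a minimal sketch, assuming a SHA-1 repository (the default) and a tree that contains only regular files (mode 100644); the helper names git_object_id, blob_id and tree_id are made up here, and real trees have extra sorting rules for subdirectories that this ignores.

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> str:
    """OID = SHA-1 over the header "<type> <size>\\0" followed by the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def blob_id(content: bytes) -> str:
    """A blob is just the file content; the file name plays no part."""
    return git_object_id("blob", content)

def tree_id(entries) -> str:
    """entries: iterable of (mode, name, oid_hex) for regular files only.

    Each entry is encoded as "<mode> <name>\\0" plus the 20 raw OID bytes.
    """
    ordered = sorted(entries, key=lambda e: e[1].encode())
    body = b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
        for mode, name, oid in ordered
    )
    return git_object_id("tree", body)

content = b"hello\n"
oid = blob_id(content)

# Same content under two different names: one blob, two different trees.
tree_one = tree_id([("100644", "a.txt", oid)])
tree_two = tree_id([("100644", "b.txt", oid)])
```

Renaming a file therefore changes the tree that lists it, while the blob it points to stays the same.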
Finally:
commit: snapshot in time
A commit is a snapshot in time. Each commit contains a pointer to its root tree, representing the state of the working directory at that time.
The commit has a list of parent commits corresponding to the previous snapshots.
A commit with no parents is a root commit and a commit with multiple parents is a merge commit.
Commits also contain metadata describing the snapshot such as author and committer (including name, email address, and date) and a commit message.
The commit message is an opportunity for the commit author to describe the purpose of that commit with respect to the parents.
Even though commits are snapshots, we frequently look at a commit in a history view or on GitHub as a diff. In fact, the commit message frequently refers to this diff.
The diff is dynamically generated from the snapshot data by comparing the root trees of the commit and its parent. Git can compare any two snapshots in time, not just adjacent commits.
Computing this diff is what enables git cherry-pick or git rebase.
And since commits are not diffs...
Git doesn’t track renames. There is no data structure inside Git that stores a record that a rename happened between a commit and its parent.
Instead, Git tries to detect renames during the dynamic diff calculation. There are two stages to this rename detection: exact renames and edit-renames.
After first computing a diff, Git inspects the internal model of that diff to discover which paths were added or deleted.
Naturally, a file that was moved from one location to another would appear as a deletion from the first location and an add in the second. Git attempts to match these adds and deletes to create a set of inferred renames.
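The exact-rename stage can be sketched roughly like this in Python. It is an illustration of the idea only, not Git's actual algorithm: snapshots are assumed to be flat dicts mapping path to blob OID, the function names are made up, and the edit-rename stage (similarity scoring of changed content) is left out entirely.

```python
def diff_paths(old, new):
    """old/new map path -> blob OID. Return the sets of added and deleted paths."""
    added = {p for p in new if p not in old}
    deleted = {p for p in old if p not in new}
    return added, deleted

def exact_renames(old, new):
    """Pair each added path with a deleted path that has the identical blob OID."""
    added, deleted = diff_paths(old, new)
    deleted_by_oid = {}
    for path in sorted(deleted):
        deleted_by_oid.setdefault(old[path], []).append(path)
    renames = []
    for path in sorted(added):
        candidates = deleted_by_oid.get(new[path])
        if candidates:
            # Same content vanished elsewhere: infer a rename.
            renames.append((candidates.pop(0), path))
    return renames

# Placeholder OIDs; in a real repository these would be 40-character hashes.
old_snapshot = {"a.txt": "oid1", "b.txt": "oid2"}
new_snapshot = {"docs/a.txt": "oid1", "b.txt": "oid2", "c.txt": "oid3"}
inferred = exact_renames(old_snapshot, new_snapshot)
```

Here a.txt is reported as renamed to docs/a.txt because the blob OID is unchanged, while c.txt is a plain add. Nothing about the rename was stored anywhere; it is inferred from the two snapshots.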
Upvotes: 11
Reputation: 58578
A commit is a snapshot state. When you view a commit as a diff (for example with git show), Git calculates the diff against the parent. This is why there can be multiple parents (the case when there is a merge). Internally, there is delta compression going on, but the versioning model isn't patch-based.
A central concept in git is the index. This is a big object containing the tree of objects being tracked. Changes are staged when they propagate from the working copy to the index; this puts the index into a modified state. The commit operation turns that state into a new commit.
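A toy Python model of that flow (working copy, then index, then commit) might look like the following. Everything here is an assumption made for illustration: the ToyRepo class, its attributes, and the use of list positions as commit IDs have nothing to do with how Git stores the index on disk.

```python
class ToyRepo:
    """Toy model: staging copies content into the index, commit snapshots the index."""

    def __init__(self):
        self.working = {}   # path -> content in the working copy
        self.index = {}     # path -> staged content
        self.commits = []   # each commit: snapshot, parent, message
        self.head = None    # position of the current commit in self.commits

    def add(self, path):
        """Stage one path: copy it from the working copy, or unstage a deleted file."""
        if path in self.working:
            self.index[path] = self.working[path]
        else:
            self.index.pop(path, None)

    def commit(self, message):
        """Freeze the current index state as a new snapshot on top of HEAD."""
        self.commits.append(
            {"snapshot": dict(self.index), "parent": self.head, "message": message}
        )
        self.head = len(self.commits) - 1
        return self.head

repo = ToyRepo()
repo.working = {"a.txt": "one", "b.txt": "two"}
repo.add("a.txt")                 # only a.txt is staged
first = repo.commit("add a.txt")

repo.working["a.txt"] = "edited"  # changed in the working copy, but not staged
second = repo.commit("nothing new staged")
```

The second commit still records the old content of a.txt, because the commit is built from the index and not from the working copy.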
Upvotes: 4