Reputation: 52343
Let's say commit A1 is a parent of commit A2. What does it really tell me?
To clarify my question, here are two incorrect interpretations:
1) Commit A2 was created based on commit A1 in the sense that the user checked out A1, made a few edits, and committed A2 (without any intervening git commands). This is wrong due to rebasing.
2) Each git commit stores the delta relative to its parent, so you have to follow the arrows in reverse direction and apply each delta to reconstruct the contents of a commit. This is wrong because unlike many other VCS, git commits store complete snapshots rather than deltas.
Here's an example of an interpretation that seems almost right, but is very vague:
3) Commit A2 incorporates all the work represented by commit A1 plus some additional work. "Work" is used in the simple sense of adding, deleting and editing files.
Upvotes: 0
Views: 998
Reputation: 489848
Interpretation 2 is outright wrong, but it contains one correct item: you do (or Git does) have to follow the backwards-arrows that Git stores, in order to construct the graph. Each commit "points to" its parent commits (by storing their true-name hash IDs), making each commit act as a single vertex (or node) plus a set of outgoing arcs that, once collected up, form a directed acyclic graph or DAG. In most diagrams in CS or informatics, we'd have the outgoing arcs go from parents to children, but in Git the arrows are all backwards. (This is so that parents do not need to know their child IDs before the children exist, while also allowing the parent commits to be read-only once created. Since each hash ID is determined solely by its object's contents, and hash IDs are deliberately difficult to predict, no hash ID can be known until the contents are known. The parent commits therefore must be read-only: you cannot update them to add their children; that would change their hash IDs. [1])
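To make the backwards arrows concrete, here is a minimal sketch that builds two commits in a throwaway repository and inspects the stored parent link. The temporary directory, the identity values, and the file name are all assumptions made up for illustration.

```shell
set -eu
# Throwaway repository; identity values are hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "A1"
a1=$(git rev-parse HEAD)

echo two >> file.txt
git commit -q -am "A2"
a2=$(git rev-parse HEAD)

# The raw commit object: a tree line, a parent line, author/committer, message.
git cat-file -p "$a2"

# The parent recorded inside A2 is A1's hash ID.
parent=$(git rev-parse "$a2^")
echo "A1           = $a1"
echo "parent of A2 = $parent"

# Count the parent lines in A1: as a root commit it has none,
# and nothing in A1 mentions its child A2.
git cat-file -p "$a1" | grep -c '^parent ' || true
```

The arrow exists only in the child: A1's object was not touched when A2 was created.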
Interpretation 1 is mostly correct, but is missing some key items. As Jim Deville said in his answer, Git's various plumbing commands allow you to construct nearly-arbitrary commit graph nodes (i.e., commit objects). The command `git commit-tree` in particular takes any number of valid parent commit IDs (`-p` options), one valid tree ID, and a commit message, and constructs a new commit from these, using your configuration and your computer's idea of the current time to set the author and committer name, email, and timestamp fields (or using the environment variable overrides if they are set). The new commit object is stored in the database with nothing pointing to it, so you must quickly [2] set a reference (such as a branch or tag name) to retain it. (Or, you can create another commit to retain the just-created commit, but then that commit requires either a name, or another commit which requires something, and so on.)
This means that the parent information is up to the command that creates the commit.
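As a rough sketch of that plumbing route, the following creates a commit by hand with two parents chosen by the caller, then gives it a name so that it stays reachable. The repository, identity values, file names, and the branch name `handmade` are hypothetical.

```shell
set -eu
# Throwaway repository; identity values are hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "A1"
a1=$(git rev-parse HEAD)

echo two > other.txt
git add other.txt
git commit -q -m "A2"
a2=$(git rev-parse HEAD)

# Reuse A2's tree as the snapshot for the hand-made commit.
tree=$(git rev-parse "$a2^{tree}")

# Build a commit whose parents are whatever we say they are.
# At this point no reference points to it.
new=$(git commit-tree "$tree" -p "$a1" -p "$a2" -m "hand-made commit")

# Give it a name so that it is retained.
git branch handmade "$new"

# Print the commit followed by its recorded parents.
git rev-list --parents -n 1 "$new"
```

Nothing here checked out either parent or edited a file: the parent list is simply what the creating command wrote.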
When you use `git rebase`, the step that creates the new commit is usually (or might as well be) `git commit` itself, and `git commit` sets the new commit's parent based on the result of reading `HEAD` (and then immediately updates `HEAD` or, more normally, the branch that `HEAD` names). A rebase operation generally works with a "detached HEAD", where `HEAD` contains the raw hash ID of an existing commit, instead of the more normal case of `HEAD` containing a branch name.
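The difference between an attached and a detached `HEAD`, and the way `git commit` picks the parent from it, can be seen in a small experiment. This is only a sketch in a throwaway repository; the identity values and file name are assumptions.

```shell
set -eu
# Throwaway repository; identity values are hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "A1"
a1=$(git rev-parse HEAD)

# Attached: HEAD holds a branch name.
branch=$(git symbolic-ref --short HEAD)
echo "HEAD names branch: $branch"

# Detach HEAD at A1: it now holds a raw hash ID instead.
git checkout -q --detach "$a1"
git symbolic-ref -q HEAD || echo "HEAD is detached"

# git commit reads HEAD to find the parent, then moves HEAD itself.
echo two >> file.txt
git commit -q -am "A2"
a2=$(git rev-parse HEAD)
parent=$(git rev-parse "$a2^")

echo "parent of A2     = $parent"
echo "branch still at  = $(git rev-parse "$branch")"
```

Because `HEAD` was detached, the new commit advanced `HEAD` only; the branch name still points to A1.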
Hence, rebase works by detaching `HEAD` so that it points to the `--onto` target (which defaults to the `<upstream>` argument), then making commits, one at a time. It makes each new commit by converting the original commit into a delta, applying the delta to the current index-and-work-tree, and making a commit a la `git commit`. (The actual mechanics of rebase are implemented using either `git cherry-pick` or `git am`, both of which are written in C and use the code from `git commit`. However, an interactive rebase may, in some cases, such as for squash steps or when using `--root`, literally run `git commit` rather than, or in addition to, running `git cherry-pick`. A `--preserve-merges` rebase uses the interactive machinery and literally runs `git merge` to create new merges. The details get fairly complicated.)
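The steps just described can be imitated by hand for a one-commit branch. This is a sketch of the idea, not what `git rebase` literally executes; the branch name `topic`, the file names, and the identity values are hypothetical.

```shell
set -eu
# Throwaway repository; identity values are hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"
upstream=$(git symbolic-ref --short HEAD)

# One commit on a side branch.
git checkout -q -b topic
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature"
old=$(git rev-parse HEAD)

# Meanwhile the upstream branch gains a commit.
git checkout -q "$upstream"
echo more > more.txt
git add more.txt
git commit -q -m "upstream work"
target=$(git rev-parse HEAD)

# Rebase by hand: detach HEAD at the target, copy the commit,
# then move the branch name to the copy.
git checkout -q --detach "$target"
git cherry-pick "$old" >/dev/null
new=$(git rev-parse HEAD)
git branch -f topic "$new"
git checkout -q topic

echo "old commit $old has parent $(git rev-parse "$old^")"
echo "new commit $new has parent $(git rev-parse "$new^")"
```

The copy carries the same change but a different parent, and therefore a different hash ID; the original commit is left untouched.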
Note that the conversion, from snapshot to changeset / delta, is done by running `git diff` against the commit's recorded parent. Hence, setting a weird parent ID is not useful. You can do it (with `git commit-tree`) but unless you will never cherry-pick or rebase or `git show` the commit, all of which use the parent ID to change snapshot to delta, this would be poor planning.
[1] One could, of course, split each commit object into a read-only portion that participates in the hashing, and a read/write portion that does not. That would allow Git to add child IDs to parents. But this would make Git less stable and less secure: read-only objects tend not to get corrupted as much as read/write objects, and having part of a commit not participate in its hash would mean that that part was not protected by the hash either.
[2] By default, `git gc --auto`, which other Git commands run from time to time, gives you two weeks to finish this task. If it takes you longer than that, an automatic `git gc` may prune away your as-yet-unreferenced commit.
Upvotes: 2
Reputation: 10672
I would say that all A1 being a parent of A2 means is that, in the commit history of the given branch, A1 is the commit immediately before A2.
I'm not certain, but I believe you could use git plumbing to write commits and trees directly and thus make a commit that has absolutely no relation to the previous commit. However, even in that case, it will act as if the step between the two was deleting all files and adding the new ones.
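A sketch of that scenario, assuming a throwaway repository with hypothetical identity values and file names: a tree with unrelated content is written directly with plumbing commands and attached to A1 as its parent. Comparing the two then reports the old file as deleted and the new one as added.

```shell
set -eu
# Throwaway repository; identity values are hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "A1"
a1=$(git rev-parse HEAD)

# Write a blob and a tree directly, bypassing the index and work tree.
blob=$(echo "unrelated" | git hash-object -w --stdin)
tree=$(printf '100644 blob %s\tnew.txt\n' "$blob" | git mktree)

# Create a commit that claims A1 as its parent but shares no files with it.
new=$(git commit-tree "$tree" -p "$a1" -m "unrelated snapshot")

# Compare parent and child; rename detection is switched off for clarity.
status=$(git diff --no-renames --name-status "$a1" "$new")
printf '%s\n' "$status"
```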
Upvotes: 1