Reputation: 2151
As someone relatively new to Git, I have recently (and finally!) understood that a branch is actually just a pointer to a particular commit, and that sometimes it might be better to rephrase "which branch a commit belongs to" as "from which branches is a commit reachable".
For example, the following diagram is from the official Git documentation:
In this image, I would intuitively think that commit C4 "belongs" to the branch master and commits C3 and C5 belong to iss53. But what about C0 through C2? Would they belong to both branches? Or must I say they are "reachable" by branches master and iss53?
This gets more complicated once I merge iss53 into master:
Since branch iss53 was merged into master, does that make commits C0 through C2 belong to master "more" than iss53?
What if I delete branch iss53 after the merge? Which branch would commits C3 and C5 belong to? After thinking about it more, it seems that after the merge, commits C4, C3, and C5 are "equal" in terms of the branching history and I can't tell which branch the three of them belong to. This is because after deleting iss53, there doesn't seem to be any information as to whether C4 belonged to any historical branch any more than C3 and C5.
I have found this answer which says that it is better to think about this in terms of "from which branches can this commit be reached". But does that mean C4, C3, and C5 are all reachable from the master branch? And how do you handle the branching parentage that happens in the diagram? Does that matter?
Also, the answer I linked to stated that there could be cases where a commit cannot be reached by any branch. How can that happen? And what are its implications?
But my main question remains: How do I associate commits with branches?
P.S. A side/off-topic question that stems from this post would be: Can a commit have more than two parents?
Upvotes: 5
Views: 1552
Reputation: 487755
To add to Greg Burghardt's answer, reachability is indeed the key concept here. The commit graph, complete with the hash IDs and arrows, is the be-all and end-all, as it were. The branch names just give you, and Git, an easy entry point into the graph (but see git gc in the next paragraph).
The commit graph takes the form of a Directed Acyclic Graph, or DAG. The system as a whole requires that a commit be reachable from some external name (a branch name will do, but so will a tag name, or even a Git reflog entry) to keep the commit "live". The maintenance program git gc will, when asked, scour through the entire commit database, finding commits that are not reachable from any external name, and prune them from the graph. Commits that are reachable from a name, or from a commit that itself is reachable from a name, remain in the graph. Commands that add new commits to the graph often end by running git gc --auto, which tells git gc to poke around a bit, guess whether this kind of maintenance is wise at this time, and if so, do a maintenance run.
Other parts of Git will do a graph walk whenever necessary and appropriate. The git log command, for instance, does one, starting from some given commit(s) and working with the DAG. The graph walk uses a queue (as many graph-walking algorithms do) and keeps track of visited commits, so that it can visit each commit once, even if there are multiple ways to get to it.
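The queue-plus-visited-set walk can be sketched in a few lines of Python. This is only an illustration under assumptions: the PARENTS table is reconstructed from the post-merge diagram in the question, walk is a hypothetical helper, and it uses a plain first-in-first-out queue, whereas the real git log uses its own ordering rules for which commit comes out of the queue next.

```python
from collections import deque

# Post-merge graph from the question (assumed shape): C6 is the merge
# commit with two parents, C4 and C5.
PARENTS = {
    "C0": [],
    "C1": ["C0"],
    "C2": ["C1"],
    "C3": ["C2"],
    "C4": ["C2"],
    "C5": ["C3"],
    "C6": ["C4", "C5"],
}


def walk(parents, start):
    """Visit each commit reachable from start exactly once."""
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        commit = queue.popleft()
        order.append(commit)
        for parent in parents[commit]:
            # C2 can be reached via C4 and via C3; the visited set
            # makes sure it is queued only the first time.
            if parent not in visited:
                visited.add(parent)
                queue.append(parent)
    return order


order = walk(PARENTS, "C6")
print(order)
```

Starting from C6, the walk reaches C2 along two different paths but lists it only once, which is the point of tracking visited commits.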
Upvotes: 3
Reputation: 18783
Commits do not belong to a branch. There is no ownership. A branch is a pointer to a commit. Every commit except the initial (root) commit has one or more parent commits. Tracing back through the history of a branch does not just involve a straight line when multiple branches are merged together. You'll need to reorient your view of commits and branches.
A commit can be reachable from many branches at once.
A commit can also be reachable from no branch at all.
Conceptually a Git repository is just a big linked list (strictly speaking, a directed acyclic graph, since a merge commit points back to more than one node), where each node other than the very first one points back to at least one other node. A "branch" is just a marker pointing to one of the nodes. A node in Git is called a commit. Deleting a branch in Git just deletes the pointer to the commit, but does not delete the commit object itself. You can usually recover branches you accidentally deleted, as long as the commits have not been garbage collected in the meantime, because the commits are still in the database and a branch is just a pointer to one of them: a bookmark, if you will.
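As a rough illustration of "a branch is just a pointer", here is a toy Python model. The dictionaries COMMITS and BRANCHES and the helper delete_branch are hypothetical names, not anything Git exposes. The commit layout is my reading of the post-merge diagram from the question.

```python
# Toy "object database": commit -> list of parents (assumed layout).
COMMITS = {
    "C0": [],
    "C1": ["C0"],
    "C2": ["C1"],
    "C3": ["C2"],
    "C4": ["C2"],
    "C5": ["C3"],
    "C6": ["C4", "C5"],
}

# Branch names are nothing more than name -> commit entries.
BRANCHES = {"master": "C6", "iss53": "C5"}


def delete_branch(branches, name):
    """Remove the pointer only; return the commit it pointed at."""
    return branches.pop(name)


commit_count_before = len(COMMITS)
old_tip = delete_branch(BRANCHES, "iss53")

# The commit store is untouched: only the bookmark is gone.
commit_count_after = len(COMMITS)
names_after_delete = sorted(BRANCHES)

# "Recovering" the branch is just re-creating the pointer at the same commit.
BRANCHES["iss53"] = old_tip
print(old_tip, commit_count_before, commit_count_after, names_after_delete)
```

Nothing in COMMITS changes at any step. Deleting and restoring the branch only removes and re-adds one entry in BRANCHES.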
But does that mean C4, C3, and C5 are all reachable from the master branch?
Yes, that is precisely what it means. All of those commits are reachable from master, because the merge commit C6 points to two different commits: C5 and C4.
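If it helps, the question "from which branches is this commit reachable?" can be answered mechanically. Git itself offers git branch --contains for this. The Python below is only a sketch of the idea on a toy graph: PARENTS and BRANCHES are my reconstruction of the post-merge diagram, and ancestors and branches_containing are made-up helper names.

```python
# Post-merge graph (assumed shape): master points at the merge commit C6,
# iss53 still points at C5.
PARENTS = {
    "C0": [],
    "C1": ["C0"],
    "C2": ["C1"],
    "C3": ["C2"],
    "C4": ["C2"],
    "C5": ["C3"],
    "C6": ["C4", "C5"],
}
BRANCHES = {"master": "C6", "iss53": "C5"}


def ancestors(parents, tip):
    """Return tip plus everything reachable by following parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def branches_containing(parents, branches, commit):
    """Names of all branches from which commit is reachable."""
    return sorted(
        name for name, tip in branches.items()
        if commit in ancestors(parents, tip)
    )


for c in ("C0", "C3", "C4", "C5", "C6"):
    print(c, branches_containing(PARENTS, BRANCHES, c))
```

In this model C3 and C5 are reachable from both names, C4 and C6 only from master, and C0 through C2 from both, which is why "belongs to" has no single answer.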
how do you handle the branching parentage that happens in the diagram? Does that matter?
Commit C6 has two parents. This means two branches were merged together. That's how you handle the "branching parentage." Commits with more than one parent were created with a git merge or git pull (which is a git fetch followed by a git merge).
Upvotes: 10