hpy

Reputation: 2151

git - How to tell which branch a commit belongs to?

As someone relatively new to Git, I have recently (and finally!) understood that a branch is actually just a pointer to a particular commit, and that sometimes it might be better to rephrase "which branch a commit belongs to" as "from which branches is a commit reachable".

For example, the following diagram is from the official Git documentation:

git tree with two branches: master and iss53

In this image, I would intuitively think that commit C4 "belongs" to the branch master and commits C3 and C5 belong to iss53. But what about C0 through C2? Would they belong to both branches? Or must I say they are "reachable" by branches master and iss53?

This gets more complicated once I merge iss53 into master:

git tree with branch iss53 merged into master

Since branch iss53 was merged into master, does that make commits C0 through C2 belong to master "more" than iss53?

What if I delete branch iss53 after the merge? Which branch would commits C3 and C5 belong to? After thinking about it more, it seems that after the merge, commits C4, C3, and C5 are "equal" in terms of the branching history and I can't tell which branch the three of them belong to. This is because after deleting iss53, there doesn't seem to be any information as to whether C4 belonged to any historical branch any more than C3 and C5.

I have found this answer which says that it is better to think about this in terms of "from which branches can this commit be reached". But does that mean C4, C3, and C5 are all reachable from the master branch? And how do you handle the branching parentage that happens in the diagram? Does that matter?

Also, the answer I linked to stated that there could be cases where a commit cannot be reached from any branch. How can that happen? And what are its implications?

But my main question remains: How do I associate commits with branches?

P.S. A side/off-topic question that stems from this post would be: Can a commit have more than two parents?

Upvotes: 5

Views: 1552

Answers (2)

torek

Reputation: 487755

To add to Greg Burghardt's answer, reachability is indeed the key concept here. The commit graph, complete with the hash IDs and arrows, is the be-all and end-all, as it were. The branch names just give you—and Git—an easy entry point into the graph (but see git gc in the next paragraph).

The commit graph takes the form of a Directed Acyclic Graph or DAG. The system as a whole requires that a commit be reachable from some external name—a branch name will do, but so will a tag name, or even a Git reflog entry—to keep the commit "live". The maintenance program git gc will, when asked, scour through the entire commit database, finding commits that are not reachable from any external name, and prune them from the graph. Commits that are reachable from a name, or from a commit that itself is reachable from a name, remain in the graph. Commands that add new commits to the graph often end by running git gc --auto, which tells git gc to poke around a bit, guess whether this kind of maintenance is wise at this time, and if so, do a maintenance run.
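To make the "live" versus "unreachable" distinction concrete, here is a minimal sketch in Python. It is not how git gc is implemented: the parent map is a hand-written model of the pre-merge diagram from the question, the function names are made up for illustration, and it only considers branch names (real Git also counts tags, reflog entries and other references).

```python
# Hypothetical parent map for the pre-merge diagram (C0..C5).
# This is a hand-written model, not data read from a real repository.
PARENTS = {
    "C0": [],
    "C1": ["C0"],
    "C2": ["C1"],
    "C3": ["C2"],
    "C4": ["C2"],
    "C5": ["C3"],
}


def reachable(tips, parents):
    """Return the set of commits reachable from any of the given tip commits."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def unreachable(refs, parents):
    """Return commits that no name in refs can reach (candidates for pruning)."""
    return set(parents) - reachable(refs.values(), parents)


both = {"master": "C4", "iss53": "C5"}
only_master = {"master": "C4"}  # iss53 deleted without being merged

print(sorted(unreachable(both, PARENTS)))         # []
print(sorted(unreachable(only_master, PARENTS)))  # ['C3', 'C5']
```

With both names present every commit is live. Drop iss53 before merging it and C3 and C5 no longer have any name leading to them, which is the "commit reachable from no branch" case the question asks about.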

Other parts of Git will do a graph walk whenever necessary and appropriate. The git log command, for instance, does one, starting from some given commit(s) and working with the DAG. The graph walk uses a queue (as many graph-walking algorithms do) and keeps track of visited commits, so that it can visit each commit once, even if there are multiple ways to get to it.
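A rough sketch of such a walk, in Python, over a hand-written model of the merged diagram from the question. The parent map and the function name walk are assumptions for illustration, and the plain first-in-first-out queue here is a simplification: it shows the visit-once idea, not the order in which git log actually prints commits.

```python
from collections import deque

# Hypothetical parent map for the merged diagram (C6 merges C4 and C5).
PARENTS = {
    "C0": [],
    "C1": ["C0"],
    "C2": ["C1"],
    "C3": ["C2"],
    "C4": ["C2"],
    "C5": ["C3"],
    "C6": ["C4", "C5"],
}


def walk(starts, parents=PARENTS):
    """Visit every commit reachable from the starting commits exactly once."""
    seen = set()
    order = []
    queue = deque(starts)
    while queue:
        commit = queue.popleft()
        if commit in seen:
            # Already reached by another path (C2 is reachable via C4 and via C3).
            continue
        seen.add(commit)
        order.append(commit)
        queue.extend(parents[commit])
    return order


print(walk(["C6"]))  # ['C6', 'C4', 'C5', 'C2', 'C3', 'C1', 'C0']
```

C2 is queued twice, once through C4 and once through C3, but the seen set ensures it is reported only once.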

Upvotes: 3

Greg Burghardt

Reputation: 18783

Commits do not belong to a branch. There is no ownership. A branch is a pointer to a commit. Each commit, apart from a root commit, has one or more parent commits. Tracing back through the history of a branch does not just involve a straight line when multiple branches are merged together. You'll need to reorient your view of commits and branches.

Commits exist in many branches.

Commits can also exist in no branches at all.

Conceptually a Git repository is just a big linked list (strictly speaking a graph, since merges have several parents), where each node except the very first points back to at least one other node. A "branch" is just a marker pointing to one of the nodes. A node in Git is called a commit. Deleting a branch in Git just deletes the pointer to the commit, but does not delete the commit object itself. You can recover branches you accidentally deleted, because the database of commits is arranged as a linked list, and a branch is just a pointer, a bookmark, if you will.
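In a real repository, git branch --contains <commit> lists the branches from which a commit is reachable. The idea behind it can be sketched in a few lines of Python. Everything here is a toy model: the parent map is typed in by hand from the question's diagrams, and the function names are hypothetical.

```python
# Hypothetical parent map covering both diagrams (C6 is the merge commit).
PARENTS = {
    "C0": [],
    "C1": ["C0"],
    "C2": ["C1"],
    "C3": ["C2"],
    "C4": ["C2"],
    "C5": ["C3"],
    "C6": ["C4", "C5"],
}


def reachable_from(tip, parents):
    """Return the tip commit plus all of its ancestors."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def branches_containing(commit, branches, parents=PARENTS):
    """Return the sorted names of branches whose tip can reach the commit."""
    return sorted(
        name
        for name, tip in branches.items()
        if commit in reachable_from(tip, parents)
    )


before_merge = {"master": "C4", "iss53": "C5"}
after_merge = {"master": "C6", "iss53": "C5"}
after_delete = {"master": "C6"}

print(branches_containing("C2", before_merge))  # ['iss53', 'master']
print(branches_containing("C3", before_merge))  # ['iss53']
print(branches_containing("C3", after_merge))   # ['iss53', 'master']
print(branches_containing("C3", after_delete))  # ['master']
```

The answer for a given commit changes as branch pointers move or disappear, even though the commit itself never changes. That is why "belongs to" is the wrong question: nothing recorded in C3 says it was made on iss53.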

But does that mean C4, C3, and C5 are all reachable from the master branch?

Yes, that is precisely what it means. All of those commits are reachable, because commit C6 points to 2 different commits: C5 and C4.

how do you handle the branching parentage that happens in the diagram? Does that matter?

Commit C6 has two parents. This means two branches were merged together. That's how you handle the "branching parentage." Commits with more than one parent were created with a git merge or git pull (which, by default, is a git fetch followed by a git merge).

Upvotes: 10
