SilentDev
SilentDev

Reputation: 22747

What do people mean when they say "ancestor" with regards to git?

I've seen it talked about a lot but when I google'd for the meaning I could not find it. What's an ancestor?

I'm guessing an ancestor of branch x is any parent, grand parent, grand grand parent etc of the branch?

So if branch x branched from master, and branch y branched from x, and branch z branched from y, then z's ancestors are y, x and master? Just taking a guess.

Upvotes: 3

Views: 1073

Answers (2)

torek
torek

Reputation: 488163

TL;DR

The real key to the stuff below lies in realizing that branches—or branch names—aren't where the relationships come in. The relationships between commits come in from the commits. The branch names just get you started in the graph!

Long

Besides what VonC has said, there is a more general description from graph theory. The commits in a Git repository form a Directed Acyclic Graph or DAG.

A graph G is a collection of nodes or vertices V along with a set of edges E that connect between vertices, hence the formulation G = (V, E). A directed graph is one in which the edges are replaced with arcs, i.e., arrows. Here are two Wikipedia images. The first, on the left, shows a regular (undirected) graph. The second shows a directed graph:

                                

When the graph is directed, you can only move from one node to another in the direction given by the arrows. The directed graph above is cyclic because starting at any node, you can move around and wind up back at that node.

An acyclic graph has no cycles, i.e., no matter which node you start from, you can never get back to where you started. The kind of graph that Git commits form is directed and acyclic, hence it is a DAG.

In the Git commit graph, each node or vertex is a commit, uniquely identified by its hash ID. Each arc, along with its direction, is a result of the parent commit hash ID(s) associated with the node. Hence a natural way to draw a Git graph is like this:

... <-F <-G <-H   <-- master

The name master lets you start at the end, i.e., at commit H, and work backwards. The connecting arcs / arrows come out of the node as parent pointers.

An ancestor of some commit is then any commit you can reach by following the arrows. Suppose we have a more complex graph, like this one. I can't draw the internal arrows properly here, but because I put newer nodes toward the right, all the arrows point leftwards—maybe left and up, or left and down, but definitely leftwards. Our graph looks like this:

        D--E
       /    \
A--B--C      H--I--J   <-- master
       \    /
        F--G--K---L   <-- develop

Commit A is a node with no outgoing arcs. In a normal (non-Git) graph, this would make it a leaf node, but Git does everything backwards, so in Git, A is a root node. Commit B points back to A, C points to B, and so on, until we get to H. Commit H is a merge commit: it has two outgoing arcs, pointing to commits E and G both. From H, then, we can walk either the top line or the bottom line—or along both lines—and get back all the way to the root A.

These names, as you might guess, are branch names. They act as entry points to tip commits. From tip commit L we can work back along the bottom row, but we cannot reach commit H—we get to G by going from L to K to G, but G points only to F—so we cannot work our way up to H. All the arrows point backwards. From commit J, though, we can work backwards to H and hence to either row.

This means that commits K and L are only on develop, while commits D, E, H, I, and J are only on master. All other commits are on both branches. That's the reachability notion that VonC is describing.

Mathematically, in a DAG, we define predecessor and successor notions. For predecessor or precedes, we use the bent-less-than ≺ symbol: A ≺ B if you can go from A to B. We use the bent-greater-than ≻ ("successor" or "succeeds") symbol for the opposite. These define a partial order on the nodes in the graph. Of course, a node equals itself, and we can say that given one node, A = A, but also A ≼ A and A ≽ A.

Since Git works backwards, this gets reversed: A ≺ B (A is a predecessor of B) if you can go from B to A! To avoid everyone getting too confused, Git uses is-ancestor instead of precedes. In general, Git also defines is-ancestor to be true if you compare a node to itself, so that all nodes are their own ancestors.

Because Git's internal arrows work exclusively backwards, Git doesn't offer an is-descendant operator. You have to reverse the is-ancestor test (but see below). If you want to know "is node X a descendant of node Y" you can start by asking instead "is Y an ancestor of X". Remember that is-ancestor says "yes" if the two hash IDs match!

Since Git version 1.8.0, the easy way to test if the commits identified by hashes $H1 and $H2 are related by this ancestry notion is:

if git merge-base --is-ancestor $H1 $H2; then
    echo commit $H1 is an ancestor of $H2
else
    echo commit $H1 is not an ancestor of $H2
fi

Note that because ≼ defines only a partial order, the fact that $H1 is not an ancestor of $H2 does not mean that $H2 is necessarily an ancestor of $H1. For instance, in our example above, commits J and L are merely siblings: J is not a parent of L, but L is not a parent of J either. This means that if is-ancestor says "yes", you're done, but if it says "no", you still have to test the reversed case to decide between "is descendant" and "is sibling".

You can also have disjoint subgraphs in a repository. Here you have at least two roots and at least two branch-tips, but there's no connection between the nodes:

A--B--C   <-- branch1

D--E--F   <-- branch2

There's no relationship between any of the top-row commits and any of the bottom-row commits. However, if we add a merge, that makes them become related, in a sense:

A--B--C
       \
        G   <-- branch1
       /
D--E--F   <-- branch2

Now commits A-B-C and D-E-F are "related by marriage", as it were, with the merge at G, because now we can start at G and work backwards to either A or D. Neither A nor D are ancestors of each other, but both are ancestors of G.

Upvotes: 3

VonC
VonC

Reputation: 1324138

The term ancestor was first introduced in April 2005 for merge-base (replacing "parent"), explained one month later:

merge-base finds as good a common ancestor as possible.
Given a selection of equally good common ancestors it should not be relied on to decide in any particular way.

"parent" was seen as an immediate ancestor, while a merge-base algorithm needed to go back further than that.

As mentioned in commit f76412e:

Parent is the first parent of the commit.
We may name it as (n+1)th generation ancestor of the same head_rev as commit is nth generation ancestor of, if that generation number is better than the name it already has.

In August 2005, the first ancestor notation was introduced:

[PATCH] Add a new extended SHA1 syntax ~

The new notation is a short-hand for <name> followed by <num> caret ('^') characters.
E.g. "master~4" is the fourth generation ancestor of the current "master" branch head, following the first parents; same as "master^^^^" but a bit more readable.


z's ancestors are any commit reachable from z HEAD.

The term "reachable" was clarified in 2007:

reachable:

All of the ancestors of a given commit are said to be reachable from that commit.

More generally, one object is reachable from another if we can reach the one from the other by a chain that follows tags to whatever they tag, commits to their parents or trees, and trees to the trees or blobs that they contain.

That lead to revise the notion of distance between two commits:

When a commit has only one relevant parent commit, the number of commits the commit can reach is exactly the number of commits that the parent can reach plus one; instead of running count_distance() on commits that are on straight single strand of pearls, we can just add one to the parents' count.

On the other hand, for a merge commit, because the commits reachable from one parent can be reachable from another parent, you cannot just add the parents' counts up plus one for the commit itself; that would overcount ancestors that are reachable from more than one parents.

Upvotes: 5

Related Questions