Reputation: 4040
I'm the author of GitX. One of the features GitX has is the visualization of branches, as can be seen here.
This visualization is currently done by reading commits which are emitted from git in the correct order. For each commit the parents are known, so it's fairly easy to build up the lanes in the correct way.
I'd like to speed up this process by using my own commit pool and linearizing the commits myself. This allows me to reuse existing loaded commits and allows git to emit commits faster because it doesn't have to emit them in the correct order.
However, I'm not sure what algorithm to use to accomplish this. It is important that the building is incremental, as the loading of commits can take a long time (>5 seconds for 100,000 commits, which should all be displayed).
Gitk has gone the same way, and there's a patch here that shows how it is implemented, but my TCL skills are weak and the patch isn't very thoroughly commented and a bit hard to follow.
I'd also like this algorithm to be efficient, as it'll have to handle hundreds of thousands of commits. It also has to be displayed in a table, so it's important that access to specific rows is fast.
I'll describe the input I have so far, the output that I want and a few observations.
Input:
Output:
A few remarks:
Upvotes: 12
Views: 1358
Reputation: 25710
Do you really need to display 100k commits at once? What kind of user can soak up that kind of info?
Have you thought about paging? I.e just compute for ~100 commits or something. If a branch-line goes way back (off-page), you could use something like Github's back-pointing arrow to show that.
Upvotes: 0
Reputation: 454
OK, so I'm having a similarly hard time reading the entirety of that patch, but let's see if I can piece it together from what I did figure out.
To start with, gitk simplifies things by condensing a string of commits into an arc, containing a series of commits that each only have one parent and one child. Aside from anything else, doing this should cut down pretty dramatically on the number of nodes you have to consider for your sort, which will help out any algorithm you use. As a bonus, related commits will end up grouped together.
This does introduce some complexity in terms of finding an arc when you read a new commit. There are a few situations:
You may want to include the multi-child or multi-parent commits in arcs, or it may make more sense to keep them separate. Either way, it shouldn't be too difficult to build up this set of arcs incrementally.
Once you have these arcs, you're still left with trying to linearize them. In your case, the first algorithm described on the aforementioned Wikipedia page sounds useful, as you have a known set of branch points to use as your initial set S.
Other notes:
Anyway, I hope that helps. It was interesting to think about, at least.
Upvotes: 3
Reputation: 1691
The standard topological sort is O(n) (OK, O(V+E)), i.e. you should be able to sort a million commits in memory in a fraction of a second. No incremental hack like those in Tcl is needed.
BTW, I use GitX (looks much better than Gitk on OS X) everyday and don't have any issue with it (maybe because I don't have those crazy merges in my repositories) :)
Upvotes: 6
Reputation: 16315
I haven't used GitX, so maybe I'm missing something, but it seems like you could walk back from child to parent(s) from the head of each current branch until you can draw a few screens of the graph.
That might not give you the optimal visual layout of branches that are rooted earlier. But it seems like responsiveness would be more important than waiting to draw a graph with the fewest crossings, since most users are likely to be interested in recent activity.
Upvotes: -2