cnst

Reputation: 27218

git: a fast web-interface for huge repos

Some git repositories are really huge: DragonFly BSD's .git directory is 324 MB, and FreeBSD's is above 0.5 GB packed and above 2 GB unpacked.

Do Gitweb, cgit, or any other web tools do any kind of pre-caching for these huge repositories?

How can one estimate the optimal amount of resources (e.g. memory and CPU constraints) for a web interface to a couple of such huge repositories? What would the response time be for a blame or a log operation on a random file?
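For concreteness, one rough way to measure this on a local clone is to time git log and git blame over a random sample of tracked files. Below is a minimal sketch in Python; the REPO path is a placeholder, and git is assumed to be on PATH:

    import random
    import subprocess
    import time

    REPO = "/path/to/freebsd.git"  # hypothetical local clone

    def timed(*args):
        """Run a git command against REPO and return wall-clock seconds."""
        start = time.monotonic()
        subprocess.run(["git", "-C", REPO, *args],
                       stdout=subprocess.DEVNULL, check=True)
        return time.monotonic() - start

    # Sample some tracked files at random.
    files = subprocess.run(["git", "-C", REPO, "ls-files"],
                           capture_output=True, text=True,
                           check=True).stdout.splitlines()
    sample = random.sample(files, k=min(20, len(files)))

    for path in sample:
        t_log = timed("log", "--oneline", "--", path)
        t_blame = timed("blame", "HEAD", "--", path)
        print(f"{path}: log {t_log:.2f}s, blame {t_blame:.2f}s")

The worst-case numbers from a run like this give a floor for per-request latency, since a web frontend adds its own overhead on top.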

Upvotes: 1

Views: 517

Answers (2)

VonC

Reputation: 1324787

(Update June 2017, 4 years later)
Actually, those repos are tiny.
The Microsoft Windows code base is a huge repo: 3.5 million files, over 270 GB in size.
And Git manages it well... with the addition of GVFS (Git Virtual File System), announced in February 2017, which solves most of Git's scaling issues (too many files, too much history, too many branches and pushes).

And the commands remain reasonably fast (source: "The largest Git repo on the planet")

(Performance chart from the article: https://msdnshared.blob.core.windows.net/media/2017/05/Performance.png)

For context, if we tried this with “vanilla Git”, before we started our work, many of the commands would take 30 minutes up to hours and a few would never complete.

See more in "Beyond GVFS: more details on optimizing Git for large repositories".

This is not yet available in native Git, but the Microsoft team is working to bring patches upstream.

Upvotes: 1

mvp

Reputation: 116197

Thanks to git's object store model, repository size is not really an issue for gitweb and similar tools (by the way, a 500 MB repo is rather small: the Linux kernel is close to 1 GB now, and Android's frameworks/base is a few gigabytes).

This is because gitweb does not need to pull the whole repository to show you a tree; it can always look at just a few objects: commit objects to show commits, tree objects to display directories, and blob objects to show you files.
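As a rough illustration of why this is cheap, here is a sketch (Python, assuming a local clone at the placeholder path REPO) that produces a directory listing the way gitweb conceptually does: it reads one commit object to find the root tree, then one tree object for the listing, without touching the rest of the repository:

    import subprocess

    REPO = "/path/to/repo.git"  # hypothetical local clone

    def cat_file(ref):
        """Pretty-print a single git object (commit, tree, or blob)."""
        return subprocess.run(["git", "-C", REPO, "cat-file", "-p", ref],
                              capture_output=True, text=True,
                              check=True).stdout

    # One commit object gives us the root tree id ("tree <sha1>" line)...
    commit = cat_file("HEAD")
    tree_id = commit.splitlines()[0].split()[1]

    # ...and one tree object gives us the directory listing:
    # one "mode type sha1  name" entry per line.
    for entry in cat_file(tree_id).splitlines():
        print(entry)

Each page view only ever dereferences a handful of such objects, regardless of whether the pack behind them is 300 MB or 3 GB.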

The only operation that might slow gitweb down is displaying the history of a single file, but this does not happen very often, and even then git is pretty good at coping with it without much trouble.

As far as gitweb speed is concerned, the best optimization you can make is to run gitweb (which is a Perl script) under mod_perl, so that the Perl interpreter is loaded into memory just once. This alone will make gitweb fly, and the git operations themselves will be barely noticeable.
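To get a feel for how much of a plain-CGI request is pure interpreter startup (the cost mod_perl pays once instead of per request), a quick measurement like this sketch can help; it assumes perl is on PATH and is written in Python only for convenience:

    import subprocess
    import time

    # Time N cold starts of the Perl interpreter. Plain CGI pays this
    # on every gitweb request; mod_perl pays it once at server start.
    N = 50
    start = time.monotonic()
    for _ in range(N):
        subprocess.run(["perl", "-e", "1"], check=True)
    elapsed = time.monotonic() - start
    print(f"~{elapsed / N * 1000:.1f} ms of interpreter startup per request")

Note this measures only bare interpreter startup; a real CGI request also recompiles the whole gitweb script each time, so the per-request saving under mod_perl is larger still.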

Upvotes: 1
