Yahya Uddin

Reputation: 28901

How does GitHub reduce storage waste?

I have read multiple times that GitHub reduces waste by not storing every file in clones, but only the files that change. Source

How does it do this, and how can I replicate this feature? I was not able to find it in Git.

Note that I don't mind using another VCS that has this functionality.

Upvotes: 0

Views: 293

Answers (1)

Thilo

Reputation: 262714

The GitHub team has put up a fairly detailed article about their storage layer.

An interesting aspect is that they do not suffer from NIH syndrome, but built all their work on top of existing functionality in the core git system.

Perhaps it’s surprising that GitHub’s repository-storage tier, DGit, is built using the same technologies. Why not a SAN? A distributed file system? Some other magical cloud technology that abstracts away the problem of storing bits durably?

The answer is simple: it’s fast and it’s robust.

So, to come back to your question:

You can (in core git) configure a shared object storage location, via the `objects/info/alternates` mechanism, to be used together by more than one repository.

GitHub makes use of this by co-locating forks of a repository on the same server (or set of servers, for redundancy and availability). As a result, any duplicate objects (and there will be many in a fork) need to be stored just once.
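As a rough illustration of that mechanism (a local sketch, not GitHub's actual setup): core git exposes shared object storage through the `.git/objects/info/alternates` file, and `git clone --shared` sets it up automatically when "forking" a repository on the same filesystem:

```shell
# Create an "upstream" repository with one commit.
git init --quiet upstream
cd upstream
echo "hello" > README.md
git add README.md
git -c user.name=demo -c user.email=demo@example.com \
    commit --quiet -m "initial commit"
cd ..

# "Fork" it without duplicating objects: --shared writes the path of
# upstream's object store into the fork's objects/info/alternates file
# instead of copying the objects.
git clone --quiet --shared upstream fork

# The fork borrows its objects from upstream's object database.
cat fork/.git/objects/info/alternates

# History is fully readable even though the fork stores no objects of its own.
git -C fork log --oneline
```

Only objects unique to the fork (new commits, modified files) get written into the fork's own object store; everything shared stays in the single upstream copy. Note that `--shared` clones are fragile if the upstream repository prunes objects, which is why this is used in controlled server-side setups rather than everyday workflows.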

Upvotes: 2
