Reputation: 28901
I have read multiple times that GitHub reduces waste by NOT storing EVERY file in clones, but only the files that change. Source
How does it do this, and how can I replicate this feature? I was not able to find it in Git.
Note that I don't mind using another VCS that has this functionality.
Upvotes: 0
Views: 293
Reputation: 262714
The GitHub team has put up a fairly detailed article about their storage layer.
An interesting aspect is that they did not succumb to NIH syndrome, but built everything on top of existing functionality in core Git.
Perhaps it’s surprising that GitHub’s repository-storage tier, DGit, is built using the same technologies. Why not a SAN? A distributed file system? Some other magical cloud technology that abstracts away the problem of storing bits durably?
The answer is simple: it’s fast and it’s robust.
So, to come back to your question:
In core Git, you can configure a shared object storage location (an "alternate") that is used by more than one repository.
GitHub makes use of this by co-locating forks of a repository on the same server (or set of servers, for redundancy and availability). As a result, any duplicate objects (and there will be many in a fork) need to be stored only once. A sketch of how to use the same mechanism yourself is below.
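Here is a minimal sketch of core Git's alternates mechanism, which is the building block described above; the paths and repository names are hypothetical examples, not GitHub's actual layout:

```
# Option 1: clone while borrowing objects from an existing local clone.
# The new clone only stores objects that are NOT already present in the
# referenced repository.
git clone --reference /srv/repos/upstream.git https://example.com/fork.git fork

# Option 2: wire up an existing repository by hand.
# Every path listed in .git/objects/info/alternates is searched for objects
# before Git stores a new copy in this repository.
echo /srv/repos/upstream.git/objects >> fork/.git/objects/info/alternates

# Reclaim space by repacking only the objects that are truly local;
# -l tells pack-objects to skip objects available via the alternate.
git -C fork repack -a -d -l
```

The usual caveat applies: if the repository you point the alternate at ever prunes objects that your clone still depends on, your clone can become corrupt, which is why this is normally used only for repositories you control on the same machine.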
Upvotes: 2