user456814

What are the differences between git clone --shared and --reference?

After reading the documentation, I still don't really understand what the differences are between --shared and --reference <repo>. They seem so similar.

  1. What are the differences between the --shared and --reference <repo> options?

  2. Can they be used to save drive space when making multiple local clones of another local clone?

  3. Can each local clone have a different branch checked-out?

Note: I'm aware that I can use multiple shallow clones with truncated history by using git clone --depth <depth>, but each clone still has to duplicate at least some history in order to do that, so I was thinking that maybe it's not the most optimal way to save drive space (though it is better than nothing).

Background

Sometimes I like to have more than one checkout of my working copy in a repository, so I create multiple clones, where each clone has its own checkout.

However, I don't really need the whole history with each clone, just the most up-to-date versions of my branches, so I could possibly save a lot of drive space by having each clone use the tag, commit, tree, and blob objects from the original local clone (for example, via symlinks or something).

git clone documentation

I checked the git clone documentation to see if there's anything I can use.

--shared

I saw that there's a --shared option:

When the repository to clone is on the local machine, instead of using hard links, automatically setup .git/objects/info/alternates to share the objects with the source repository. The resulting repository starts out without any object of its own.

This looks like it might be useful for helping me to save drive space with multiple clones that have different checkouts, since each clone shares objects with the original local clone.

--reference <repository>

Then I also saw the --reference <repository> option:

If the reference repository is on the local machine, automatically setup .git/objects/info/alternates to obtain objects from the reference repository. Using an already existing repository as an alternate will require fewer objects to be copied from the repository being cloned, reducing network and local storage costs.

NOTE: see the NOTE for the --shared option.

This says that it will reduce local storage costs, so this might be useful as well.

Upvotes: 43

Views: 20786

Answers (3)

Paul Van Camp

Reputation: 261

The link in the comments to your question is now dead.

https://www.oreilly.com/library/view/git-pocket-guide/9781449327507/ch06.html has some great information on the subject. Here is some of what is there:

First, we make a bare clone of the remote repository, to be shared locally as a reference repository (hence named “refrep”):
$ git clone --bare http://foo/bar.git refrep

Then, we clone the remote again, but this time giving refrep as a reference:
$ git clone --reference refrep http://foo/bar.git

The key difference between this and the --shared option is that you are still tracking the remote repository, not the refrep clone. When you pull, you still contact http://foo/, but you don’t need to wait for it to send any objects that are already stored locally in refrep; when you push, you are updating the branches and other refs of the foo repository directly.

Of course, as soon as you and others start pushing new commits, the reference repository will become out of date, and you’ll start to lose some of the benefit. Periodically, you can run git fetch --all in refrep to pull in any new objects. A single reference repository can be a cache for the objects of any number of others; just add them as remotes in the reference:

$ git remote add zeus http://olympus/zeus.git
$ git fetch --all
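
To see the "still tracking the remote" point from the quoted passage, here is a minimal sketch. It assumes a throwaway directory from mktemp and uses a local file:// URL as a stand-in for http://foo/bar.git; the directory names via-reference and via-shared are made up for illustration.

```shell
# Sketch only: a local file:// URL stands in for the book's http://foo/bar.git.
work=$(mktemp -d)
upstream_url="file://$work/upstream"

# A tiny "remote" repository with one commit.
git init -q "$work/upstream"
echo hello > "$work/upstream/file.txt"
git -C "$work/upstream" add file.txt
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"

# Bare clone to act as the local reference repository.
git clone -q --bare "$upstream_url" "$work/refrep"

# Clone the remote again, borrowing objects from refrep.
git clone -q --reference "$work/refrep" "$upstream_url" "$work/via-reference"

# For comparison, a --shared clone made directly from refrep.
git clone -q --shared "$work/refrep" "$work/via-shared"

# Print where each clone's origin points.
git -C "$work/via-reference" config --get remote.origin.url
git -C "$work/via-shared" config --get remote.origin.url
```

The first URL printed should be the upstream, and the second should be the path to refrep, which is the tracking difference the passage describes.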

Upvotes: 3

Sam Brightman

Reputation: 2950

The link in the comments to your question is really a clearer answer: --reference implies --shared. The point of --reference is to optimise network I/O during the initial clone of a remote repository.

Contrary to the answer above, I find that the --shared and --reference repositories -- from the same source -- have the same size and both have zero objects. Of course, if you use --reference for some other repository which is based off a common source, the size and objects will reflect the difference between the repositories. Note that in both cases we are not saving space in the work tree, only the .git/objects.

There is some nuance to maintaining this setup going forward - read the thread for more details. Essentially it sounds like the two should be treated as public repositories, with care around history re-writing in the presence of repacking/pruning/garbage collection.

The workflow around maintaining an optimal disk-space usage after the initial clone seems to be:

  1. pull source
  2. repack source
  3. pull secondary
  4. git gc in secondary

Probably best to read the discussion in that thread though.
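
As a rough illustration of the four steps above, here is a sketch that builds a throwaway setup and runs them in order. The repository names upstream, source and secondary are hypothetical, and I'm assuming the secondary was made with --shared from the source.

```shell
# Sketch only: throwaway repositories under a temp directory.
work=$(mktemp -d)
ident="-c user.name=demo -c user.email=demo@example.com"

# upstream -> source (normal clone) -> secondary (--shared clone of source).
git init -q "$work/upstream"
echo one > "$work/upstream/file.txt"
git -C "$work/upstream" add file.txt
git -C "$work/upstream" $ident commit -q -m "first"
git clone -q "$work/upstream" "$work/source"
git clone -q --shared "$work/source" "$work/secondary"

# New work appears upstream.
echo two >> "$work/upstream/file.txt"
git -C "$work/upstream" $ident commit -q -a -m "second"

# 1. pull source
git -C "$work/source" pull -q
# 2. repack source
git -C "$work/source" repack -a -d -q
# 3. pull secondary (its origin is the source repository)
git -C "$work/secondary" pull -q
# 4. git gc in secondary
git -C "$work/secondary" gc -q

# Show how many objects the secondary holds on its own.
git -C "$work/secondary" count-objects -v
```

This only shows the order of operations on a history that is never rewritten; it does not cover the pruning hazards discussed in the thread.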

You can add an alternate to an existing repository by putting the absolute path to the source's objects directory into secondary/.git/objects/info/alternates and running git gc (many people use git repack -a -d -l, which is done by git gc).
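
A minimal sketch of adding an alternate by hand, assuming a throwaway source repository and a secondary that starts out as a fully independent copy (both names are hypothetical):

```shell
# Sketch only: throwaway repositories under a temp directory.
work=$(mktemp -d)

git init -q "$work/source"
echo hello > "$work/source/file.txt"
git -C "$work/source" add file.txt
git -C "$work/source" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"

# An ordinary clone with its own copy of every object.
git clone -q --no-hardlinks "$work/source" "$work/secondary"

# Append the absolute path of the source's objects directory.
mkdir -p "$work/secondary/.git/objects/info"
echo "$work/source/.git/objects" >> "$work/secondary/.git/objects/info/alternates"

# Let git gc repack with the alternate in place.
git -C "$work/secondary" gc -q

# The "alternate:" line confirms that git picked up the new entry.
git -C "$work/secondary" count-objects -v
```

How much space this actually frees depends on how the objects are stored in both repositories, so compare the count-objects output before and after rather than assuming a particular number.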

You can remove an alternate by running git repack -a -d (no -l) in the secondary and then removing the line from the alternates file. As described in the thread, it is possible to have more than one alternate.
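
A sketch of the removal, assuming a --shared clone whose alternates file has a single entry (so the whole file can simply be deleted); the repository names are hypothetical:

```shell
# Sketch only: throwaway repositories under a temp directory.
work=$(mktemp -d)

git init -q "$work/source"
echo hello > "$work/source/file.txt"
git -C "$work/source" add file.txt
git -C "$work/source" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"

# The secondary borrows all of its objects from the source.
git clone -q --shared "$work/source" "$work/secondary"

# Copy every borrowed object into a local pack (note: no -l).
git -C "$work/secondary" repack -a -d -q

# Only one alternate here, so removing the file removes the line.
rm -f "$work/secondary/.git/objects/info/alternates"

# The secondary should now survive the source going away.
mv "$work/source" "$work/source.moved"
git -C "$work/secondary" log -1 --oneline
```

With several alternates, you would delete only the relevant line from the file instead of removing the file itself.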

I've not used this much myself, so I don't know how error-prone it is to manage.

Upvotes: 8

DoubleWord

Reputation: 157

Both options update .git/objects/info/alternates to point to another repository's object store. That can be dangerous, which is why the documentation attaches the same warning note to both options.

The --shared option does not copy the objects into the clone. This is the main difference.

The --reference option takes an additional repository argument. A clone made with --reference still copies objects into the destination, but objects that are already available in the reference repository are taken from there instead of from the source. Pointing --reference at a repository on a faster or local device can therefore reduce network time and I/O against the source repository.

See for yourself

Create a --shared clone and a --reference clone. Count the objects in each using git count-objects -v. You'll notice the shared clone has no objects, and the reference clone has the same number of objects as the source. Further, notice the size difference of each in your file system. If you were to move the source, and test git log in both shared and reference repositories, the log is unavailable in the shared clone, but works fine in the reference clone.
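
Here is one way to script that experiment. It is a sketch under assumptions: the source is a throwaway repository addressed by local path, and the same repository is passed to --reference. The numbers may differ if the source is addressed by URL or if the reference is a different repository, so the script prints what it finds rather than assuming a result for the reference clone.

```shell
# Sketch only: throwaway repositories under a temp directory.
work=$(mktemp -d)

git init -q "$work/source"
echo hello > "$work/source/file.txt"
git -C "$work/source" add file.txt
git -C "$work/source" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"

git clone -q --shared "$work/source" "$work/shared-clone"
git clone -q --reference "$work/source" "$work/source" "$work/reference-clone"

# Count the objects each clone stores itself.
shared_count=$(git -C "$work/shared-clone" count-objects -v)
reference_count=$(git -C "$work/reference-clone" count-objects -v)
echo "shared clone:";    echo "$shared_count"
echo "reference clone:"; echo "$reference_count"

# Compare on-disk size of the two object stores.
du -sh "$work/shared-clone/.git/objects" "$work/reference-clone/.git/objects"

# Move the source away and see which clone can still read its history.
mv "$work/source" "$work/source.moved"
git -C "$work/shared-clone" log --oneline \
    || echo "shared clone: history unavailable"
git -C "$work/reference-clone" log --oneline \
    || echo "reference clone: history unavailable"
```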

Upvotes: 14
