Bill Door

Reputation: 18956

Why is a cloned repo 10x larger than a fetched repo?

We have a remote repo that has exploded in size (1.4G to 14G). We are trying to work out why this repo has increased in size and fix the problem.

In the process we have noticed a significant difference between git clone and git fetch.

If we clone the repo, the cloned repo is also 14G.

git clone <remote>

If we instead init and fetch from the remote, the local repo is back to the expected 1.4G.

git init
git remote add origin <remote>
git fetch

I think those two sets of commands should be similar if not the same.

This indicates a significant difference between clone and fetch. How are these commands different?

We are looking for ways of determining a fix for the remote.

Note that the remote is on a GitHub Enterprise server, so we have limited access to the remote repo.


Some additional statistics

$ git clone git@$REMOTE/main.git .
Initialized empty Git repository in $HOME/cloned/.git/
remote: Counting objects: 439172, done.
remote: Compressing objects: 100% (238472/238472), done.
Receiving objects: 100% (439172/439172), 13.82 GiB | 19.92 MiB/s, done.
remote: Total 439172 (delta 186192), reused 436323 (delta 183501)
Resolving deltas: 100% (186192/186192), done.

$ git fetch
remote: Counting objects: 246663, done.
remote: Compressing objects: 100% (80057/80057), done.
remote: Total 246663 (delta 159364), reused 238800 (delta 153402)
Receiving objects: 100% (246663/246663), 1.13 GiB | 12.25 MiB/s, done.
Resolving deltas: 100% (159364/159364), done.

Those are some pretty different numbers.

Upvotes: 6

Views: 671

Answers (2)

James

Reputation: 1804

The difference is that git fetch only pulls the commits reachable from the branches in the origin, plus any tags that point into the history of those branches.

A git clone, however, grabs all commits reachable from all branches and from all existing tags, including tags that point to commits not on any branch. It is a subtle difference, but it is most likely the cause of what you are seeing.

You can test this by running git tag in both of your repos and checking whether they list different tags.
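The tag comparison can be sketched as a small shell helper. This is only an illustration: the function name tags_only_in and the ./cloned and ./fetched paths are hypothetical, and it assumes a Git recent enough to support git -C.

```shell
#!/bin/sh
# Hypothetical helper: print tags that exist in the first repository
# but not in the second.
# Usage: tags_only_in <repo_with_more_tags> <repo_with_fewer_tags>
tags_only_in() {
    a=$(mktemp) && b=$(mktemp) || return 1
    # Sort with a fixed locale so comm sees consistently ordered input.
    git -C "$1" tag --list | LC_ALL=C sort > "$a"
    git -C "$2" tag --list | LC_ALL=C sort > "$b"
    # comm -23 keeps only the lines unique to the first file.
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Example (placeholder paths):
# tags_only_in ./cloned ./fetched
```

Any tag names it prints are candidates for the extra history that only the clone received.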

If so, you can run git fetch --tags origin in your fetched repo (the one that is only 1.4G); it will pull down all of those additional commits.

To 'fix' this, you can remove any tags that show up in your cloned repo but not in your fetched repo. Just make sure you really do want to lose that commit history! Run git tag -d <tagname> and git push origin :refs/tags/<tagname> for each unwanted tag.
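For more than a handful of tags, the two commands above can be wrapped in a loop. The following is a minimal sketch, not a vetted tool: the delete_tags function and the DRY_RUN variable are made up for this example, and by default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: read tag names from stdin, one per line, and delete
# each tag locally and on the remote.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
# Usage: delete_tags [remote] < file_with_tag_names
delete_tags() {
    remote=${1:-origin}
    while IFS= read -r tag; do
        [ -n "$tag" ] || continue   # skip blank lines
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "git tag -d $tag"
            echo "git push $remote :refs/tags/$tag"
        else
            git tag -d "$tag"
            git push "$remote" ":refs/tags/$tag"
        fi
    done
}

# Example (unwanted-tags.txt is a placeholder file name):
# delete_tags origin < unwanted-tags.txt             # review the commands
# DRY_RUN=0 delete_tags origin < unwanted-tags.txt   # actually delete
```

Keeping the dry run as the default is deliberate, since deleting a tag on the remote affects everyone who uses that repo.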

Upvotes: 6

michas

Reputation: 26555

The only difference between the cloned version and the fetched version should be the checkout.

The fetched one should contain only the .git directory, while the cloned one will also have a worktree checked out.

"In theory" git clone -n should do exactly the same as your fetch.

Do you maybe have some sparse files or easily compressible files in your repository? In that case Git's objects might be considerably smaller than the files in the worktree.

You could verify this by comparing the size of the working tree with that of the .git directory.
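One way to make that comparison is with du. The sketch below is an assumption about how you might do it; the repo_sizes function name is invented for this example, and it reports sizes in KiB.

```shell
#!/bin/sh
# Hypothetical helper: report how much disk space the .git directory uses
# versus everything else in the repository directory (the worktree).
# Usage: repo_sizes [path_to_repo]
repo_sizes() {
    repo=${1:-.}
    # du -sk prints "<size in KiB><TAB><path>"; keep only the size.
    git_kb=$(du -sk "$repo/.git" | cut -f1)
    total_kb=$(du -sk "$repo" | cut -f1)
    echo "git_dir_kb=$git_kb"
    echo "worktree_kb=$((total_kb - git_kb))"
}

# Example (placeholder path):
# repo_sizes ./cloned
```

Running git count-objects -vH inside each repo gives a complementary view of how much space the packed and loose objects take.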

Upvotes: 0
