Andrius

Reputation: 21158

git: keep depth=1 on pulls to reduce repo size

Is there a way to make git pull keep depth=1, so that the repository holds only the latest commit rather than the one that existed when git clone was run (if a newer one exists)?

I do git clone -b some_branch --depth=1 git_repo.git

It clones the repository with minimum space usage because it removes all the history. Now if I need to update that repository again and use git pull, it pulls the whole history.

There is a similar question here:

Pull updates with git after cloned with --depth 1

If I follow the accepted answer's advice and use git pull --unshallow and then git pull --depth=1, it does not seem to reduce space the way git clone --depth=1 does.

So is the only way to really reduce the repository size to remove the repository and clone it with depth=1 again? That seems like a clunky way to do it.

The reason I need this is that the repositories in use take about 3 GB when fully cloned, and they are used in about 40 environments, so in total they take up a lot of space. With a shallow clone, that can be reduced roughly five times.

Sample:

Cloning branch 12.0 of the repository [email protected]:odoo/odoo.git results in a size of around 3 GB.

Cloning the same branch 12.0 with depth=1 results in a size of 643 MB.

Using --unshallow on pull and then running the following (as suggested in Converting git repository to shallow?):

git pull --depth 1
git gc --prune=all

does not seem to shrink the size the way a shallow clone does.

Upvotes: 7

Views: 3993

Answers (2)

Kissaki

Reputation: 9227

git pull has a few parameters related to shallow repositories and fetching (which is part of a pull). The relevant parameter is --depth=<depth>:

Limit fetching to the specified number of commits from the tip of each remote branch history. If fetching to a shallow repository created by git clone with --depth=<depth> option (see git-clone[1]), deepen or shorten the history to the specified number of commits. Tags for the deepened commits are not fetched.

When using git pull with --depth=1 the history will be shortened to 1:

git pull --depth=1

The reason neither this command nor your two-step approach immediately reduces the repository's disk usage is that the commit history is not discarded right away. The data remains on disk in the .git folder, where it can still be listed and recovered via git reflog.
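To see how much is still being kept, a few read-only commands can be wrapped in a small helper. This is only a sketch: the function name show_repo_footprint is made up for illustration, and it assumes a non-bare repository with a regular .git directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): print how much history and disk
# space a repository currently holds. Read-only; it changes nothing.
show_repo_footprint() {
    repo_dir=$1
    # Number of commits reachable from HEAD (1 for a depth=1 history).
    echo "commits: $(git -C "$repo_dir" rev-list --count HEAD)"
    # Number of HEAD reflog entries; these can keep old commits alive.
    echo "reflog entries: $(git -C "$repo_dir" reflog | wc -l | tr -d ' ')"
    # Loose and packed object statistics in human-readable units.
    git -C "$repo_dir" count-objects -vH
    # Disk usage of the .git directory in KiB.
    echo "disk KiB: $(du -sk "$repo_dir/.git" | cut -f1)"
}
```

Running show_repo_footprint path/to/repo before and after a cleanup makes it easier to tell whether the old history was actually removed or is merely no longer shown by git log.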

To discard the now regularly unreachable data before it expires, git prune can be used with the --expire parameter:

git prune --expire now

Even though the old commits are no longer referenced, their data is likely still stored in pack files. As the prune docs state:

It also removes entries from .git/shallow that are not reachable by any ref.

Note that unreachable, packed objects will remain. If this is not desired, see git-repack[1].

So a repack is necessary to discard the unreachable, packed objects:

git repack -a -d
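The steps above could be combined into one helper, sketched below under a few assumptions that are mine and not from the git docs quoted here. The function name update_shallow_clone is hypothetical. It assumes the remote is called origin and that the given branch is the one currently checked out. It uses git fetch plus git reset --hard instead of git pull, since merging two depth-1 tips can fail as unrelated histories. It also expires the reflog first, because reflog entries otherwise keep the old commits reachable, and it lets git gc --prune=now do the pruning and repacking in one go.

```shell
#!/bin/sh
# Hypothetical helper (not a git command): move a shallow clone to the newest
# commit of one branch and drop everything older.
# WARNING: discards local commits and uncommitted changes on that branch.
update_shallow_clone() {
    repo_dir=$1
    branch=$2
    # Fetch only the tip commit; this moves the shallow boundary forward.
    git -C "$repo_dir" fetch -q --depth=1 origin "$branch" || return 1
    # Point the checked-out branch at the fetched tip without merging.
    git -C "$repo_dir" reset -q --hard FETCH_HEAD || return 1
    # Reflog entries still reference the old tip; expire them all.
    git -C "$repo_dir" reflog expire --expire=now --all || return 1
    # Repack and delete objects that are now unreachable.
    git -C "$repo_dir" gc -q --prune=now || return 1
}
```

For example, update_shallow_clone /srv/env1/odoo 12.0 (the path is a placeholder) would leave that clone with a single commit in its history. Because it resets hard, this only suits deployment-style checkouts that carry no local work.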

Upvotes: 0

Mark Eklund

Reputation: 161

Unfortunately, the answer is "no". A depth-limited clone records its boundary commits in the .git/shallow file. When retrieving commit history, your request will stop at the commits in the shallow file, but if a merge into the current branch has happened since, the fetch will follow that merge and the whole history behind it. From my blog post, Exploring Git Clone --depth:

Suppose you ran git clone --depth=1 on the following branch structure when main was at c:

...  -  .  -  .  - [c] -  .  -  .  -  .  -  .  (main)
         \               /
           .  -  .  -  .  (xyz)

And then later did a fetch at g, the merge at d would cause you to pull nearly the whole history (except b).

1000’s of commits  -  a  -  b  - [c] -  d  -  e  -  f  -  g  (main)
                       \               /
                         x  -  y  -  z  (xyz)

The Medium blog post mentioned above gives some suggestions, but no answer to your question.

Upvotes: 2
