Brutus

Reputation: 790

Remove history from huge Git repository

I'm trying to reduce the size of my Git repository, but I have run into many issues.

Introduction

I have a huge and complex Git repository containing thousands of commits and more than ten branches. Its current size is over 2 GB.

What I want to do

I would like to clean the repository history in order to reduce its size as much as possible. I chose a special commit that I want to be my new root commit (call it <NEW_ROOT>); I want to remove every commit before <NEW_ROOT> and keep all the commits after.

I want to keep only the master and, possibly, develop branches. Every other branch should be removed from the history to reduce the size.

At the end of the procedure I want to push everything to remote, so that it only keeps updated master and origin (basically it must reflect my local situation).

What I tried so far

I browsed the web a lot and found many solutions, but none of them worked for me. In particular, I guess that such a solution would be perfect in my case; unfortunately, I got a lot of conflicts when rebasing.

I also struggled a lot because many of the solutions I found refer to obsolete or deprecated tools and options (e.g. git filter-branch).

Could you please help me find a way out?

Thanks a lot!

Upvotes: 2

Views: 184

Answers (1)

Enrico Campidoglio

Reputation: 59973

This sounds like something you can achieve by doing a shallow clone of your local large repository:

A shallow repository has an incomplete history some of whose commits have parents cauterized away. [...] This is sometimes useful when you are interested only in the recent history of a project even though the real history recorded in the upstream is much larger.

The idea is to shallow clone your local repository into a new directory starting from the commit you deemed to be the new root. Note that this solution assumes that you're only interested in keeping a single branch in the new repository (e.g. master).

The first thing you need to do is create a branch reference that points to the parent of <NEW_ROOT> in the existing repository:

cd your-large-repo
git branch new-root <NEW_ROOT>^

We'll use new-root as the cut-off point for the shallow clone. Since we do want to include <NEW_ROOT> in the new repository, we set the cut-off point to its parent. Of course, <NEW_ROOT> must be reachable from master, and it must have a parent for <NEW_ROOT>^ to resolve.
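Before creating the branch, it can help to confirm both preconditions. The following is only a sketch: check_new_root is a hypothetical helper name, not a Git command, and it assumes the branch you want to keep is master unless you pass another name. It relies on git rev-parse --verify and git merge-base --is-ancestor.

```shell
# Hypothetical helper (not part of Git): checks that a commit can serve as
# the new root for the shallow clone.
#   $1 = candidate commit (stands in for <NEW_ROOT>)
#   $2 = branch to keep (assumed to be master if omitted)
check_new_root() {
    commit=$1
    branch=${2:-master}

    # The commit needs a parent, otherwise <NEW_ROOT>^ cannot be resolved.
    if ! git rev-parse --verify --quiet "${commit}^" >/dev/null; then
        echo "no parent: $commit cannot be used with the ^ suffix" >&2
        return 1
    fi

    # The commit must be an ancestor of the branch tip, i.e. reachable from it.
    if ! git merge-base --is-ancestor "$commit" "$branch"; then
        echo "not reachable: $commit is not in the history of $branch" >&2
        return 1
    fi
}
```

Run it from inside your-large-repo, for example: check_new_root <NEW_ROOT> master && git branch new-root <NEW_ROOT>^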

At this point, you can go ahead and clone your local repository into a new directory specifying that:

  1. You're only interested in the master branch
  2. You want to exclude all the commits reachable from new-root

Here's the complete command:

git clone --branch master --shallow-exclude=new-root file://C:\path\to\your-large-repo C:\path\to\your-new-repo

The --shallow-exclude option is what tells Git to exclude all commits leading up to and including new-root from the clone.

Now, if you cd into your-new-repo, you'll find that it only contains the master branch and that the root commit is <NEW_ROOT>.
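If you'd like to double-check the result, something along these lines should work. It is a minimal sketch: inspect_clone is a hypothetical helper name, and it takes the path of the new repository as its only argument (the current directory is assumed if none is given). It relies on the fact that, in a shallow clone, the commits at the cut-off have their parents cut away, so they are listed as parentless.

```shell
# Hypothetical helper (not part of Git): summarizes a freshly made shallow clone.
#   $1 = path to the new repository (assumed to be "." if omitted)
inspect_clone() {
    repo=${1:-.}

    # Prints "true" when the repository is a shallow clone.
    echo "shallow: $(git -C "$repo" rev-parse --is-shallow-repository)"

    # Commits with no parents as seen from HEAD; in a shallow clone these
    # are the commits at the cut-off, which should include <NEW_ROOT>.
    echo "parentless commits:"
    git -C "$repo" rev-list --max-parents=0 HEAD

    # Local and remote-tracking branches that made it into the clone.
    echo "branches:"
    git -C "$repo" branch --all

    # On-disk size of the object store, in human-readable units.
    git -C "$repo" count-objects -vH
}
```

Calling inspect_clone C:\path\to\your-new-repo and comparing the listed commit with <NEW_ROOT> is a quick way to confirm the cut landed where you intended. If the history contains merges that cross the cut-off, more than one parentless commit may be listed.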

The new repository will have its origin set to file://C:\path\to\your-large-repo. So, before you go any further, you'll have to replace it with the actual URL of the remote repository:

git remote set-url origin https://example.com/your-large-repo.git

At this point, you can simply force push the new history to the remote repository (with the usual caveat on the consequences of force pushing).

Upvotes: 2
