Reputation: 1379
I've converted a Mercurial repository to Git, using fast-export. But the Git repository is huge: 18 GB for Git vs. 3.4 GB for Mercurial. None of my cleanup steps have helped.
My Mercurial repository is dominated by one 65 MB file (Anki flashcards in SQLite format) that gets updated daily. Its history has grown to be 2.9 GB, under .hg/store/data.
I was hoping Git might be able to compress the history a little better, but I have been unable to shrink the repository below 18 GB!
I have tried git prune, git gc, and others, to no avail. I even tried zipping the .git folder, and it still came out to be exactly 18 GB.
Am I missing something?
Update: I tried Bazaar (bzr), and it compressed my repository to only 2.3 GB. Nice!
Upvotes: 13
Views: 2092
Reputation: 51
Running git gc --aggressive on a repository migrated from Mercurial worked for me. It reduced the size from 500 MB to 150 MB.
Upvotes: 0
Reputation: 301327
If git gc is failing, try manually running git repack and then git gc.
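As a rough sketch of the repack-then-gc sequence, here is a small Python wrapper around the two commands. The function names and the window/depth values are my own assumptions for illustration, not tuned recommendations; large values make git repack slower and more memory-hungry, which matters with a 65 MB file.

```python
import subprocess


def repack_then_gc_commands(window=250, depth=250):
    """Return the git commands to run, in order, as argv lists.

    The window and depth defaults are illustrative assumptions only.
    """
    return [
        # -a: put all objects into a single pack
        # -d: remove packs made redundant by the new one
        # -f: recompute deltas instead of reusing existing ones
        ["git", "repack", "-a", "-d", "-f",
         f"--window={window}", f"--depth={depth}"],
        # Clean up afterwards and prune unreachable loose objects.
        ["git", "gc", "--prune=now"],
    ]


def repack_then_gc(repo_path, **kwargs):
    """Run the commands inside repo_path, stopping at the first failure."""
    for argv in repack_then_gc_commands(**kwargs):
        subprocess.run(argv, cwd=repo_path, check=True)
```

Calling repack_then_gc("/path/to/repo") (a placeholder path) runs both steps. Compare the size of the .git folder before and after to see whether it helped.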
My observations with SVN, Git and Hg:
I have always observed that SVN and Hg repositories were much smaller than the corresponding Git repositories. This is because in Git each change to a file, whether text or binary, adds a new full object for it. In SVN, only the diff is added, even for binaries, and SVN's binary diffing is very good as well.
But this is where pack files come in: they store only the diffs (deltas) between similar objects, and they are compressed as well. Even with packing, I have observed that Git repositories tend to be larger, depending on the kind of files and the amount of change those files undergo. This is something I have come to accept with Git, and it is a compromise I am willing to make given how fast the various operations are with Git.
Upvotes: 7
Reputation: 56068
One reason could be that Mercurial has a very compact storage format that involves diffs, even for binaries. And since re-creating versions from diffs can be very time-consuming, it stores a full snapshot as soon as the diffs plus the old original exceed double the size of a full snapshot.
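To make that snapshot rule concrete, here is a toy Python model of the decision as I described it. It is only a sketch of the rule, not Mercurial's actual code; the function name and inputs are made up for illustration.

```python
def plan_storage(revision_sizes, delta_sizes):
    """Decide, per revision, whether to store a delta or a full snapshot.

    revision_sizes[i] is the full size of revision i.
    delta_sizes[i] is the size of the diff from revision i-1 to i
    (delta_sizes[0] is ignored, since the first revision has no parent).
    """
    plan = []
    chain = 0  # bytes needed to rebuild the current revision: base + diffs
    for i, full in enumerate(revision_sizes):
        if i == 0 or chain + delta_sizes[i] > 2 * full:
            # The chain got too expensive to replay: start over.
            plan.append("snapshot")
            chain = full
        else:
            plan.append("delta")
            chain += delta_sizes[i]
    return plan


# Five revisions of a 100-byte file, each differing by a 40-byte diff.
print(plan_storage([100] * 5, [0, 40, 40, 40, 40]))
# ['snapshot', 'delta', 'delta', 'snapshot', 'delta']
```

For a file that changes a little every day, most revisions end up as small diffs, with an occasional full snapshot to keep reconstruction time bounded.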
Personally, I would try storing a dump of your sqlite database instead of the database file itself and see where that gets you. It might be far more efficient.
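A minimal sketch of producing such a dump with Python's standard sqlite3 module follows. The function name and file names are placeholders of mine, and I have not checked how well this works with Anki's particular schema.

```python
import sqlite3


def dump_sqlite_to_sql(db_path, sql_path):
    """Write the database at db_path as plain SQL text to sql_path."""
    conn = sqlite3.connect(db_path)
    try:
        with open(sql_path, "w", encoding="utf-8") as out:
            # iterdump() yields the SQL statements that recreate the
            # database, one statement at a time.
            for statement in conn.iterdump():
                out.write(statement + "\n")
    finally:
        conn.close()
```

You would commit the resulting text file (for example, dump_sqlite_to_sql("cards.db", "cards.sql")) instead of the binary database. A line-oriented text dump gives the version control system's diffing a much better chance than a binary file whose pages can shift around between saves.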
I do not know what Git's storage format is, but I'm guessing it does not involve diffs in the same way as Mercurial's does.
Upvotes: 9