slattery

Reputation: 1379

Why is my Git repository so much larger than the Mercurial version?

I've converted a Mercurial repository to Git, using fast-export. But the Git repository is huge: 18 GB for Git vs. 3.4 GB for Mercurial. None of my cleanup steps have helped.

My Mercurial repository is dominated by one 65 MB file (Anki flashcards in SQLite format) that gets updated daily. Its history has grown to be 2.9 GB, under .hg/store/data.

I was hoping Git might be able to compress the history a little better, but I have been unable to shrink the repository below 18 GB!

I have tried git prune, git gc, and others, to no avail. I even tried zipping the .git folder, and it still came out to be exactly 18 GB.

Am I missing something?

Update: I tried Bazaar (bzr), and it compressed my repository to only 2.3 GB. Nice!

Upvotes: 13

Views: 2092

Answers (3)

Hugo Leote

Reputation: 51

Running git gc --aggressive on a repository migrated from Mercurial worked for me. It reduced the repository from 500 MB to 150 MB.

Upvotes: 0

manojlds

Reputation: 301327

If git gc is failing, try running git repack manually and then running git gc again.
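As a rough sketch of that sequence, assuming git is on the PATH and that you run it against a backup copy of the repository. The helper names and the window and depth values are only illustrations, not tuned recommendations:

```python
import subprocess


def repack_commands(window=250, depth=250):
    """Return the git commands to run, in order, as argument lists.

    window and depth are illustrative values, not tuned recommendations.
    """
    return [
        # -a: put everything into a single pack
        # -d: delete packs made redundant by the new one
        # -f: recompute deltas instead of reusing existing ones
        ["git", "repack", "-a", "-d", "-f",
         f"--window={window}", f"--depth={depth}"],
        # Collect garbage afterwards and prune unreachable objects right away.
        ["git", "gc", "--prune=now"],
    ]


def repack(repo_path, window=250, depth=250):
    """Run the repack followed by gc inside repo_path."""
    for cmd in repack_commands(window, depth):
        subprocess.run(cmd, cwd=repo_path, check=True)
```

Calling repack("/path/to/repo") runs both steps. With a 65 MB file in the history, a large window can make the repack slow and memory-hungry, so smaller values may be more practical.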


My observations with SVN, Git and Hg:

I have always observed that SVN and Hg repositories are much smaller than the corresponding Git repositories. This is because each change to a file, text or binary, adds a new full object for it (a loose object holds the complete file content, not a diff). In SVN, only the diff is added, even for binaries, and SVN's binary diffing is very good as well.

But this is where the pack files come in, since they store only the diffs (deltas) between similar objects and are compressed as well. Even with packing, I have observed that Git repositories tend to be larger, depending on the kind of files and the amount of change those files undergo. This is something I have come to accept with Git, and it is a compromise I am willing to make given how fast the various operations are with Git.

Upvotes: 7

Omnifarious

Reputation: 56068

One reason could be that Mercurial has a very compact storage format that uses diffs, even for binaries. And since re-creating a version from a long chain of diffs can be very time consuming, it stores a full snapshot as soon as the diffs plus the old original exceed double the size of a full snapshot.
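To make that rule concrete, here is a simplified model of it. This is only a sketch of the heuristic as described above, not Mercurial's actual code; the function name is made up, and it assumes every full snapshot has the same size:

```python
def plan_storage(full_size, delta_sizes):
    """Decide, per revision, whether to store a delta or a full snapshot.

    full_size: size of one full snapshot (assumed constant for simplicity).
    delta_sizes: size of the delta each later revision would need.
    Returns a list of (kind, stored_size) tuples, kind being
    "snapshot" or "delta".
    """
    plan = [("snapshot", full_size)]  # the first revision is stored in full
    chain = full_size                 # data needed to rebuild the latest revision
    for delta in delta_sizes:
        if chain + delta > 2 * full_size:
            # The chain would exceed twice a full copy: start a new chain.
            plan.append(("snapshot", full_size))
            chain = full_size
        else:
            plan.append(("delta", delta))
            chain += delta
    return plan
```

For a 65 MB file whose daily delta is a hypothetical 30 MB, plan_storage(65, [30, 30, 30, 30]) stores two deltas, then a fresh snapshot, then another delta, for 220 MB in total instead of 325 MB for five full copies.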

Personally, I would try storing a dump of your SQLite database instead of the database file itself and see where that gets you. It might be far more efficient.
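A minimal sketch of that idea using Python's standard sqlite3 module. The function names and file paths are placeholders, and whether the text dump actually produces smaller diffs for an Anki collection is something you would have to measure:

```python
import sqlite3


def dump_database(db_path, dump_path):
    """Write the database at db_path as SQL text to dump_path."""
    conn = sqlite3.connect(db_path)
    try:
        with open(dump_path, "w", encoding="utf-8") as out:
            # iterdump() yields the SQL statements that recreate the database.
            for statement in conn.iterdump():
                out.write(statement + "\n")
    finally:
        conn.close()


def restore_database(dump_path, db_path):
    """Rebuild a database at db_path (which should not exist yet) from a dump."""
    conn = sqlite3.connect(db_path)
    try:
        with open(dump_path, encoding="utf-8") as src:
            conn.executescript(src.read())
    finally:
        conn.close()
```

You would commit the text dump rather than the binary file and call restore_database after a checkout. The hope is that a small change to the data becomes a small change in a line-oriented text file, which diff-based storage handles well.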

I do not know what Git's storage format is. But I'm guessing it does not involve diffs in the same way as Mercurial's does.

Upvotes: 9
