that_one_nerdy_guy

Reputation: 115

GitLab migration changes repository sizes. Why?

I am currently moving many projects from one GitLab server to another. When I use the "import Git repo by URL" feature (offered when adding a new project), the total size of the imported project differs from the original. The difference is small, but it is there. I can't share screenshots, but here is an example: one project, which I will call exampleProj, is 3.3MB on the old server and became 3.4MB after the import. Another, exampleProj2, was 2.1MB and became 1.2MB after the import. All the new imports have the same number of commits, branches, tags, etc. as the originals. Any input would be very helpful.

Upvotes: 0

Views: 339

Answers (1)

LightBender

Reputation: 4263

Nothing is broken

The good news is that, as long as all your branches and tags moved, you can be confident that your entire history was migrated intact. (git will definitely let you know if the bits don't add up properly.)

Git is just doing its job

Under the hood, git holds all the objects in an internal database. The contents of this database (along with some metadata) determine the repo size on your GitLab server. Migrating from server to server will always bring over all the reachable objects intact, but git's internal optimizations do not guarantee the same size on disk:

Sometimes the repo gets bigger

Internally, git stores new objects (zlib compressed) directly in the filesystem, but it is very inefficient to store thousands (or more often, hundreds of thousands or millions) of objects in the filesystem as individual files.
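To make the "one zlib-compressed file per object" idea concrete, here is a minimal Python sketch of how a loose blob is laid out. It assumes the default SHA-1 object format; the function name `loose_object` is made up for this illustration and is not part of git.

```python
import hashlib
import zlib


def loose_object(content: bytes, kind: str = "blob"):
    """Return (object_id, relative_path, compressed_bytes) for a loose object.

    Assumes the SHA-1 object format; `loose_object` is a hypothetical helper.
    """
    # git hashes "<type> <size>\0" followed by the raw content.
    store = f"{kind} {len(content)}".encode() + b"\x00" + content
    object_id = hashlib.sha1(store).hexdigest()
    # The first two hex digits name the directory, the remaining 38 the file.
    path = f".git/objects/{object_id[:2]}/{object_id[2:]}"
    # The file on disk holds the zlib-compressed header plus content.
    return object_id, path, zlib.compress(store)


if __name__ == "__main__":
    oid, path, data = loose_object(b"hello\n")
    print(oid)
    print(path, f"({len(data)} bytes on disk)")
```

Every new version of every file gets its own entry like this, which is why the number of individual files grows so quickly.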

Instead, git will periodically "pack" the objects. The two main goals of this packing are saving space and improving performance.

Because git stores each version of a file in full rather than just a diff, grouping similar files together in a pack allows them to be compressed at a much higher ratio. (This is why a git repo often has roughly the same or a smaller footprint than an svn repo, even though svn stores only the diffs for most commits.)
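The effect of grouping similar content can be illustrated with plain zlib. This is only a rough sketch of the principle: real pack files store similar objects as deltas against each other rather than simply concatenating them, and the sample data and names below are invented for the demonstration.

```python
import hashlib
import zlib


def sample_file(changed_line=None):
    """Build about 8 KB of hard-to-compress text; optionally alter one line."""
    lines = [hashlib.sha1(str(i).encode()).hexdigest() for i in range(200)]
    if changed_line is not None:
        lines[changed_line] = "this line was edited in the second version"
    return "\n".join(lines).encode()


# Two versions of the "same file" that differ in a single line.
version_a = sample_file()
version_b = sample_file(changed_line=100)

# Each version compressed on its own, like two loose objects.
separate = len(zlib.compress(version_a)) + len(zlib.compress(version_b))
# Both versions compressed as one stream, so the second can reuse the first.
together = len(zlib.compress(version_a + version_b))

print(f"compressed separately: {separate} bytes")
print(f"compressed together:   {together} bytes")
```

The second version costs very little extra when it is compressed alongside the first, which is the same reason a packed repo is usually much smaller than the sum of its loose objects.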

Packing also organizes the pack files so that git can access them quickly and efficiently. This happens periodically in the background (every few thousand loose objects), just to keep things fast.

When you migrate a repo, the objects are repacked by the same process on the new server, and a different packing of the same objects can result in minor fluctuations in repo size.

Sometimes the repo gets smaller

Git also makes it very difficult to actually lose data. Given the default configuration, git will hold on to objects that are no longer reachable for weeks to months before the garbage collector cleans them out: reflog entries expire after 30 to 90 days (`gc.reflogExpireUnreachable` and `gc.reflogExpire`), and loose unreachable objects then get a further two-week grace period (`gc.pruneExpire`). This includes commits and objects on branches that were deleted as well as a fair bit of log data concerning those objects.

Cloning a repo, or migrating it to another server, does not copy any unreachable objects. In addition, git will repack on the other end, so loose objects that were previously sitting in the filesystem will be placed into pack files and compressed together.
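If you want to see where the difference comes from, `git count-objects -v` reports loose and packed objects separately. Below is a small Python sketch that wraps it; it assumes git is on your PATH and that you have a local copy of each repo (for example a mirror clone from each server). The helper names and the paths in the usage note are hypothetical. Note that the size GitLab displays may also include other data, so the numbers need not match the UI exactly.

```python
import subprocess


def parse_count_objects(output: str) -> dict:
    """Turn `git count-objects -v` output ("key: value" lines) into a dict of ints."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        # Skip anything that is not a plain numeric field.
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value.strip())
    return stats


def repo_stats(repo_path: str) -> dict:
    """Run `git count-objects -v` inside repo_path (requires git on PATH)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "count-objects", "-v"],
        capture_output=True, text=True, check=True,
    )
    return parse_count_objects(result.stdout)


def total_kib(stats: dict) -> int:
    """Loose objects ("size") plus pack files ("size-pack"), both in KiB."""
    return stats.get("size", 0) + stats.get("size-pack", 0)
```

For example, `total_kib(repo_stats("/tmp/old/exampleProj.git"))` and `total_kib(repo_stats("/tmp/new/exampleProj.git"))` can be compared directly. A large `count` on the old copy points to loose objects that have not been packed yet, while a smaller `in-pack` on the new copy points to unreachable objects that were left behind.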

Upvotes: 3
