Rock_Artist

Reputation: 735

Multiple repos with single submodule

I've searched for a while and haven't found an answer (maybe I don't know what to look for).

We have a main library that is a repository by itself (let's call it Lib). It contains most of our modules and submodules. Let's also say it is 2GB in size...

We also have many projects, such as ProjA, ProjB and ProjC, each of which uses Lib as a submodule.

ProjA

  • Lib (branch:master,commit#:1)

ProjB

  • Lib (branch:other,commit#:2)

ProjC

  • Lib (branch:master,commit#:4)

So while I can keep every project referencing the correct version of the library (the submodule), I now have 3*2GB = 6GB of THE SAME submodule.

Is there a way to reference a single copy of the submodule while keeping the correct files and version for each project?

Eg.

ProjA

  • Lib/base_lib.h (v1.0)

  • Lib/file_only_in_this_commit

ProjB

  • Lib/base_lib.h (v1.0)

ProjC

  • Lib/base_lib.h (v1.1)

Thanks!

Upvotes: 11


Answers (2)

yairchu

Reputation: 24774

Update:

I've since switched to git submodule's --reference flag and written a new script, init_submodules, that uses it to solve the problem.
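As a rough illustration of the --reference approach (not the init_submodules script itself), the sketch below wraps `git submodule update --init --reference`. The function name and both arguments are hypothetical; it assumes a clone of Lib already exists locally and that absolute paths are passed.

```shell
# Hypothetical helper, not the author's init_submodules script.
# $1 = superproject checkout (e.g. /work/ProjA)
# $2 = existing local clone of Lib to borrow objects from (assumed to exist)
init_with_reference() {
    proj=$1
    ref=$2
    # --reference makes the submodule's repository list $ref as an alternate
    # object store, so objects already present there are not stored again.
    git -C "$proj" submodule update --init --reference "$ref"
}
```

It would be called as `init_with_reference /work/ProjA /work/shared/Lib`. The submodule then depends on the reference clone staying in place, since it borrows objects from it rather than copying them.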

My original/deprecated answer:

You can use git worktree (available since git 2.5) to create additional worktrees for the Lib submodule, at the locations inside ProjA, ProjB, etc.

Because git worktree makes it a pain to create several worktrees with the same name (they are all called "Lib"), I wrote a script, share_submodules, to work around the difficulties. It creates the additional worktree in place of a submodule, sets it to the right submodule commit, and does so recursively for all the submodules inside the shared module.

It should work as well as if the submodule was created by git submodule update --init --recursive, except all copies refer to one module's objects.
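The core of the worktree idea can be sketched as below. This is a minimal illustration, not the share_submodules script: the function name and its arguments are hypothetical, absolute paths are assumed, and it does not handle nested submodules or existing submodule checkouts.

```shell
# Hypothetical helper, not the author's share_submodules script.
# $1 = the one full clone of Lib
# $2 = where a project expects Lib (e.g. /work/ProjA/Lib)
# $3 = the Lib commit that project should see
add_lib_worktree() {
    # --detach checks out the commit without creating or moving a branch,
    # so each project can sit on a different commit of the same repository.
    git -C "$1" worktree add --detach "$2" "$3"
}
```

Calling it once per project, for example `add_lib_worktree /work/Lib /work/ProjA/Lib <commit>`, gives each project its own checked-out files while the object database stays in the single Lib clone.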

If you're transitioning to it by removing the submodule, there are stray submodule files in your .git and I created find_stray_submodules.py to clean them up.

Upvotes: 8

user3159253

Reputation: 17455

Well, internally the whole submodule mechanism is quite simple, so you can tailor it to your taste.

Inside each of your Proj<N>/.git/modules/ there's a folder corresponding to the Lib submodule, containing a bare repository cloned from the remote specified as Lib.url in Proj<N>/.gitmodules. Those bare repositories are where you can optimize.

You may simply recreate them using hardlinks where possible.

1) Create a bare clone of your Lib in a folder on the same filesystem as all your Proj repos:

 git clone --bare url://to/Lib /path/to/Lib.git

2) Replace the default submodule repo with a repo that references the bare repo from step 1:

# preserve the old repo for a while
mv ProjA/.git/modules/Lib ProjA/.git/modules/Lib.old
git clone --bare --local --reference /path/to/Lib.git \
    url://to/Lib ProjA/.git/modules/Lib

3) Restore the config from the preserved repo in ProjA/.git/modules/Lib:

cp ProjA/.git/modules/Lib.old/config ProjA/.git/modules/Lib/config

Now you can check that everything works in ProjA, remove ProjA/.git/modules/Lib.old, and repeat for the other projects. After that, all the repos share the same object files.
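One way to confirm that a recreated module repo really borrows from the shared clone is to look at its alternates. The helper below is only a sketch; its name and argument are hypothetical.

```shell
# Hypothetical helper: show which object stores a module repo borrows from.
# $1 = the module repository, e.g. ProjA/.git/modules/Lib
report_lib_sharing() {
    moddir=$1
    # A clone made with --reference records the borrowed store in this file.
    cat "$moddir/objects/info/alternates"
    # count-objects -v lists each alternate and how much is stored locally.
    git --git-dir="$moddir" count-objects -v
}
```

If the output names /path/to/Lib.git/objects as an alternate and the local size figures are small, the module is reusing the shared objects rather than holding its own copy.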

In git, a particular state of a submodule is referenced by a precise SHA1. Unless you perform some "evil" operations in your main Lib repo (e.g. git filter-branch or other operations that may delete a commit), all proper commits in Lib are kept forever. Each Proj<N> checks out its particular commit completely independently of the others, so you needn't worry that the state of Lib in ProjA will interfere with the state of Lib in ProjB.
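To see the exact SHA1 a project pins, the superproject's tree can be read directly. This is a small sketch; the function name and arguments are hypothetical.

```shell
# Hypothetical helper: print the Lib commit recorded in a superproject's HEAD.
# $1 = superproject checkout (e.g. ProjA), $2 = submodule path (e.g. Lib)
pinned_lib_commit() {
    # A submodule entry appears in the tree as "160000 commit <sha1><TAB><path>".
    git -C "$1" ls-tree HEAD "$2" | awk '{print $3}'
}
```

Running `pinned_lib_commit ProjA Lib` and `pinned_lib_commit ProjB Lib` prints one commit id each, and the two values can differ without either project affecting the other.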

Upvotes: 3
