Jonathan Leffler

Reputation: 753675

Organizing Code Into Git SubModules

I would like to know whether Git submodules are an appropriate organization for some code that I currently keep under RCS, and if so, how the submodules should be organized.

General outline of modules

Suppose I have a collection of library modules (maybe libraries, maybe parts of a single library; that's one item up for discussion). Suppose some of those modules are base modules, and other modules depend on the base modules. All of these modules are intended to be used by yet other packaged software (programs), which would presumably include an appropriate selection of these packages as submodules.

To make it concrete, the library modules are:

Many other programs use stderr. Quite a lot of those also use filter (and all code that uses filter also uses stderr directly), but there are quite a lot of programs that do use stderr but don't use filter. Some programs use debug; essentially all those programs also use stderr directly, but they may or may not use filter directly. Unit test programs using phasedtest may or may not use stderr, filter and debug directly (they're more likely to use stderr than the others), but phasedtest itself needs them so such programs always use those modules indirectly. Some programs may use rational; usually they will use stderr too (nearly everything written by me uses stderr), but those programs don't directly use phasedtest themselves, in general.

Just for clarification: at the moment, these potential Git modules and submodules are not in Git at all; most of them have extensive (10-30 year) histories in RCS (SCCS prior to Y2K), which will be preserved when they are transitioned to Git. The intention is to get all the repos into GitHub in due course. In general, these modules are all fairly stable. They do get revised or extended, but not necessarily every year. Sometimes, three or more years go by without changes to some of them. I have a build/distribution system where the files that make up what might become submodules are pulled into the distribution of the larger program when that is being prepared for release. During normal (single-person) development, the material lives in a library with hundreds of source files built into a single (static) library (in $HOME/lib), and a single header directory ($HOME/inc, analogous to, but wholly separate from, either /usr/include or /usr/local/include).

I'm seeking to get the structure "right" — sufficiently right that I won't regret what I've done — before transitioning them to Git. I still have version stamping and tagging issues to resolve; that's a whole separate bag'o'worms and not part of this question.

How should submodules be organized?

From my understanding of submodules, it appears that:

Issues arising

  1. Both filter and debug independently need the stderr submodule (but they're unlikely to depend significantly on any particular version of stderr -- almost any working version at release level 10 will suffice). So, they both need a version of stderr in a submodule.

  2. How many libraries should there be? Options include:

    • Should there be three separate libraries: libstderr, libdebug, and libfilter?
    • Or should libfilter include the material from stderr, and should libdebug include the material from stderr (two libraries)?
    • Or should there be a single composite library libjlss with elements of stderr, debug and filter in it?
    • Does the answer vary if the libraries are shared rather than static?
  3. Should the phasedtest code be organized as a fourth library containing the modules stderr, filter and debug as submodules (so that stderr will appear three times, once as a direct dependency and twice as a dependency of debug and filter), or should it be a smaller library that requires linking with the three separate dependent libraries?

  4. Since the rational module only requires phasedtest for testing, it won't install the phasedtest library or libraries. But it will need them available for testing. Should it require the pre-installed phasedtest library (libraries), or should it be self-contained and have the necessary code for testing as part of its distribution?

  5. Programs using rational might also use stderr (probably would), but might or might not use debug and filter, and would be unlikely to use phasedtest except for unit testing their own components.

Main questions

Auxiliary questions

Upvotes: 3

Views: 1250

Answers (2)

Jan Lovstrand

Reputation: 248

As I see it, you have three options: submodules, subtrees, or dependencies (that is, pre-built static libraries). I've been using submodules a lot recently; they are a way to put Git repos inside a Git repo and track which commit of each submodule's repo your root repo is using. If you need to make changes in the submodules, you should use submodules; otherwise go for subtrees or dependencies.

To use dependencies, you need some kind of tool that can pack and resolve the dependencies: a dependency manager. There are some out there, but I haven't yet found one that is general and not tied to a particular build tool.

Upvotes: 1

larsks

Reputation: 311526

Your first two questions ("are git submodules appropriate?" and "how should I organize them?") aren't really a good fit for Stack Overflow: the answers are mostly going to be matters of opinion, and it would be hard to identify any single answer as "correct".

Your auxiliary questions are slightly more addressable:

Is there a minimum sensible size for a repository?

Not really, no.

Is there a maximum sensible number of submodules for a single repository?

Again, no, but before creating a monster repository with hundreds of submodules, make sure you are comfortable working with them first. People have different opinions on how best to manage submodules. Here is one person who has spent some time thinking about it. I don't agree with all his ideas, but it is at least a way to start thinking about the issue.

Does it matter if a single submodule is a sub-submodule of a number of submodules used by a single repository?

Not really, no, although if you have multiple instances of a repository scattered about your sources you are probably going to run into issues of version skew (e.g., one is at version A and another is at version B and another is at version C) unless you are very careful.
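One way to spot that kind of skew is to compare the commits recorded for each copy. The following is only a sketch under stated assumptions: it builds throwaway repositories in a temporary directory, and the nesting (filter and debug each carrying their own stderr submodule) and the `lib/` paths are hypothetical choices made to mirror the question, not a recommended layout. The `protocol.file.allow=always` setting is there only because recent Git versions refuse local-path submodule clones by default.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Helper: run git with a throwaway identity and local-path submodules allowed.
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

# Base library (hypothetical contents) at version v1.
g init -q stderr
( cd stderr && echo v1 > stderr.c && g add stderr.c && g commit -q -m v1 )

# filter records stderr at v1.
g init -q filter
( cd filter && g submodule --quiet add "$work/stderr" stderr && g commit -q -m "filter uses stderr v1" )

# stderr moves on to v2, and debug records that newer commit.
( cd stderr && echo v2 > stderr.c && g commit -q -am v2 )
g init -q debug
( cd debug && g submodule --quiet add "$work/stderr" stderr && g commit -q -m "debug uses stderr v2" )

# A program that uses both libraries ends up with two nested copies of stderr.
g init -q app
( cd app
  g submodule --quiet add "$work/filter" lib/filter
  g submodule --quiet add "$work/debug" lib/debug
  g commit -q -m "add filter and debug"
  g submodule --quiet update --init --recursive )

# List every submodule, nested ones included, with the commit it is checked out at.
g -C app submodule status --recursive

# Count the distinct commits among the copies of stderr.
skew=$(g -C app submodule status --recursive |
       awk '$2 ~ /(^|\/)stderr$/ && !seen[$1]++ { n++ } END { print n + 0 }')
echo "distinct stderr commits: $skew"
```

A count above 1 means the superproject contains copies of stderr at different commits, which is the situation to watch for if stderr is pulled in both directly and through other submodules.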

Is there a conventional directory structure for submodules? All directories directly in the top-level directory, or some in a directory with a standard name in the root directory, or in quasi-random locations in the superproject directory hierarchy?

There is not, but typically you will pick something that works for you and stick with it. I have seen many projects that place submodules in a lib or modules directory, while others place them at the top level.

Are there any glaring gotchas that I've not spotted?

Remember that when checked out as a submodule, the current HEAD is managed by the parent repository. That is, if you cd into a submodule, make changes, push them, and then in the parent project run git submodule update, you will roll back the local copy of your submodule to whatever commit is recorded in the parent.

It is for this reason that I generally treat submodules as read-only instances of a repository that only ever get updated by running git pull (followed by a subsequent commit in the parent repository). I only edit files in the standalone checkout of the repository.
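The rollback described above can be reproduced in a scratch directory. This is a minimal sketch, assuming a library repository named stderr and a parent that keeps it under `lib/stderr` (both names are hypothetical); the commit made inside the submodule is deliberately never recorded in the parent.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Helper: run git with a throwaway identity and local-path submodules allowed
# (recent Git versions refuse local-path submodule clones by default).
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

# Stand-alone library repository.
g init -q stderr
( cd stderr && echo v1 > stderr.c && g add stderr.c && g commit -q -m v1 )

# Parent project that records the library as a submodule.
g init -q parent
( cd parent && g submodule --quiet add "$work/stderr" lib/stderr && g commit -q -m "add stderr submodule" )
recorded=$(git -C parent/lib/stderr rev-parse HEAD)

# Commit inside the submodule checkout, without committing anything in the parent.
( cd parent/lib/stderr && echo v2 > stderr.c && g commit -q -am v2 )
edited=$(git -C parent/lib/stderr rev-parse HEAD)

# The parent still records the old commit, so update checks that commit out again.
( cd parent && g submodule --quiet update )
after=$(git -C parent/lib/stderr rev-parse HEAD)

echo "recorded: $recorded"
echo "edited:   $edited"
echo "after:    $after"
```

After the update, the submodule is back at the recorded commit with a detached HEAD. The v2 commit still exists on the submodule's local branch, but it is no longer what is checked out.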

You need to train yourself to regularly run git submodule update after pulling new changes into the parent repository (in case those changes included new versions of your submodules).
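As a sketch of why that habit matters, the script below (scratch repositories, hypothetical names and paths) pulls a parent commit that moves the submodule to a newer version and shows that the submodule's working tree stays at the old commit until `git submodule update` is run.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Helper: run git with a throwaway identity and local-path submodules allowed
# (recent Git versions refuse local-path submodule clones by default).
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

# Library repository and a parent that uses it as a submodule.
g init -q stderr
( cd stderr && echo v1 > stderr.c && g add stderr.c && g commit -q -m v1 )
g init -q parent
( cd parent && g submodule --quiet add "$work/stderr" lib/stderr && g commit -q -m "add stderr" )

# A second checkout of the parent, with its submodule populated.
g clone -q --recurse-submodules "$work/parent" clone
before=$(git -C clone/lib/stderr rev-parse HEAD)

# The library gains a new version; the parent pulls it and records the new commit.
( cd stderr && echo v2 > stderr.c && g commit -q -am v2 )
( cd parent/lib/stderr && g pull -q --ff-only )
( cd parent && g add lib/stderr && g commit -q -m "use stderr v2" )
wanted=$(git -C parent/lib/stderr rev-parse HEAD)

# Pulling the parent changes the recorded commit but not the submodule checkout.
( cd clone && g pull -q --ff-only )
stale=$(git -C clone/lib/stderr rev-parse HEAD)

# Only the explicit update brings the submodule to the recorded commit.
( cd clone && g submodule --quiet update )
fresh=$(git -C clone/lib/stderr rev-parse HEAD)

echo "after pull:   $stale"
echo "after update: $fresh"
```

With nested submodules, `git submodule update --init --recursive` does the same for every level and also initializes any submodules that the pulled commits introduced.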

Upvotes: 2
