Reputation: 753675
I would like to know whether Git submodules are an appropriate organization for some code that I currently keep under RCS, and if so, how the submodules should be organized.
Suppose I have a collection of library modules (maybe libraries, maybe parts of a single library; that's one item up for discussion). Suppose some of those modules are base modules, and other modules depend on the base modules. All of these modules are intended to be used by yet other packaged software (programs), which would presumably include an appropriate selection of these packages as submodules.
To make it concrete, the library modules are:
- stderr: standardized error reporting routines (not dependent on other modules).
- filter: file filter programs (like grep or cat); uses stderr.
- debug: debug trace support; uses stderr.
- phasedtest: unit code testing; uses filter, debug and stderr directly.
- rational: a rational number arithmetic package that uses phasedtest for its test code, but is independent of phasedtest and its dependencies otherwise.
Many other programs use stderr.
Quite a lot of those also use filter (and all code that uses filter also uses stderr directly), but there are quite a lot of programs that do use stderr but don't use filter.
Some programs use debug; essentially all those programs also use stderr directly, but they may or may not use filter directly.
Unit test programs using phasedtest may or may not use stderr, filter and debug directly (they're more likely to use stderr than the others), but phasedtest itself needs them, so such programs always use those modules indirectly.
Some programs may use rational; usually they will use stderr too (nearly everything written by me uses stderr), but those programs don't directly use phasedtest themselves, in general.
Just for clarification: at the moment, these potential Git modules and
submodules are not in Git at all; most of them have extensive (10-30
year) histories in RCS (SCCS prior to Y2K), which will be preserved when
they are transitioned to Git.
The intention is to get all the repos into GitHub in due course.
In general, these modules are all fairly stable.
They do get revised or extended, but not necessarily every year.
Sometimes, three or more years go by without changes to some of them.
I have a build/distribution system where the files that make up what
might become submodules are pulled into the distribution of the larger
program when that is being prepared for release.
During normal (single-person) development, the material lives in a library with hundreds of source files built into a single (static) library (in $HOME/lib), and a single header directory ($HOME/inc, analogous to, but wholly separate from, either /usr/include or /usr/local/include).
I'm seeking to get the structure "right" — sufficiently right that I won't regret what I've done — before transitioning them to Git. I still have version stamping and tagging issues to resolve; that's a whole separate bag'o'worms and not part of this question.
From my understanding of submodules, it appears that:
- stderr should be in its own repository.
- filter should be in its own repository with stderr as a submodule.
- debug should be in its own repository with stderr as a submodule.
- phasedtest should be in its own repository with:
  - debug as one submodule
  - filter as one submodule
  - stderr as a direct submodule, or should it use the version of stderr from the nested submodules (the stderr inside debug and/or the stderr from inside filter)?
- rational should be in its own repository with phasedtest as a submodule (and whatever sub-submodule organization comes with phasedtest).
Both filter and debug independently need the stderr submodule (but they're unlikely to depend significantly on any particular version of stderr -- almost any working version at release level 10 will suffice). So, they both need a version of stderr in a submodule.
How many libraries should there be? Options include:
- Three libraries: libstderr, libdebug, and libfilter?
- Should libfilter include the material from stderr, and should libdebug include the material from stderr (two libraries)?
- One library, libjlss, with elements of stderr, debug and filter in it?
Should the phasedtest code be organized as a fourth library containing the modules stderr, filter and debug as submodules (so that stderr will appear three times, once as a direct dependency and twice as a dependency of debug and filter), or should it be a smaller library that requires linking with the three separate dependent libraries?
Since the rational module only requires phasedtest for testing, it won't install the phasedtest library or libraries. But it will need them available for testing. Should it require the pre-installed phasedtest library (libraries), or should it be self-contained and have the necessary code for testing as part of its distribution?
Programs using rational might also use stderr (probably would), but might or might not use debug and filter, and would be unlikely to use phasedtest except for unit testing their own components.
Are Git submodules the right way to go, or should I be looking at an alternative organization?
Assuming that Git submodules are appropriate, how would the Git repositories be best organized?
Upvotes: 3
Views: 1250
Reputation: 248
As I see it, you have three options: submodules, subtree, or dependencies (that is, pre-built static libraries). I've been using submodules a lot recently; they are a way to put Git repos inside a Git repo and track which commit of each submodule's repo your root repo is using. If you need to make changes in the submodules, you should use submodules; otherwise go for subtree or dependencies.
To use dependencies, you need some kind of tool that can pack and resolve the dependencies: a dependency manager. There are some out there, but I haven't yet found one that is general and not tied to a build tool.
Upvotes: 1
Reputation: 311526
Your first two questions ("are git submodules appropriate?" and "how should I organize them?") aren't really a good fit for Stack Overflow: the answers are going to mostly be matters of opinion, and it would be hard to identify any single answer as "correct".
Your auxiliary questions are slightly more addressable:
Is there a minimum sensible size for a repository?
Not really, no.
Is there a maximum sensible number of submodules for a single repository?
Again, no, but before creating a monster repository with hundreds of submodules, make sure you are comfortable working with them first. People have different opinions on how best to manage submodules. Here is one person who has spent some time thinking about it. I don't agree with all his ideas, but it is at least a way to start thinking about the issue.
Does it matter if a single submodule is a sub-submodule of a number of submodules used by a single repository?
Not really, no, although if you have multiple instances of a repository scattered about your sources you are probably going to run into issues of version skew (e.g., one is at version A and another is at version B and another is at version C) unless you are very careful.
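One way to see that kind of skew is to ask Git which commit each checked-out copy is at. The following is a minimal sketch rather than a recommended tool: it assumes throwaway local repositories, a hypothetical application repo named app, and hypothetical helpers git_local and commit_file; only the final git submodule status --recursive pipeline is the part that would carry over to a real superproject.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Local-path submodule URLs need the file transport enabled explicitly.
git_local() { git -c protocol.file.allow=always "$@"; }

# commit_file REPO FILE TEXT: write TEXT to FILE in REPO and commit it.
commit_file() {
    echo "$3" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$3"
}

# stderr at version A, which filter pins as a submodule.
git init -q stderr
commit_file stderr stderr.h "version A"
git init -q filter
commit_file filter filter.h "filter"
git_local -C filter submodule --quiet add "$work/stderr" stderr
git -C filter commit -q -m "Pin stderr at version A"

# stderr moves on to version B; the application pins that one directly.
commit_file stderr stderr.h "version B"
git init -q app
commit_file app main.c "app"
git_local -C app submodule --quiet add "$work/stderr" stderr
git_local -C app submodule --quiet add "$work/filter" filter
git -C app commit -q -m "Add stderr and filter"
git_local -C app submodule --quiet update --init --recursive

# List every submodule whose last path component is "stderr",
# stripping the one-character status prefix from the commit ID.
git -C app submodule status --recursive |
    awk '{ n = split($2, p, "/")
           if (p[n] == "stderr") { sub(/^[-+U]/, "", $1); print $1, $2 } }' \
    > stderr-commits.txt
cat stderr-commits.txt

distinct=$(cut -d' ' -f1 stderr-commits.txt | sort -u | wc -l | tr -d ' ')
echo "distinct stderr commits: $distinct"
```

Here the application's own stderr and the copy nested inside filter end up at different commits, so the count of distinct commits is greater than one.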
Is there a conventional directory structure for submodules? All directories directly in the top-level directory, or some in standard directory name in the root directory, or in quasi-random locations in the superproject directory hierarchy?
There is not, but typically you will pick something that works for you and stick with it. I have seen many projects that place submodules into a lib or modules directory, while others do place them at the top-level.
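As an illustration of the lib convention, the path given to git submodule add decides where the submodule lives, and .gitmodules records it. This is a small sketch under the assumption of throwaway local repositories; the repo name app and the helpers git_local and commit_file are hypothetical.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Local-path submodule URLs need the file transport enabled explicitly.
git_local() { git -c protocol.file.allow=always "$@"; }

# commit_file REPO FILE TEXT: write TEXT to FILE in REPO and commit it.
commit_file() {
    echo "$3" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$3"
}

git init -q stderr
commit_file stderr stderr.h "stderr"
git init -q app
commit_file app main.c "app"

# Place the submodule under lib/ rather than at the top level.
git_local -C app submodule --quiet add "$work/stderr" lib/stderr
git -C app commit -q -m "Add stderr under lib/"

# .gitmodules is an ordinary Git config file, so git config can read it.
path=$(git -C app config -f .gitmodules submodule.lib/stderr.path)
echo "recorded submodule path: $path"
```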
Are there any glaring gotchas that I've not spotted?
Remember that when checked out as a submodule, the current HEAD is managed by the parent repository. That is, if you cd into a submodule, make changes, push them, and then in the parent project run git submodule update, you will roll back the local copy of your submodule to whatever commit is recorded in the parent.
It is for this reason that I generally treat submodules as read-only instances of a repository that only ever get updated by running git pull (followed by a subsequent commit in the parent repository). I only edit files in the standalone checkout of the repository.
You need to train yourself to regularly run git submodule update after pulling new changes into the parent repository (in case those changes included new versions of your submodules).
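The rollback and the pull-then-commit workflow can both be reproduced in a scratch directory. This is a minimal sketch, assuming throwaway local repositories; the repo name app and the helpers git_local and commit_file are hypothetical, and the standalone stderr repository plays the role of the upstream where edits are made.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Local-path submodule URLs need the file transport enabled explicitly.
git_local() { git -c protocol.file.allow=always "$@"; }

# commit_file REPO FILE TEXT: write TEXT to FILE in REPO and commit it.
commit_file() {
    echo "$3" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$3"
}

git init -q stderr
commit_file stderr stderr.h "version 1"
branch=$(git -C stderr symbolic-ref --short HEAD)
old=$(git -C stderr rev-parse HEAD)

git init -q app
commit_file app main.c "app"
git_local -C app submodule --quiet add "$work/stderr" stderr
git -C app commit -q -m "Add stderr submodule"

# The standalone checkout gains a commit; pull it into the submodule.
commit_file stderr stderr.h "version 2"
new=$(git -C stderr rev-parse HEAD)
git -C app/stderr pull -q --ff-only

# Gotcha: the parent still records the old commit, so update rolls back.
git_local -C app submodule --quiet update
rolled_back=$(git -C app/stderr rev-parse HEAD)

# Workflow: get back on the branch, pull, then commit the new pointer.
git -C app/stderr checkout -q "$branch"
git -C app/stderr pull -q --ff-only
git -C app add stderr
git -C app commit -q -m "Update stderr to version 2"

# Now the parent records the new commit, so update leaves it alone.
git_local -C app submodule --quiet update
kept=$(git -C app/stderr rev-parse HEAD)

echo "before parent commit, update gave: $rolled_back"
echo "after parent commit, update gave:  $kept"
```

The first git submodule update returns the submodule to the commit the parent recorded; only after git add and git commit in the parent does the newer commit survive an update.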
Upvotes: 2