stevendesu

Reputation: 16781

Submodule libraries in git to minimize redundancy

I'm very new to using git, and previously haven't really tried to "organize" any projects I've worked on. I just recently purchased a development server for personal use, however, and I wanted to start organizing all my projects and using version control.

I've spent the past 8 hours researching different recommended methods for organizing files in a project, and I realize that it's a very subjective matter. However, I've developed a system that I think will work for just about any case for me, and I have one very objective question regarding how to accomplish a certain task with the directory structure.

Presently I'm looking into a structure akin to the following:

src/ - All deliverables in an uncompiled form (PHP files, C source files, etc.)
data/ - Crucial but unrelated data (SQL databases, etc.)
lib/ - Dependencies -- THIS IS WHERE MY QUESTION LIES
docs/ - Documentation
build/ - Scripts to aid in the build process
test/ - Unit tests
res/ - Not version controlled. Contains PSD files and non-diff-able stuff
.gitignore
README
output.zip - Ready-to-install finished product (just unzip and go)

As I mentioned - my real issue revolves around this lib/ directory. This needs to contain all files and programs which my project requires to run, but which are outside of the scope of my project and I won't be editing. Some features that I need this folder to have:

I can avoid having 18 redundant copies of the same file by using a virtual directory (symlink), however from my understanding git would copy this symlink as-is into the repository without copying the files. Therefore if anyone else fetched my repository they would have a dangling pointer and no libraries.
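To illustrate the concern: git records a symlink as the link's target path, not the files behind it. The following is a minimal sketch that mimics this on a local file system. The directory names (shared_libs, project, lib) are hypothetical, and removing the shared directory stands in for cloning onto a machine where it does not exist.

```python
import os
import tempfile


def dangling_after_copy():
    """Create a symlinked lib/ directory, then remove its target.

    Returns a tuple: (stored path equals the target path,
                      link still exists as a link,
                      link still resolves to real files).
    """
    with tempfile.TemporaryDirectory() as root:
        shared = os.path.join(root, "shared_libs")  # hypothetical shared library directory
        project = os.path.join(root, "project")     # hypothetical project checkout
        os.mkdir(shared)
        os.mkdir(project)

        link = os.path.join(project, "lib")
        os.symlink(shared, link)

        # The only thing the link itself holds is this path string.
        stored = os.readlink(link)

        # Simulate a machine where the shared directory was never created.
        os.rmdir(shared)

        return stored == shared, os.path.islink(link), os.path.exists(link)


if __name__ == "__main__":
    print(dangling_after_copy())
```

The link survives, but it no longer resolves to anything, which is the dangling pointer described above.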

At first it looked like I could do what I wanted using git-submodule. However from my understanding this takes the entire contents of another repository and treats it as a sub-directory. Therefore if I included "dependency A" my libraries folder would look something like:

/lib/A/src/
/lib/A/data/
...
/lib/A/test/
.gitignore
README
output.zip

In the case of a script (PHP, Perl, etc.) I could probably load the dependency using require('lib/A/src/dependency.php'), but in the case of a DLL or binary file I would have no easy way to read the output file from output.zip. I could have the finished project stored directly at the root level instead of wrapped up in a pretty zip file, but if the project were, say, a website - this could mean hundreds of files cluttering up my repository root.

How can I include another repository as a library of my own, easily reference the library files within my own project, have the library meaningfully copied to anyone who fetches my repository, and prevent redundant copies of the same files on my development server?

EDIT: After searching on Google for a while I found this similar issue, however it only addresses PHP projects. While an autoloader may allow you to mask the underlying file system in a PHP environment, how would you apply a similar approach to a C++ project? Or a Python project? Or a Java project?

As I thought more about this project today a few other thoughts came to mind which may require a new direction of thought. First is the problem of very deep library nests. If project A depends on project B which depends on project C which depends on project D then you would have a directory structure like so:

A/lib/
A/lib/B/
A/lib/B/lib/
A/lib/B/lib/C/
A/lib/B/lib/C/lib/
A/lib/B/lib/C/lib/D/

Obviously this would not only get annoying, but redundant in its own way.
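One common way around this nesting is to resolve the whole dependency graph once and place every library side by side under a single lib/ directory. The following is a minimal sketch of that flattening, not something taken from any particular tool. The DEPS mapping is a hypothetical stand-in for whatever manifest each project would declare, and the graph is assumed to be acyclic.

```python
def flatten(deps, root):
    """Return each direct or transitive dependency of root exactly once,
    ordered so that a library appears after the libraries it depends on."""
    order = []
    seen = set()

    def visit(name):
        for dep in deps.get(name, []):
            if dep not in seen:
                seen.add(dep)
                visit(dep)         # place the dependency's own dependencies first
                order.append(dep)

    visit(root)
    return order


# Hypothetical manifest: A needs B, B needs C, C needs D.
DEPS = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}

flat = flatten(DEPS, "A")
paths = ["A/lib/" + name + "/" for name in flat]

if __name__ == "__main__":
    for path in paths:
        print(path)
```

With this layout B, C and D each sit directly under A/lib/, so a library shared by two dependencies is only present once.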

How do normal people deal with dependencies when doing a git repository?

Upvotes: 3

Views: 1502

Answers (4)

wich

Reputation: 17127

Do not embed libraries; this is a security nightmare! When you embed, for instance, an image format library like libpng, libjpeg or libtiff in your application because you want to use its image format, you open up your application to any security vulnerabilities those libraries might contain, and the user has no easy way of knowing that they need to update your program to resolve the security issue. When you leave the dependency outside the scope of your application, the package manager knows about the library and can take action when security vulnerabilities are exposed.

Leave libraries you depend on outside the scope of your project. If you have personally developed libraries that you use in several projects, put each one in its own repository and make separate releases of it.

For Unix-like OSes (Linux/BSD/Solaris/etc.), have users install them separately through their package manager. If you release your software as a package, the package manager will know about your dependencies and install them before it installs your application, so no manual actions are necessary.

For Windows, use a separate bundling process to bundle the libraries you depend upon into a convenience installer which installs the libraries to shared system directories, not your program directory.

There is by the way no technical means in git to do what you want without massive duplication.

Upvotes: 0

Goran

Reputation: 675

While it is nice to unify your workflow, you have to respect the beast you're trying to tame. You should have different directory structures for different projects. Having worked on everything from 3D animation projects to PHP projects to C++ projects, I find that squeezing them to conform to the same workflow just adds work and headache in the long run. Most IDEs have a good "new project" structure straight out of the box, and it is one that other developers will know and understand straight away.

As for the dependency problem try implementing the superproject approach: http://git-scm.com/book/en/Git-Tools-Submodules

Upvotes: 2

Srikanth Venugopalan

Reputation: 9049

In the projects that I have been on, submodules are good only for certain cases when it comes to dependency management; in other cases they are complemented by another framework. Mostly, I prefer to use submodules when I need the complete repository, e.g. when I have a common build script that I can share across projects.

There are specific tools focusing on dependency management in various stacks -

etc.

These tools take care of the redundancy management.

Currently, I am on a .net project, where we have this setup -

  1. PowerShell build scripts shared across projects using submodules. The build script repository contains all 3rd party executables required to deploy any of our .NET applications and the respective wrapper PowerShell scripts, plus some scripts to load the conventions, config etc.
  2. NuGet server (via TeamCity) hosting NuGet packages for common binaries shared across projects. NuGet Package Restore is a feature that allows fetching packages as part of the build.

Upvotes: 3

Doug Moscrop

Reputation: 4544

You've asked a general question but also asked specifically about a few instances. I'm going to lean towards being more general. The short answer: this is a build system concern, not a version control system concern.

In the case of Java, there are a few different dependency management/resolution tools that you can use. The build system should understand how to fetch those dependencies at build time and make them available. They are, however, transient - you don't check them in to version control. Maven, for example, uses a /target folder that contains your output (e.g. output.zip) as well as other items such as static analysis output. I'd recommend a dedicated output folder like that too, because it makes cleaning output easier: what if you have more than one output file? What about variants? Maven also uses an external directory to locally cache dependencies, but this could be ephemeral and it wouldn't care. Bottom line: none of this is persisted in version control.
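The local cache idea can be sketched in a few lines. This is only an illustration under stated assumptions, not how Maven or any other tool is implemented: resolve and fake_fetch are hypothetical names, the cache is a temporary directory, and the "download" just writes a placeholder file.

```python
import os
import tempfile


def resolve(cache_dir, name, version, fetch):
    """Return the cached path for name-version, calling fetch only on a cache miss."""
    target = os.path.join(cache_dir, f"{name}-{version}")
    if not os.path.exists(target):
        os.makedirs(cache_dir, exist_ok=True)
        fetch(name, version, target)
    return target


calls = []  # records every time the (fake) download actually happens


def fake_fetch(name, version, target):
    """Hypothetical stand-in for downloading an artifact from a package server."""
    calls.append((name, version))
    with open(target, "w") as fh:
        fh.write("placeholder artifact")


with tempfile.TemporaryDirectory() as cache:
    # Two builds (think: two different projects) ask for the same dependency.
    first = resolve(cache, "libfoo", "1.2", fake_fetch)
    second = resolve(cache, "libfoo", "1.2", fake_fetch)

if __name__ == "__main__":
    print(first == second, len(calls))
```

Both builds get the same path and the artifact is fetched once, so the dependency lives in one place outside every repository and can be deleted and re-fetched at any time.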

This is not nearly as easy in C++ as far as I know. CMake seems to support building external projects. I've only recently started to play around with this to see what is possible, so I don't want to mislead you by saying "it can easily be done", but it stands to reason that it can be done; the question is only how much work you have to put into it. So whether or not you call the folder /libs, you should make the build treat dependencies as transient (and then good luck with transitive dependencies).

Upvotes: 0
