Reputation: 28490

How to organize public and private libraries in a cabal project?

I have a project with several modules:

2 are each exported as a public library (say A and B; B is a kind of back-end and is used by A, but I'm exporting both because I think B could be useful in a different project);
the others are just internal modules.

(And there's a non-module file with main :: IO () that corresponds to an executable stanza, but at some point I'll delete it in favour of a test, because what I'm really using it for is to check that the entities exported by B do what I expect.)

If I want

each file to be one module and correspond to one library,
each library to have the minimum number of build-depends, i.e. only those implied by the imports in its file,
each library to be importable without duplicated components in the .-separated module name, as in import Foo.Bar.Baz,

what options do I have, in terms of directory organization?

Initially each module had

its own directory,
its own library stanza in the .cabal file,
and so its own build-depends list,

but then I started experimenting a bit with the hierarchical structure of the project, and with its implications for dependencies and for the import syntax to be used.

At the moment I have a tree structure of the project like this:

.
|
+--- lib/
|    |
|    +--- A/
|    |    |
|    |    +--- A.hs
|    |    |
|    |    +--- Internal/
|    |               |
|    |               +--- E.hs
|    +--- B/
|    |    |
|    |    +--- B.hs
|    |
|    +--- C/
|    |    |
|    |    +--- C.hs
|    |
|    +--- D/
|    |    |
|    |    +--- D.hs
|
+--- exe/
|    |
|    +--- Main.hs
|
+--- myproject.cabal

I've put E.hs in A/Internal precisely for experimenting.

Let me explain.

E.hs is truly an implementation detail of A.hs, so in a way it does make sense for it to be somewhere in a directory named Internal. It's also nice that I can import it via

import A.Internal.E

The modules A and B can be imported via

import A
import B

which makes sense, since those modules are exported.

Now say that C is an implementation detail of B. I would put it there:

.
|
+--- lib/
|    |
|    +--- A/
|    |    |
|    |    +--- A.hs
|    |    |
|    |    +--- Internal/
|    |               |
|    |               +--- E.hs
|    +--- B/
|    |    |
|    |    +--- B.hs
|    |    |
|    |    +--- Internal/
|    |               |
|    |               +--- C.hs
|    +--- D/
|         |
|         +--- D.hs
|
+--- exe/
|    |
|    +--- Main.hs
|
+--- myproject.cabal

So far each module is still in its own directory, so each can have its own build-depends.

But as soon as I decide that, say, D is also an implementation detail of A, I run into this,

.
|
+--- lib/
|    |
|    +--- A/
|    |    |
|    |    +--- A.hs
|    |    |
|    |    +--- Internal/
|    |               |
|    |               +--- E.hs
|    |               |
|    |               +--- D.hs
...

where E.hs and D.hs are in the same directory and hence, if I understand correctly, can't be exported as 2 different libraries, each with its own build-depends.

I could definitely add another directory layer,

.
|
+--- lib/
|    |
|    +--- A/
|    |    |
|    |    +--- A.hs
|    |    |
|    |    +--- Internal/
|    |               |
|    |               +--- E.hs
|    |               |
|    |               +--- E/
|    |               |    |
|    |               |    +--- E.hs
|    |               |
|    |               +--- D.hs
|    |               |
|    |               +--- D/
|    |                    |
|    |                    +--- D.hs
...

but this means that to import E and D I need to write

import A.Internal.D.D
import A.Internal.E.E

where the duplication of the last bit is very ugly.

Upvotes: 3

Answers (2)

Ben

Reputation: 71545

I think you are mixing up some steps.

The hs-source-dirs only sets the base directory (or directories) that the compiler will use (when compiling that target) to translate Haskell module names into filesystem paths. If you want to import a module using import A.Internal.E then that module will be translated into the path A/Internal/E.hs, and the compiler will check for that relative path under each of the folders listed in hs-source-dirs. If such a module isn't there, then it must be an exposed-module of one of the libraries listed in the build-depends of the target being compiled, otherwise you'll just get a module-not-found error.

So you could put everything in a single library (with hs-source-dirs: lib-A) and use this folder structure:

lib-A/
   A.hs
   A/
     Internal/
       D.hs
       E.hs
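
As a rough sketch of this single-library option, the stanza might look something like the following. The package name myproject comes from the question; the cabal-version, the version bounds, and the containers dependency are placeholders chosen for illustration, not something the question specifies.

```cabal
cabal-version: 3.0
name:          myproject
version:       0.1.0.0
build-type:    Simple

library
  hs-source-dirs:   lib-A
  -- Only A is visible to code that depends on this library.
  exposed-modules:  A
  -- Compiled as part of this library, but not importable from outside it.
  other-modules:    A.Internal.D
                    A.Internal.E
  -- One shared dependency list for A, D and E (placeholder entries).
  build-depends:    base >=4 && <5
                  , containers
  default-language: Haskell2010
```

With this layout, A.hs would still write import A.Internal.D and import A.Internal.E; the modules are found under lib-A rather than through build-depends.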

Or you could put E and D in their own libraries (where library A has hs-source-dirs: lib-A, and library-D has hs-source-dirs: lib-D, etc), and use this folder structure:

lib-A/
  A.hs

lib-D/
  A/
    Internal/
      D.hs

lib-E/
  A/
    Internal/
      E.hs

In this case your cabal stanza for library A needs to include your sub-libraries D and E in its own build-depends.
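
A minimal sketch of what the stanzas for this layout could look like, assuming A is the package's main library and D and E are named sub-libraries. The sub-library names a-internal-d and a-internal-e are hypothetical, as are the cabal-version and the version bounds.

```cabal
cabal-version: 3.0
name:          myproject
version:       0.1.0.0
build-type:    Simple

-- Main (unnamed) library: exposes module A only.
library
  hs-source-dirs:   lib-A
  exposed-modules:  A
  -- The sub-libraries are referenced by their stanza names.
  build-depends:    base >=4 && <5
                  , a-internal-d
                  , a-internal-e
  default-language: Haskell2010

-- Hypothetical sub-library name; it provides module A.Internal.D.
library a-internal-d
  hs-source-dirs:   lib-D
  exposed-modules:  A.Internal.D
  build-depends:    base >=4 && <5
  default-language: Haskell2010

-- Hypothetical sub-library name; it provides module A.Internal.E.
library a-internal-e
  hs-source-dirs:   lib-E
  exposed-modules:  A.Internal.E
  build-depends:    base >=4 && <5
  default-language: Haskell2010
```

Each sub-library gets its own build-depends here, and since named sub-libraries default to private visibility, other packages should not be able to depend on them directly.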

Either way you use the exact same syntax to import E: import A.Internal.E. Your Haskell code doesn't care at all about the structure of libraries and packages, only about the module namespace. The module namespace that is seen by your Haskell code is assembled from packages and libraries, and you use your .cabal file to specify that, but there are many possible structures of libraries that will lead to the exact same module namespace being visible in A.hs.

The structure where you were concerned about repetition in the names looks like this:

lib-A/
   A.hs
   A/
     Internal/
       D/
         D.hs
       E/
         E.hs

This makes no sense. What gives you the ability to have separate build-depends is making them separate libraries. If you have library D with hs-source-dirs: lib-A/A/Internal/D, then you cannot import the module D as import A.Internal.D.D; that import statement would be converted to the relative path A/Internal/D/D.hs, and when you look for that under the hs-source-dirs you would be checking the path A/Internal/D/A/Internal/D/D.hs, which doesn't exist. If this compiles at all, it is because in A the import A.Internal.D.D is taken as referring to a non-exposed module that is part of library A, rather than to the exposed module of library D.

If you had actually tried this you would find that D simply wouldn't build. A would only build if you had forgotten to list D in the build-depends for A.

If you want to import A.Internal.D and you want that to come from a separate library, then you need to put D in a separate top-level folder, with the A/Internal path included in its own separate folder tree. If you want to point the hs-source-dirs of D at a small folder that just contains D.hs, then you need to import it as import D, regardless of whether the hs-source-dirs folder is itself inside a path with Internal or A as path components.

There is actually nothing stopping you from using the same hs-source-dirs for different libraries (or putting the hs-source-dirs folder for one library inside that of another) and putting everything in the one folder. But things become very confusing when you do that; each library is trying to expose a different subset of the same folder, but when you're looking at the code you're looking at the one combined mess of files (not the structure declared in your .cabal file). It's very easy to accidentally have a file compiled again as part of one target when it was supposed to be re-used via a dependency on a different target (as I suspect must have happened if your import A.Internal.D.D attempt ever compiled at all). Using exactly one top-level folder per Cabal target is much more straightforward. (Though if you have lots of libraries in a package it might make sense to put them inside an organised tree of folders; they don't all have to be at the same level as your .cabal file, or even at the same level as each other, so long as none of them are inside another library's hs-source-dirs folder.)

As an aside, the sort of structure you're proposing where every module is its own library is a very unusual way to do things. I'm not sure why you feel the need for every module to have its own minimal build-depends.

If someone depends on one of your exposed libraries, they necessarily will have to install the transitive dependencies of that library. That means that if you have exposed modules A, B, and then internal non-exposed modules D and E used by A, anyone who installs A is going to have to install everything listed in the build-depends of the libraries for A, D, and E. If D and E are truly internal implementation details of A (enough for you to literally want to call them A.Internal.D and A.Internal.E), then nothing else is depending on D and E, which means nothing ever depends on only one of them. Every downstream client will need to install all 3 of A, D, and E, or none of them.

In that case, there is no reason to have D and E in separate libraries to have their own build-depends. They can just be bundled within the library (and thus folder) for A. Separating them out only makes more work for yourself writing cabal stanzas and creating folder trees, makes the code more difficult to browse, makes builds take longer, and probably even risks the compiled code being less efficient¹.

The only reason I can think of to do this is for internal modules that aren't internal implementation details of one particular library, but are shared by several of your exposed libraries and not by all of them. In that case you might actually reduce the total transitive dependencies of each of your public libraries by splitting up your shared internal modules into separate libraries. But even then, I would probably try to identify "clusters" of modules that have similar dependencies, and have just a small number of internal libraries. (In fact usually my packages would have a core concept or theme that means almost all of the dependencies are shared between identifiable sub-sections of my code anyway, and only a handful of dependencies might be avoided by one of my public libraries by doing something like this; though if one of that handful is something like lens that has a huge transitive closure it could still be worthwhile!)
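
To make the "cluster" idea concrete, here is a hedged sketch in which one private sub-library holds the shared internal modules that need a heavy dependency, so that a public library that doesn't use them avoids it. The names shared-internal, lib-shared, lib-B and Shared.Internal.Util are all hypothetical, and lens stands in for any dependency with a large transitive closure.

```cabal
cabal-version: 3.0
name:          myproject
version:       0.1.0.0
build-type:    Simple

-- Hypothetical private sub-library: the cluster of internal modules
-- that need the heavy dependency.
library shared-internal
  hs-source-dirs:   lib-shared
  exposed-modules:  Shared.Internal.Util
  build-depends:    base >=4 && <5
                  , lens
  default-language: Haskell2010

-- Main library A uses the shared cluster, so it pulls in lens transitively.
-- Any other public library that needs the cluster lists it the same way.
library
  hs-source-dirs:   lib-A
  exposed-modules:  A
  build-depends:    base >=4 && <5
                  , shared-internal
  default-language: Haskell2010

-- Public sub-library B does not use the cluster, so it avoids lens.
library b
  visibility:       public
  hs-source-dirs:   lib-B
  exposed-modules:  B
  build-depends:    base >=4 && <5
  default-language: Haskell2010
```

A client that depends only on the b sub-library would then not need lens, whereas bundling everything into one library would force it on every client.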

¹ I don't know for sure that there are cross-module optimisations that GHC can do within one target that it can't do across targets, but it certainly wouldn't hurt to combine them. The usual way of compiling libraries would be required to exactly preserve the API of the exposed-modules, since they need to be as-stated in order to be consumed by other depending targets (which might be compiled without the full source code of this library available). For modules it knows are not exposed, it could decide to do transformations that would change the API (perhaps if it notices everything is inlined it will omit the module entirely).

For modules that are exposed in a private library it could theoretically do the same things, but it would have to be able to look at the source code of the other targets that consume this library, which I don't believe it does. Perhaps there is no issue, but it is extremely plausible to me that there are (or could be in future) optimisations that are blocked by splitting up every module into its own library.

Upvotes: 1

Li-yao Xia

Reputation: 33519

If you want the module D to be its own library while being under A in the module hierarchy, you can have an A directory under D:

lib/
  A/
    A.hs
    A/Internal/E.hs
  D/
    A/Internal/D.hs

library A
  hs-source-dirs: lib/A
  exposed-modules: A A.Internal.E

library D
  hs-source-dirs: lib/D
  exposed-modules: A.Internal.D

Upvotes: 1
