Differences between C++20 module kinds/sizes?

In C++20, for each module M there must be exactly one source file that starts:

export module M;

(This is called the primary module interface unit.)

Optionally, each module M may have additional source files:

(1) Zero or more source files starting:

module M;

(These are called module implementation units.)

(2) Zero or more source files starting (for some unique P):

module M : P;

(These are both module implementation units and module partitions.)

(3) Zero or more source files starting (for some unique P):

export module M : P;

(These are both module interface units and module partitions.)
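To make the four declaration forms above concrete, here is a toy C++ sketch that classifies a single module-declaration line into one of the four kinds. The names UnitKind, classify and trim are hypothetical, and the sketch assumes the declaration fits on one line in exactly the simple forms listed. It deliberately rejects the global module fragment (module;) and the private module fragment (module :private;), and it does not handle attributes or comments.

```cpp
#include <cassert>
#include <string_view>

// The four kinds of module unit listed above, plus "not a module declaration".
enum class UnitKind {
    PrimaryInterface,         // export module M;
    Implementation,           // module M;
    PartitionImplementation,  // module M : P;
    PartitionInterface,       // export module M : P;
    NotAModuleDeclaration
};

// Remove leading and trailing spaces/tabs.
std::string_view trim(std::string_view s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string_view::npos) return {};
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Toy classifier (assumption: one-line declaration in the simple forms above).
UnitKind classify(std::string_view line) {
    std::string_view s = trim(line);
    if (s.empty() || s.back() != ';') return UnitKind::NotAModuleDeclaration;
    s = trim(s.substr(0, s.size() - 1));  // drop the trailing ';'

    // Optional leading "export".
    bool exported = false;
    constexpr std::string_view kExport = "export ";
    if (s.substr(0, kExport.size()) == kExport) {
        exported = true;
        s = trim(s.substr(kExport.size()));
    }

    // Must continue with "module <name>"; "module;" and "module :private;" are rejected.
    constexpr std::string_view kModule = "module ";
    if (s.substr(0, kModule.size()) != kModule) return UnitKind::NotAModuleDeclaration;
    s = trim(s.substr(kModule.size()));
    if (s.empty() || s.front() == ':') return UnitKind::NotAModuleDeclaration;

    // A ':' after the module name marks a partition.
    const bool partition = s.find(':') != std::string_view::npos;
    if (exported) return partition ? UnitKind::PartitionInterface : UnitKind::PrimaryInterface;
    return partition ? UnitKind::PartitionImplementation : UnitKind::Implementation;
}
```

For example, classify("export module M : P;") yields UnitKind::PartitionInterface, while an import such as classify("export import :B;") yields UnitKind::NotAModuleDeclaration, because it is not a module declaration at all.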

So when organizing a codebase to use modules, a decision needs to be made on:

A. How many modules is the codebase split across?

B. For each module, how many source files is the module split into?

C. If the answer to B is more than one, which of the three kinds (1,2 or 3) are the extra files?

What are the tradeoffs in A between using a low number of modules (coarse-grain) and a high number of modules (fine-grain)? Are there any technical implications to performance? What about functional differences? (Someone suggested "modules work best in big chunks"? Why?)

When would you answer just one to B? And when answering more than one to B, how is decision C made? I.e., what are the functional differences between 1, 2 and 3? When would you use one in lieu of another?

alexpanter

Module implementation and ODR

We can have several module implementation units for a module. Their interpretation is similar to that of a module partition. Example:

// mymodule_interface.cpp
export module mymodule;

export int get_int();

[...]

// mymodule_implementation.cpp
module mymodule;   // implicitly imports the module interface!

int get_int() { return 42; }

[...]

This will be similar to the header/implementation split that we are used to. It will likely be possible to turn an existing header into the module interface unit by adding an export module declaration at the top, and to turn the existing source file into a module implementation unit by adding a module declaration at its top. However, I think [personal opinion] that code looks cleaner the fewer files we need to browse, and that the distinction between header and implementation causes more confusion than benefit in a modularized approach. E.g., coming from C#, this code is likely the easiest to read and maintain:

export module mymodule;

export int get_int() { return 42; }

export class myclass {
public:
    myclass(int myint) : mMyInt(myint) {}
    int getInt() const { return mMyInt; }

private:
    const int mMyInt;
};

This is [up for debate, certainly!] easier to read and more intuitive. We have one definition = one declaration = one place in code to look.

NOTE: Modules are fairly new, and a best practice has yet to be established by the developer community.

Are there any technical implications to performance?

I cannot speak to performance, as I have not tested it. At the time of writing, no major compiler vendor supports modules well enough to stress test them at scale.

A. How many modules is the codebase split across?

Likely one module declaration for each translation unit; that is, every source file declares a module. The only exceptions would be internal header files and the main file of the project. If a source file does not declare a module, then it needs a corresponding header so that its definitions can be used, which defeats the purpose of modules.

B. For each module, how many source files is the module split into?

This depends. Modules are entirely orthogonal to (i.e., an unrelated concept from) namespaces. Inside a module interface unit, a declaration is exported only when it is prefixed with export or placed inside an export { } block.

(Someone suggested "modules work best in big chunks"? Why?)

This is likely so that consuming source files don't need a long list of imports. In a modularized project every single source file will declare a module, because otherwise its definitions have to be made available through a corresponding header file. So instead of writing import A; import B; import C; import D; import E;, we can make B-E module partitions of A (A:B, A:C, etc.), have A's primary interface unit re-export them with export import :B;, export import :C; and so on, and then consumers only write import A; while still getting B-E implicitly.
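The re-export rule this relies on can be modelled with a small C++ sketch. UnitInfo and visible_after_import are hypothetical, illustrative names, not a compiler API. The model says that the names a consumer sees after import A; are the exports of A's primary interface unit plus, transitively, the exports of every partition reached through export import; a partition pulled in with a plain import :X; contributes nothing. It is a simplification that tracks exported names only and ignores the distinction between visibility and reachability.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one unit of a module.
struct UnitInfo {
    std::vector<std::string> exported_names;    // declarations marked export
    std::vector<std::string> reexported_units;  // partitions named in "export import :X;"
};

// Toy model: collect exported names reachable from the primary interface
// by following "export import" edges (depth-first, each unit visited once).
std::set<std::string> visible_after_import(const std::map<std::string, UnitInfo>& units,
                                           const std::string& primary) {
    std::set<std::string> names;
    std::set<std::string> seen;
    std::vector<std::string> stack{primary};
    while (!stack.empty()) {
        const std::string unit = stack.back();
        stack.pop_back();
        if (!seen.insert(unit).second) continue;  // already processed
        const auto it = units.find(unit);
        if (it == units.end()) continue;          // unknown unit: ignore it
        names.insert(it->second.exported_names.begin(), it->second.exported_names.end());
        for (const auto& next : it->second.reexported_units) stack.push_back(next);
    }
    return names;
}
```

For instance, if A re-exports :B and :C, and :C in turn re-exports :E, then a consumer of import A; sees the exports of A, :B, :C and :E, but nothing from an internal partition :D that A only imports with import :D;.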

C. If the answer to B is more than one, which of the three kinds (1,2 or 3) are the extra files?

That depends on whether you want the contents of these auxiliary files to be visible outside the module. Consider this setup:

// module_a.cpp
export module A;
import :B;
[...]

// module_b.cpp
module A:B;
[...]

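Everything inside partition :B will only be visible inside module A (specifically, in the units of A that import :B). If you want it to be visible outside A, however, then :B must be a module interface partition (export module A:B;), and A's primary interface unit must re-export it (export import :B;), like this: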

// module_a.cpp
export module A;
export import :B;
[...]

// module_b.cpp
export module A:B;
int f(int x) { return x + 1; } // only visible inside A
export int g(int x) { return x + 2; } // visible outside A
[...]
