gexicide

Reputation: 40058

Linker-generated Lists in (probably highly non-standard) C++. Possible?

Say you want to have a class for generating polymorphic objects:

class Product {
public:
    virtual ~Product() {}
};

class ProductFactory {
public:
    virtual const char* getName() = 0;
    virtual std::unique_ptr<Product> createProduct() = 0;
};

Now, say, given a name you want to be able to create a product of that name. So what we need for this, is a data structure, e.g., a list or vector of all product factories in our program. Let's say that all factories will always be global objects, so not dynamically generated but all instantiated at program start-up. Let's further say that all factory constructors are constexpr so we can instantiate them without doing any work at run-time.

In a monolithic code base, we could have a global array of all factories for all products and just iterate over them to get the right one:

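// in a .cpp
#include <array>
#include <cstring>
#include <memory>
#include <stdexcept>

class AProductFactory : public ProductFactory {
public:
    constexpr AProductFactory(){...}
    const char* getName() override { return "a"; }
    std::unique_ptr<Product> createProduct() override {...}
};

class BProductFactory : public ProductFactory {
public:
    constexpr BProductFactory(){...}
    const char* getName() override { return "b"; }
    std::unique_ptr<Product> createProduct() override {...}
};

AProductFactory afac;
BProductFactory bfac;
// The array stores pointers, so take the address of each factory.
std::array<ProductFactory*, 2> factories = {&afac, &bfac};

std::unique_ptr<Product> createProduct(const char* name){
    for (auto f: factories) {
        if (std::strcmp(f->getName(), name) == 0) {
            return f->createProduct();
        }
    }
    // A bare `throw;` with no active exception calls std::terminate.
    throw std::invalid_argument("unknown product name"); // Shouldn't happen
}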

Great so far. This would give us a list of all factories without executing any code at run-time!

But this only works in a monolithic code base where one compilation unit has access to all possible factories. How would this work in a modularized code base, where various compilation units can add new factory types and we want to avoid having one compilation unit that has to know all of them? Instead, we just want to collect all factories from all linked compilation units. Of course, we could make each factory register itself in a global registry:

class FactoryRegistry {
public:
    void addFactory(ProductFactory* fac) {
        factories.push_back(fac);
    }
    static FactoryRegistry& getRegistry() {
        static FactoryRegistry registry;
        return registry;
    }
private:
    std::vector<ProductFactory*> factories;
};

// Now, each factory would call FactoryRegistry::getRegistry().addFactory(this); to register itself at program start-up
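To make the start-up cost concrete, here is a minimal sketch of such self-registration. The `AutoRegister` helper, the `describe()` method, and the lookup function on the registry are hypothetical names added for illustration only; they are not part of the code above.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

struct Product {
    virtual ~Product() = default;
    virtual std::string describe() const = 0;  // hypothetical, for illustration
};

struct ProductFactory {
    virtual ~ProductFactory() = default;
    virtual const char* getName() const = 0;
    virtual std::unique_ptr<Product> createProduct() const = 0;
};

class FactoryRegistry {
public:
    static FactoryRegistry& getRegistry() {
        static FactoryRegistry registry;  // constructed on first use
        return registry;
    }
    void addFactory(const ProductFactory* fac) { factories_.push_back(fac); }

    // Linear search by name; returns nullptr if no factory matches.
    std::unique_ptr<Product> createProduct(const char* name) const {
        assert(name != nullptr);
        for (const ProductFactory* f : factories_) {
            if (std::strcmp(f->getName(), name) == 0) return f->createProduct();
        }
        return nullptr;
    }

private:
    std::vector<const ProductFactory*> factories_;
};

// Hypothetical helper: its constructor runs before main() and performs
// one push_back per factory. This is the run-time work in question.
struct AutoRegister {
    explicit AutoRegister(const ProductFactory* fac) {
        FactoryRegistry::getRegistry().addFactory(fac);
    }
};

struct AProduct final : Product {
    std::string describe() const override { return "product a"; }
};

struct AProductFactory final : ProductFactory {
    const char* getName() const override { return "a"; }
    std::unique_ptr<Product> createProduct() const override {
        return std::make_unique<AProduct>();
    }
};

// In each factory's own .cpp: define the factory, then register it.
static AProductFactory afac;
static AutoRegister registerA{&afac};
```

With this in place, `FactoryRegistry::getRegistry().createProduct("a")` returns an `AProduct`, and no compilation unit needs to list all factories. The price is one constructor call (and possibly a vector reallocation) per factory before `main()` runs.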

But the problem with this is that we now have to do a tiny bit of work on program start-up per factory, as each factory has to register itself. If we have a lot of registries and factories in our code base, this could add up and make program start-up slower. C++ is usually great at doing as much work as possible at compile time, but the list is no longer known at compile time now (as no single compilation unit contains all factories), but only at link time. So, is there a way to let the linker do the work and create that list of existing factories at link time?

What we would need is a way for the linker to somehow - for example - create a linked list of all existing factories at link time, so we would have a list of all factories that are in any linked compilation units without having to do any work at runtime.

To understand a bit better what I am thinking about, here is something that comes close to what I want:

struct FactoryChain;

struct ProductFactory {
    FactoryChain* chain;
    constexpr ProductFactory(FactoryChain* chain) : chain(chain){}

    ProductFactory* getNext();
};

struct FactoryChain {
    ProductFactory* next;
};

ProductFactory* ProductFactory::getNext() { return chain->next; }

// In ProductFactory4.cpp
extern FactoryChain factory5;

struct ProductFactory4 : ProductFactory {
    constexpr ProductFactory4(FactoryChain* chain) : ProductFactory(chain){}
};

ProductFactory4 fac{&factory5};
FactoryChain factory4{&fac}; // Factory 3 may use this

https://godbolt.org/z/79MjTc

Now, each factory can get the next factory and each factory only has to know the next factory (by declaring it with an extern declaration), so we no longer need a compilation unit that knows about all factories. Now, the linker will insert the address of factory5, so we effectively get a list of all factories without doing any work at program start-up. Check out the godbolt link, there is really no code generated at program start-up:

ProductFactory::getNext():          # @ProductFactory::getNext()
        movq    (%rdi), %rax
        movq    (%rax), %rax
        retq
fac:
        .quad   factory5

factory4:
        .quad   fac

So my two main requirements (no work at program start-up, no compilation unit that needs to know all factories) are satisfied, but still, each compilation unit declaring a factory needs to know another compilation unit declaring a factory, which gives us quite weird dependencies of unrelated factories.

So the only missing piece is that we still need the extern declaration of the next factory. If we could somehow, with some linker magic, have the linker give us the address of the next factory without having to declare the name of its variable, we would have what we want. For example (pseudo code, definitely doesn't compile):

extern ProductFactory* factory##<Mr Linker or Mr. Pre-processor, please insert next number here>;

Would something like this be possible, somehow?

Upvotes: 1

Views: 215

Answers (1)

Ben Voigt

Reputation: 283624

This sort of thing is certainly possible, but because the C++ standard says nothing about linkers or ordering of global objects, it will rely on some implementation-specific details of your toolchain. That is, it is likely portable between g++ on Linux and g++ on Windows or g++ on an embedded system, but not between g++ on Windows and Microsoft Visual C++ on Windows.

The basic approach will apply to many different platforms and toolchains:

  1. Create a new "section" with a unique name and the proper attributes for preinitialized data.
  2. For each global object you need in your list, pair it with a global variable of type pointer-to-common-base-class which is initialized with the address of the global object. It should be both const (better yet constexpr) and have external linkage; beware that const and constexpr both change the default linkage, so you will have to explicitly mark these pointers extern.
  3. Use the toolchain-specific attribute or pragma that puts that pointer variable in the unique section. The linker will merge sections with the same name in different compilation units.
  4. Where you need to walk the list, declare the linker-generated variables for the unique section location (it may be a begin and end pointer or a begin pointer and a size, depending on your toolchain). reinterpret_cast the begin pointer to a pointer to pointer-to-common-base-class and walk through the section like an array of pointers to objects which you can use polymorphically.
  5. Be careful not to put anything else in that section besides pointers of the same common base class. If you need to list objects from multiple unrelated class hierarchies, put each in its own uniquely-named section.

On g++, the documentation gives the following example for placement of a global in a specific section:

int init_data __attribute__ ((section ("INITDATA"))) = 0;
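Put together, the steps above might look like the following for g++ or clang with GNU ld (or a compatible linker) on an ELF target. This is only a sketch under those assumptions: the section name `product_factories`, the `REGISTER_FACTORY` macro, and the `describe()` method are hypothetical. It relies on the linker providing `__start_<name>` and `__stop_<name>` symbols for sections whose names are valid C identifiers. The `used` attribute stops the compiler from discarding the otherwise unreferenced pointers, so internal linkage is enough here; if you link with section garbage collection you may need additional measures.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>

struct Product {
    virtual ~Product() = default;
    virtual std::string describe() const = 0;  // hypothetical, for illustration
};

struct ProductFactory {
    virtual const char* getName() const = 0;
    virtual std::unique_ptr<Product> createProduct() const = 0;
protected:
    ~ProductFactory() = default;  // trivial: nothing to run at start-up or exit
};

// Hypothetical macro: emits a pointer to `obj` into the "product_factories"
// section. The section must contain nothing but such pointers.
#define REGISTER_FACTORY(obj)                                   \
    static const ProductFactory* const obj##_entry              \
        __attribute__((section("product_factories"), used)) = &(obj)

struct AProduct final : Product {
    std::string describe() const override { return "product a"; }
};
struct BProduct final : Product {
    std::string describe() const override { return "product b"; }
};

struct AProductFactory final : ProductFactory {
    const char* getName() const override { return "a"; }
    std::unique_ptr<Product> createProduct() const override {
        return std::make_unique<AProduct>();
    }
};
struct BProductFactory final : ProductFactory {
    const char* getName() const override { return "b"; }
    std::unique_ptr<Product> createProduct() const override {
        return std::make_unique<BProduct>();
    }
};

// These would normally live in separate .cpp files.
static AProductFactory afac;
REGISTER_FACTORY(afac);
static BProductFactory bfac;
REGISTER_FACTORY(bfac);

// Section bounds generated by the linker (toolchain-specific names).
extern "C" {
extern const ProductFactory* const __start_product_factories[];
extern const ProductFactory* const __stop_product_factories[];
}

// Walk the merged section like an array of base-class pointers.
std::unique_ptr<Product> createProduct(const char* name) {
    assert(name != nullptr);
    for (const ProductFactory* const* it = __start_product_factories;
         it != __stop_product_factories; ++it) {
        if (std::strcmp((*it)->getName(), name) == 0) {
            return (*it)->createProduct();
        }
    }
    return nullptr;  // no factory with that name was linked in
}
```

The order of entries within the section depends on link order, so the lookup should not rely on it.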

With Microsoft Visual C++, the concept is the same but the syntax is quite different:

#pragma section("mysec", read, write)  // the section must be declared before allocate can name it
__declspec(allocate("mysec")) int i = 0;

Upvotes: 1
