Reputation: 11716
Background: I'm trying to implement a system like that described in this previous answer. In short, I have an application that links against a shared library (on Linux at present). I would like that shared library to switch between multiple implementations at runtime (for instance, based on whether the host CPU supports a certain instruction set).
In its simplest case, I have three distinct shared library files:
libtest.so: This is the "vanilla" version of the library that will be used as a fallback case.
libtest_variant.so: This is the "optimized" variant of the library that I would like to select at runtime if the CPU supports it. It is ABI-compatible with libtest.so.
libtest_dispatch.so: This is the library that is responsible for choosing which variant of the library to use at runtime.
In keeping with the approach suggested in the linked answer above, I'm doing the following:
The application is linked against libtest.so.
The DT_SONAME field of libtest.so is set to libtest_dispatch.so. Therefore, when I run the application, it will load libtest_dispatch.so instead of the actual dependency libtest.so.
libtest_dispatch.so is configured to have a constructor function that looks like this (pseudocode):
__attribute__((constructor)) void init()
{
if (can_use_variant) dlopen("libtest_variant" SHLIB_EXT, RTLD_NOW | RTLD_GLOBAL);
else dlopen("libtest" SHLIB_EXT, RTLD_NOW | RTLD_GLOBAL);
}
The call to dlopen() will load the shared library that provides the appropriate implementation, and the application moves on.
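The can_use_variant condition in the pseudocode is left open. As a minimal sketch, assuming GCC or Clang and assuming (purely for illustration) that AVX2 is the instruction set that gates the optimized variant, the check could be written with the compiler's CPU-detection builtins:

```cpp
// Hypothetical helper for the dispatch constructor.
// Assumption: AVX2 is the feature that decides which variant to load.
bool can_use_variant()
{
#if defined(__x86_64__) || defined(__i386__)
    // Initializes the compiler's CPU feature data; harmless if already done.
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    // On other architectures this sketch always selects the fallback library.
    return false;
#endif
}
```

The result depends only on the host CPU, so repeated calls within one process return the same answer.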
Result: This works! If I place an identically-named function in each shared library, I can verify at runtime that the appropriate version is executed based upon the conditions used by the dispatch library.
The problem: The above works for the toy example that I demonstrated it with in the linked question. Specifically, it seems to work fine if the libraries only export functions. However, once there are variables in play (whether they be global variables with C linkage or C++ constructs like typeinfo), I get unresolved-symbol errors at runtime.
The below code demonstrates the problem:
libtest.h:
extern int bar;
int foo();
libtest.cc:
#include <iostream>
int bar = 2;
int foo()
{
std::cout << "function call came from libtest" << std::endl;
return 0;
}
libtest_variant.cc:
#include <iostream>
int bar = 1;
int foo()
{
std::cout << "function call came from libtest_variant" << std::endl;
return 0;
}
libtest_dispatch.cc:
#include <dlfcn.h>
#include <iostream>
#include <stdlib.h>
// Shared-library suffix; ".so" matches the Linux build commands below.
#ifndef SHLIB_EXT
#define SHLIB_EXT ".so"
#endif
__attribute__((constructor)) void init()
{
if (getenv("USE_VARIANT")) dlopen("libtest_variant" SHLIB_EXT, RTLD_NOW | RTLD_GLOBAL);
else dlopen("libtest" SHLIB_EXT, RTLD_NOW | RTLD_GLOBAL);
}
test.cc:
#include "libtest.h"
#include <iostream>
int main()
{
std::cout << "bar: " << bar << std::endl;
foo();
}
I build the libraries and test application using the following:
g++ -fPIC -shared -o libtest.so libtest.cc -Wl,-soname,libtest_dispatch.so
g++ -fPIC -shared -o libtest_variant.so libtest_variant.cc
g++ -fPIC -shared -o libtest_dispatch.so libtest_dispatch.cc -ldl
g++ test.cc -o test -L. -ltest -Wl,-rpath,.
Then, I try to run the test using the following command lines:
> ./test
./test: symbol lookup error: ./test: undefined symbol: bar
> USE_VARIANT=1 ./test
./test: symbol lookup error: ./test: undefined symbol: bar
Failure. If I remove all instances of the global variable bar and try to dispatch the foo() function only, then it all works. I'm trying to figure out exactly why and whether I can get the effect that I want in the presence of global variables.
Debugging: In attempting to diagnose the problem, I've done some playing with the LD_DEBUG environment variable while running the test program. It seems like the problem comes down to this:
The dynamic linker performs relocations of global variables from shared libraries very early in the loading process, before constructors from shared libraries are called. Therefore, it tries to locate some global variable symbols before my dispatch library has had a chance to run its constructor and load the library that will actually provide those symbols.
This seems to be a big roadblock. Is there some way that I can alter this process so that my dispatcher can run first?
I know that I could preload the library using LD_PRELOAD. However, this is a cumbersome requirement for the environment that my software will eventually run in. I'd like to find a different solution if possible.
Upon further review, it appears that even if I LD_PRELOAD the library, I have the same problem. The constructor still doesn't get executed before the global variable symbol resolution occurs. Usage of the preload feature just pushes the desired library to the top of the library list.
Upvotes: 4
Views: 1979
Reputation: 213616
Failure. If I remove all instances of the global variable bar and try to dispatch the foo() function only, then it all works.
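The reason this works without global variables is that functions (by default) use lazy binding, but variables cannot: a function call goes through a PLT stub that the dynamic linker can resolve on first use, whereas a data reference has no such stub and must be relocated when the program is loaded, before any constructor runs.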
You would get the exact same failure without any global variables if your test program is linked with -Wl,-z,now (which would disable lazy binding of functions).
You could fix this by introducing an instance of every global variable referenced by your main program into the dispatch library.
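One possible reading of that suggestion, sketched for the bar variable from the question: the dispatch library defines bar itself so the load-time lookup succeeds, then copies the initial value from whichever implementation it loads. The helper name load_implementation is hypothetical, and the sketch assumes a plain int; it does not cover C++ constructs such as typeinfo.

```cpp
#include <dlfcn.h>

// Placeholder definition in the dispatch library, so the dynamic linker
// can resolve `bar` at load time, before any constructor has run.
int bar = 0;

// Hypothetical helper, meant to be called from the dispatch constructor.
// Loads the chosen implementation and adopts its initial value of `bar`.
// Returns false if the library could not be loaded.
bool load_implementation(const char* path)
{
    void* handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == nullptr)
        return false;

    // A handle-based dlsym() searches the newly loaded library itself,
    // so this finds that library's own definition of `bar`.
    if (void* symbol = dlsym(handle, "bar"))
        bar = *static_cast<int*>(symbol);
    return true;
}
```

With this in place, the constructor from the question would call load_implementation("libtest_variant" SHLIB_EXT) or load_implementation("libtest" SHLIB_EXT) in place of the bare dlopen() calls. On older glibc versions the library still needs -ldl at link time.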
Contrary to what your other answer suggests, this is not the standard way to do CPU-specific dispatch.
There are two standard ways.
The older one: use $PLATFORM as part of DT_RPATH or DT_RUNPATH. The kernel will pass in a string, such as x86_64, or i386, or i686 as part of the aux vector, and ld.so will replace $PLATFORM with that string.
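To see which string the kernel passes on a given machine, the aux vector can be queried from within a program. The following is a small sketch assuming Linux with glibc, using getauxval() and the AT_PLATFORM entry; the function name platform_name is made up for illustration, and some glibc versions may substitute a different value for $PLATFORM than the raw kernel string.

```cpp
#include <string>
#include <sys/auxv.h>  // getauxval(), AT_PLATFORM (glibc, Linux)

// Returns the platform string from the auxiliary vector,
// or an empty string if the kernel did not provide one.
std::string platform_name()
{
    const unsigned long value = getauxval(AT_PLATFORM);
    if (value == 0)
        return std::string();
    // AT_PLATFORM holds the address of a NUL-terminated string.
    return std::string(reinterpret_cast<const char*>(value));
}
```

Printing platform_name() on each target machine shows which directory name a $PLATFORM-based run path would need there.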
This allowed distributions to ship both i386 and i686-optimized libraries, and have a program select appropriate version depending on which CPU it was running on.
Needless to say, this isn't very flexible, and (as far as I understand) doesn't allow you to distinguish between various x86_64 variants.
The new hotness is IFUNC dispatch, documented here. This is what GLIBC currently uses to provide different versions of e.g. memcpy depending on which CPU it is running on. There are also the target and target_clones attributes (documented on the same page) that allow you to compile several variants of a routine, optimized for different processors (in case you don't want to code them in assembly).
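As an illustration of the target_clones attribute, here is a minimal sketch assuming GCC on x86 Linux with IFUNC support. The function sum_array and the choice of AVX2 are invented for the example; the compiler emits one clone per listed target plus a resolver that picks between them at load time.

```cpp
#include <cstddef>

// The compiler builds an AVX2 version and a baseline ("default") version
// of this function and selects one per process via an IFUNC resolver.
__attribute__((target_clones("avx2", "default")))
long sum_array(const int* data, std::size_t count)
{
    long total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += data[i];
    return total;
}
```

Callers use sum_array() like any other function; the selection is invisible at the call site, and because it happens per function there is no need for a separate dispatch library.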
I'm trying to apply this functionality to an existing, very large library, so just a recompile is the most straightforward way of implementing it.
In that case, you may have to wrap the binary in a shell script, and set LD_LIBRARY_PATH to different directories depending on the CPU. Or have the user source your script before running the program.
target_clones does look interesting; is that a recent addition to gcc?
I believe the IFUNC support is about 4-5 years old, the automatic cloning in GCC is about 2 years old. So yes, quite recent.
Upvotes: 3
Reputation: 62583
It might not be relocations per se (-fPIC suppresses relocations), but lazy binding through the GOT (Global Offset Table), with the same effect. This is unavoidable, since the linker has to bind variables before init is called, simply because init might as well reference those symbols.
As for solutions... Well, one solution might be to not use (or even expose) global variables to the executable code. Instead, provide a set of functions to access them. Global variables are not welcome anyway :)
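A minimal sketch of that accessor approach, applied to the bar variable from the question. The names get_bar and set_bar are hypothetical. The variable gets internal linkage so it is never exported, and the executable only ever references functions, which are bound lazily by default.

```cpp
namespace {
// Internal linkage: the variable is not visible outside this library.
int bar_storage = 2;
}

// Exported accessors; the executable calls these instead of touching the
// variable, so its only dynamic references are to functions.
int get_bar()
{
    return bar_storage;
}

void set_bar(int value)
{
    bar_storage = value;
}
```

Each implementation library would provide the same pair of accessors with its own initial value, and test.cc would print get_bar() rather than reading bar directly.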
Upvotes: 1