karsten

Reputation: 703

Different math symbol bindings with shared library with dlopen and directly linked into executable (Linux)

I have two shared libraries, libA and libB, on Linux. They are used in two ways: 1. Linked directly as shared libraries into an "offline" test executable. 2. In the real application: an auxiliary wrapper library (libWrapper) is linked against libA and libB, and the application opens only the wrapper library by calling dlopen("libWrapper.so", RTLD_NOW | RTLD_LOCAL).

The problem: the libraries run complex image analysis algorithms, and the numeric results of the two setups sometimes differ. I need to make sure the test executable gives the same results as the real application, but I am not permitted to change the libraries or the real application, only the test executable.

I used LD_DEBUG=bindings to find differences in the output (to stderr):

$ grep acosf log-bindings.test-executable  # "offline" test executable
binding file libB.so to libA.so: normal symbol `acosf.J'
binding file libB.so to libA.so: normal symbol `acosf.A'
binding file libA.so to libA.so: normal symbol `acosf.J' 
binding file libA.so to libA.so: normal symbol `acosf.A' 
binding file libB.so to libA.so: normal symbol `acosf'   <<<<<<<
binding file libA.so to libA.so: normal symbol `acosf'   <<<<<<<


$ grep acosf log-bindings.process   # logging from the real process
binding file libB.so to libA.so: normal symbol `acosf.J'
binding file libB.so to libA.so: normal symbol `acosf.A'
binding file libB.so to libB.so: normal symbol `_ZSt4acosf'  # std::acosf
binding file libB.so to libm.so.6: normal symbol `acosf'      <<<<<<
binding file libA.so to libA.so: normal symbol `acosf.J' 
binding file libA.so to libA.so: normal symbol `acosf.A' 
binding file libA.so to libm.so.6: normal symbol `acosf'      <<<<<<

(paths removed for clarity)

This suggests that in the real application many math function symbols (cos, cosf, exp, expf, sin, sinf, acos, ...) are bound to the system math library libm, while in the test executable the bindings go from libB to libA, and from libA to libA itself. This could be the reason for the differences.

Take acosf() as an example. Passing the linker option -y acosf through the compiler as -Wl,-y,acosf produces this output during the build:

release/libBdl/lib/libA.so: definition of acosf
release/libBdl/lib/libB.so: reference to acosf 

I use the nm tool to show symbols in the libraries:

$ nm  libA/libA.so | grep acosf
00665200 T acosf                          # impl. of acosf (text symbol)
0066c360 T acosf.A
0066c55c T acosf.J
00271fae t _Z13acosf_checkedf             # acosf_checked(float)
00708244 r _Z13acosf_checkedf$$LSDA

$ nm  libB/libB.so | grep acosf
01423780 T acosf                          # impl. of acosf (text symbol)
01424410 T acosf.A
0142460c T acosf.J
004c1b3a W _ZSt4acosf
01547eec r _ZSt4acosf$$LSDA

Although the math library on the release computer has no symbols, I assume libm works the same way there: it defines weak symbols such as expf or acosf, which users should be able to override in their own library with a strong symbol:

[newer CentOS7 system]$ nm /usr/lib/libm.so|grep acosf
0001b9c0 W acosf      # weak symbol 'acosf'
0001b9c0 t __acosf    # strong symbol / implementation
000176b0 T __acosf_finite
000176b0 t __ieee754_acosf   # called by __acosf in libm

[newer CentOS7 system]$ nm /usr/lib/libm.so|grep expf
0001bc60 W expf       # weak symbol 'expf'
0001bc60 t __expf     # strong symbol / implementation
00017990 i __expf_finite
0002d370 t __expf_finite_ia32
0002d1b0 t __expf_finite_sse2
00017960 i __ieee754_expf      # called by __expf in libm
0002d330 t __ieee754_expf_ia32
0002d1b0 t __ieee754_expf_sse2
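
The weak-alias pattern visible in the nm output above (a weak public name at the same address as an internal implementation) can be sketched with the GCC/ICC-style `weak` and `alias` attributes. This is only an illustration: `demo_acosf` and `demo_acosf_impl` are hypothetical names, the body is a placeholder rather than a real arc cosine, and nothing here claims to be how glibc actually declares its functions.

```c
/* Internal implementation. Hypothetical stand-in: NOT a real arc cosine,
   just a simple body so the example is self-contained. */
float demo_acosf_impl(float x)
{
    return 1.0f - x;
}

/* Public name, exported as a weak alias of the implementation above.
   Both names refer to the same code; only the binding of the public
   name is weak. Requires an ELF target and GCC-compatible attributes. */
float demo_acosf(float x) __attribute__((weak, alias("demo_acosf_impl")));
```

Running nm on an object built from this should list `demo_acosf` as `W` and `demo_acosf_impl` as `T`. The implementation shows up as `T` rather than `t` only because this sketch does not make it local.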

readelf -Ws ..| grep acosf result:

test-executable:
--

real-application:
--

libWrapper.so:
--

libB.so:
3934: 004c12a6    40 FUNC    WEAK   DEFAULT   10 _ZSt4acosf
5855: 01423b80   506 FUNC    GLOBAL DEFAULT   10 acosf.A
10422: 01423d7c   666 FUNC    GLOBAL DEFAULT   10 acosf.J
14338: 01422ef0    40 FUNC    GLOBAL DEFAULT   10 acosf

libA.so:
2333: 0066c1e8   506 FUNC    GLOBAL DEFAULT   10 acosf.A
4179: 0066c3e4   666 FUNC    GLOBAL DEFAULT   10 acosf.J
5772: 00665088    40 FUNC    GLOBAL DEFAULT   10 acosf

I think the symbol binding problems are the typical Unix System V problems described in https://en.wikipedia.org/wiki/Weak_symbol in the section "Limitations". With dlopen() the dynamic linker prefers libm with its weak symbol because libm is already loaded, although a strong symbol becomes available "later" in libA.

With LD_DEBUG=all:

test-executable:

symbol=expf; lookup in file=./test-executable.shared 
symbol=expf; lookup in file=/lib/libdl.so.2
symbol=expf; lookup in file=/home/test/test/bin_NDEBUG/libA/libA.so
binding file libB.so to libA.so: normal symbol `expf'   <<<<

symbol=acosf; lookup in file=./test-executable.shared
symbol=acosf; lookup in file=/lib/libdl.so.2
symbol=acosf; lookup in file=/home/test/test/bin_NDEBUG/libA/libA.so
binding file libA.so to libA.so: normal symbol `acosf'   <<<<



real-application:

symbol=expf; lookup in file=real-application
symbol=expf; lookup in file=/home/test/lib/libX1.so
symbol=expf; lookup in file=/home/test/lib/libX2.so
symbol=expf; lookup in file=/home/test/lib/libX3.so
symbol=expf; lookup in file=/home/test/lib/libX4.so 
symbol=expf; lookup in file=/lib/libdl.so.2 
symbol=expf; lookup in file=/usr/lib/libstdc++.so.5 
symbol=expf; lookup in file=/home/test/lib/libX5.so
symbol=expf; lookup in file=/lib/i686/libm.so.6
binding file libA.so to libm.so.6: normal symbol `expf'    <<<<<<<


symbol=acosf; lookup in file=real-application
symbol=acosf; lookup in file=/home/test/lib/libX1.so
symbol=acosf; lookup in file=/home/test/lib/libX2.so
symbol=acosf; lookup in file=/home/test/lib/libX3.so
symbol=acosf; lookup in file=/home/test/lib/libX4.so
symbol=acosf; lookup in file=/lib/libdl.so.2
symbol=acosf; lookup in file=/usr/lib/libstdc++.so.5
symbol=acosf; lookup in file=/home/test/lib/libX5.so 
symbol=acosf; lookup in file=/lib/i686/libm.so.6
binding file libA.so to libm.so.6: normal symbol `acosf'  <<<<<<

The auxiliary lib "libWrapper" is linked to libA and libB but does not have the symbol acosf.

The platform is an old 32-bit Linux using kernel 2.4 and glibc 2.2.5 (yes, 2001!).

The libs A and B are built using the Intel ICC compiler with -O3 and NDEBUG. With a DEBUG build there does not seem to be a problem. The static / archive build gives slightly different results compared with the shared linking.

The test executable is linked directly to shared libs libA and libB using g++ (or icc, makes no difference). I tried hard to get the test executable to also bind the math symbols to libm, by use of LD_PRELOAD or various linker flags, but this did not change anything.

My hypothesis: the dlopen call in the real application comes much later, after the usual libraries (including libm) are loaded and the application has started. Symbols already found in previously loaded libraries are preferred, even if the symbol there is weak and a strong symbol is available in libA. Possibly this is just the behaviour of the old Linux, but the Wikipedia article on weak symbols describes exactly such a weakness of the linker on Unix System V-like systems in its "Limitations" section.

I tried

linker option -Wl,--no-whole-archive 
define LD_BIND_NOW 
define LD_PRELOAD=libm.so 

for the test-executable, but this had no effect on the symbol binding:

symbol=acosf;  lookup in file=./test-executable.shared
symbol=acosf;  lookup in file=/lib/i686/libm.so.6
symbol=acosf;  lookup in file=/lib/libdl.so.2
symbol=acosf;  lookup in file=libA.so
binding file libA.so to libA.so: normal symbol `acosf'

My question: why does the test executable, even with LD_PRELOAD, stick to the in-library implementations (from libA), while the application using dlopen binds to the libm symbols? And how can I force the test executable to behave the same as the real application, i.e. use the libm symbols?

Regrettably, several modern dlopen flags are not available, and the linker lacks options such as --exclude-symbols. The LD_DYNAMIC_WEAK environment variable is not available on the old Linux either. Probably the only solution is to rewrite the test executable to use dlopen, too.

Any ideas are appreciated.

Upvotes: 3

Views: 733

Answers (2)

karsten

Reputation: 703

I think I can answer the question myself.

The dlopen call in the real application comes much later, after the usual libraries (including libm) are loaded and the application has started executing. Symbols already found in previously loaded libraries are preferred, even if the symbol there is weak and a strong symbol is available in libA (which is loaded via dlopen later in program execution). The Wikipedia article on weak symbols describes exactly such a weakness of the dynamic linker ld-linux.so on Unix System V-like systems (in this case Linux) in its "Limitations" section. With LD_DEBUG=all you can see how the linker searches for a symbol.

In this case, where the original application and the shared libraries must not be changed (linker flags, how and which symbols are exported), the only remaining solution is to rewrite the test executable so that it also uses dlopen, as the real application does.
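
A minimal sketch of what the dlopen-based test executable could start from, using the same flags the question reports for the real application (RTLD_NOW | RTLD_LOCAL). The helper name `open_symbol` is hypothetical, and the entry point to look up in libWrapper.so is not given in the question, so the caller has to supply it.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Open `path` with the flags the real application is described as using,
   then look up `name` in it. On success the library handle is stored in
   *handle_out (release it with dlclose when done) and the symbol address
   is returned. Returns NULL on failure after printing dlerror(). */
void *open_symbol(const char *path, const char *name, void **handle_out)
{
    void *handle;
    void *sym;
    const char *err;

    handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s): %s\n", path, dlerror());
        return NULL;
    }

    dlerror();                  /* clear any stale error state */
    sym = dlsym(handle, name);
    err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym(%s): %s\n", name, err);
        dlclose(handle);
        return NULL;
    }

    *handle_out = handle;
    return sym;
}
```

For this to mimic the application, the test executable presumably must not link libA or libB directly, and libm should already be among its own dependencies, as the LD_DEBUG output suggests it is for the real application. On a glibc of that age the program also has to be linked with -ldl.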

Upvotes: 0

Employed Russian

Reputation: 213879

I am not permitted to change the libraries or the real application.

If you are not allowed to change anything, then you can't fix the problem.

I used LD_DEBUG=bindings to find differences, and found that ...

LD_DEBUG is the wrong tool for debugging this. Use GDB instead.

Set a breakpoint on e.g. cos, run the two binaries, and confirm that they are in fact executing different code. Once you know that cos in one of the cases resides in libA (I can't quite parse your description, but I think that's what you claim to have observed), figure out how it gets into libA (use linker flag -Wl,-y,cos to determine that).
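
As a complement to the breakpoint approach, the owning shared object of a code address can also be queried from inside the process with dladdr. This is a sketch: it assumes dladdr (a GNU extension) is available on the target glibc, and `object_containing` is a hypothetical helper name.

```c
#define _GNU_SOURCE             /* dladdr is a GNU extension */
#include <dlfcn.h>
#include <stddef.h>

/* Return the path of the shared object whose mapping contains `addr`,
   or NULL if the dynamic linker cannot attribute the address. */
const char *object_containing(void *addr)
{
    Dl_info info;

    if (dladdr(addr, &info) == 0 || info.dli_fname == NULL)
        return NULL;
    return info.dli_fname;
}
```

This helper only reports where a given address lives. An address taken in the test executable reflects the executable's own binding, not necessarily the one libA or libB use for their internal calls, so a breakpoint in GDB remains the more direct check of what the libraries actually execute.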

Symbol visibility may play a part in why symbol resolution behaves differently. The exact command lines used to link prod-exe, test-exe, libA.so and libB.so may matter. Running readelf -Ws prod-exe test-exe libA.so libB.so | grep ' cos$' may also be illuminating.

Once you have all the info (and assuming you still can't understand what's happening), ask a new question with more detailed record of observations.

Upvotes: 0
