Eox

Reputation: 381

What actually happens when two shared libraries define the same symbol?

I recently encountered a crash when I linked two shared libraries (both written by me) together. I eventually found it was caused by one source file being duplicated between the two libraries. That source file defined a global std::vector (in fact a static member of a class), and it ended up being freed twice, once by each library.

I then wrote some test code to verify my theory. In a header I declare a class and a global object of that class:

#ifndef SHARED_HEADER_H_
#define SHARED_HEADER_H_

#include <iostream>

struct Data {
  Data(void) {std::cout << "Constructor" << std::endl;}
  ~Data(void) {std::cout << "Destructor" << std::endl;}
  int FuncDefinedByLib(void) const;
};

extern const Data data;

#endif

The FuncDefinedByLib function is left undefined. I then created two libraries, libA and libB, both of which include this header. libA looks like this:

const Data data;

int Data::FuncDefinedByLib(void) const {return 1;}

void PrintA(void) {
  std::cout << "LibB:" << &data << " "
    << (void*)&Data::FuncDefinedByLib <<  " "
    << data.FuncDefinedByLib() << std::endl;
}

It defines the global data object, the FuncDefinedByLib function, and a function PrintA that prints the address of the data object, the address of FuncDefinedByLib, and the return value of FuncDefinedByLib.

libB is almost the same as libA, except that PrintA is renamed to PrintB and FuncDefinedByLib returns 2 instead of 1.

I then created a program that links against both libraries and calls PrintA and PrintB. Before encountering the crash I thought each library would get its own version of class Data. However, the actual output

Constructor
Constructor
LibB:0x7efceaac0079 0x7efcea8bed60 1
LibB:0x7efceaac0079 0x7efcea8bed60 1
Destructor
Destructor

indicates that both libraries use a single version of class Data and a single const Data data object, both taken from libA, even though the class and the object are defined differently in each library (I understand this is because libA is linked first). The double destruction clearly explains my crash.

So here are my questions

  1. How does this happen? I understand that the main program linking against the two libraries may only bind to the first symbol it sees. But a shared library should already have been linked internally when it was created (or is it not? I don't know much about shared libraries). How can the libraries know there is a twin class in another library and bind to it after they have been built on their own?

  2. I know that duplicating code between shared libraries is generally bad practice. But is there a condition under which duplication between libraries is safe? Or is there a systematic way to make my code duplicable without risk? Or is it never safe, and should it always be strictly prohibited? I don't want to split off another shared library just to share a tiny piece of code.

  3. This behavior looks magical. Does anyone utilize this behavior to do some good magical things?

Upvotes: 6

Views: 2929

Answers (1)

Alecto

Reputation: 10740

Part 1: About the Linker

This is a known problem in both C and C++, and it's the result of the current compilation model. A full explanation of how it happens is beyond the scope of this answer; however, this talk by Matt Godbolt provides an in-depth explanation of the process for beginners. See also this article on the linker.

C++20 introduces a new compilation model, called Modules, that is intended to avoid some problems of this kind. You can import and export declarations from a module, similar to the way packages work in Java.

Part 2: Solving your problem

There are a few different solutions.

Magical Solution 1: One unique global variable

This one's pretty slick. If you make the global variable a static local variable inside a function, it is constructed exactly once, on first use, and the standard guarantees this even in a multithreaded environment (since C++11).

#ifndef SHARED_HEADER_H_
#define SHARED_HEADER_H_

#include <iostream>

struct Data {
  Data(void) {std::cout << "Constructor" << std::endl;}
  ~Data(void) {std::cout << "Destructor" << std::endl;}
  int FuncDefinedByLib(void) const;
};
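// inline lets this definition appear in every translation unit
// that includes the header without breaking the one-definition rule
inline Data& getMyDataExactlyOnce() {
    // The compiler will ensure
    // that data only gets constructed once
    static Data data;
    // Because data is static, it's fine to return a reference to it
    return data;
}

// Here, the global variable is a reference (an inline variable, C++17)
inline const Data& data = getMyDataExactlyOnce();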

#endif
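
As a rough single-file sketch of the guarantee described above, the following counts constructions instead of printing them. Counted, getCountedExactlyOnce and constructionsAfterRepeatedCalls are hypothetical names, not part of the original code. It only illustrates the one-time initialization of a function-local static within one program; it does not by itself show what happens across shared libraries.

```cpp
#include <cassert>

// Hypothetical stand-in for Data: counts constructions instead of printing.
struct Counted {
    static int constructions;
    Counted() { ++constructions; }
};
int Counted::constructions = 0;

// Function-local static: constructed on the first call only.
inline Counted& getCountedExactlyOnce() {
    static Counted instance;
    return instance;
}

// Calls the accessor repeatedly and returns how many objects were constructed.
int constructionsAfterRepeatedCalls(int calls) {
    for (int i = 0; i < calls; ++i) {
        Counted& c = getCountedExactlyOnce();
        // Every call must hand back the same object.
        assert(&c == &getCountedExactlyOnce());
    }
    return Counted::constructions;
}
```

However many times the accessor is called, constructionsAfterRepeatedCalls should report a single construction.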

Magical Solution 2: Multiple distinct global variables, 1 per translation unit

If you want each translation unit that includes the header to get its own copy at its own location in memory, give the variable internal linkage, for example by declaring it static or by placing it in an unnamed namespace. Note that marking a global variable as inline in C++17 does the opposite: all translation units then share a single definition. See: https://en.cppreference.com/w/cpp/language/inline

#ifndef SHARED_HEADER_H_
#define SHARED_HEADER_H_

#include <iostream>

struct Data {
  Data(void) {std::cout << "Constructor" << std::endl;}
  ~Data(void) {std::cout << "Destructor" << std::endl;}
  int FuncDefinedByLib(void) const;
};
// Everyone gets their own copy of data (internal linkage)
static const Data data;

#endif

Part 3: Can we use this to do Dark Magic?

Kind of. If you really, really wanna do Dark Magic with global variables, C++14 introduces variable templates (templated global variables):

#include <string>
#include <unordered_map>

// One map per (Key, Value) combination
template<class Key, class Value>
std::unordered_map<Key, Value> myGlobalMap;

void foo() {
    myGlobalMap<int, int>[10] = 20;
    myGlobalMap<std::string, std::string>["Hello"] = "World";
}

Make of that what you will. I haven't had much use for templated global variables, although I imagine if you were doing something like counting the number of times a function was called, or the number of times a type was created, it'd be useful to do this.
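
The counting idea mentioned above could look roughly like this. It is a minimal sketch, and instanceCount and makeCounted are hypothetical names introduced only for illustration. Each type gets its own counter because every instantiation of a variable template is a distinct variable.

```cpp
#include <cassert>
#include <string>
#include <utility>

// One counter per type T; each instantiation is a separate variable (C++14).
template <class T>
int instanceCount = 0;

// Hypothetical helper: constructs a T and bumps the counter for that type.
template <class T, class... Args>
T makeCounted(Args&&... args) {
    ++instanceCount<T>;
    return T(std::forward<Args>(args)...);
}
```

After two calls to makeCounted<int> and one to makeCounted<std::string>, instanceCount<int> would be 2 and instanceCount<std::string> would be 1, while a type that was never created, such as double, stays at 0.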

Upvotes: 3
