sdgfsdh

Reputation: 37045

Name mangling confusion in LLVM

I have been trying to build and execute LLVM modules. My code for generating the modules is quite long, so I won't post it here. Instead my question is about how Clang and LLVM work together to achieve name mangling. I will explain my specific issue to motivate the question.

Here is the source-code of one of my LLVM modules:

#include <iostream>

int main() {
  std::cout << "Hello, world. " << std::endl;
  return 0;
}

Here is the generated LLVM IR; it is too big for StackOverflow.

When I try to execute my module using lli, I get the following error:

LLVM ERROR: Program used external function '__ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEC1Emc' which could not be resolved!

Running the symbol through a demangler, the missing symbol is:

_std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::basic_string(unsigned long, char)
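For reference, the demangling step can be reproduced with a few lines of C++. This is only a sketch, assuming a toolchain that provides abi::__cxa_demangle in <cxxabi.h> (GCC's libstdc++ and libc++abi both do). The helper name demangleSymbol is hypothetical, and so is its choice to drop one leading underscore when the name starts with "__Z"; that choice treats the first underscore as a platform prefix rather than as part of the C++ mangling.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Hypothetical helper: demangles an Itanium C++ ABI symbol name.
// Returns an empty string if the name cannot be demangled.
// Assumption: a name starting with "__Z" is "_Z..." with one extra
// platform-level underscore in front, so that underscore is dropped first.
std::string demangleSymbol(const std::string& symbol) {
    std::string name = symbol;
    if (name.compare(0, 3, "__Z") == 0) {
        name.erase(0, 1);
    }

    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result
    // with malloc, so the caller must release it with free.
    char* demangled = abi::__cxa_demangle(name.c_str(), nullptr, nullptr, &status);
    if (status != 0 || demangled == nullptr) {
        return std::string();
    }

    std::string result(demangled);
    std::free(demangled);
    return result;
}
```

Passing either the name from the lli error (two leading underscores) or the same name with one leading underscore should then yield the same demangled constructor signature. The exact spacing inside the template argument list can differ between demangler implementations.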

The extra _ is suspicious, and the function without the leading underscore seems to exist in the IR!

; Function Attrs: alwaysinline ssp uwtable
define available_externally hidden void @_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEC1Emc(%"class.std::__1::basic_string"*, i64, i8 signext) unnamed_addr #2 align 2 {
  %4 = alloca %"class.std::__1::basic_string"*, align 8
  %5 = alloca i64, align 8
  %6 = alloca i8, align 1
  store %"class.std::__1::basic_string"* %0, %"class.std::__1::basic_string"** %4, align 8
  store i64 %1, i64* %5, align 8
  store i8 %2, i8* %6, align 1
  %7 = load %"class.std::__1::basic_string"*, %"class.std::__1::basic_string"** %4, align 8
  %8 = load i64, i64* %5, align 8
  %9 = load i8, i8* %6, align 1
  call void @_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEC2Emc(%"class.std::__1::basic_string"* %7, i64 %8, i8 signext %9)
  ret void
}

I am on macOS, so a leading underscore is to be expected, but I think that Clang might be adding it twice.

I looked through the LLVM / Clang source, and it seems that there are two mangling steps:

  1. Taking possibly overloaded C++ functions and mangling them to unique names for the LLVM IR
  2. Taking a mangled name from the LLVM IR and adding any platform-specific quirks, such as leading underscores

However, this is just my theory. Could someone explain how the mangling process works in Clang and LLVM? How should I create my llvm::DataLayout objects to get the correct mangling for my platform?
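To make the second step of that theory concrete, here is a small model of it in plain C++. It is a sketch, not LLVM's implementation: addGlobalPrefix and stripGlobalPrefix are hypothetical helpers, and the prefix is passed in by hand. As far as I can tell, in LLVM itself the prefix comes from the mangling field of the data layout string (m:o for Mach-O, which prefixes symbols with an underscore, and m:e for ELF, which adds none).

```cpp
#include <string>

// Hypothetical helper (not an LLVM API): models the platform-level step by
// prepending the object format's global prefix to a name taken from the IR.
// Assumption: '\0' stands for "no prefix" (ELF); '_' is the Mach-O prefix.
std::string addGlobalPrefix(const std::string& irName, char globalPrefix) {
    if (globalPrefix == '\0') {
        return irName;
    }
    return std::string(1, globalPrefix) + irName;
}

// Hypothetical inverse: removes one leading prefix character, if present,
// to recover the name as it appears in the IR.
std::string stripGlobalPrefix(const std::string& symbol, char globalPrefix) {
    if (globalPrefix != '\0' && !symbol.empty() && symbol.front() == globalPrefix) {
        return symbol.substr(1);
    }
    return symbol;
}
```

Under this model, the IR name _ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEC1Emc with the Mach-O prefix becomes exactly the name in the lli error, with two leading underscores. One of them belongs to the C++ mangling (_Z...) and the other to the platform, so the doubled underscore on its own does not show that the prefix was applied twice.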


`nm -gU /usr/lib/libc++.dylib` and `nm -gU /usr/lib/libc++abi.dylib` do not contain `__ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEC1Emc`.

When I try to compile the IR, I get this error:

llc generated.ll
clang++ generated.s

Undefined symbols for architecture x86_64:
"std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::data() const", referenced from:
  std::__1::ostreambuf_iterator<char, std::__1::char_traits<char> > std::__1::__pad_and_output<char, std::__1::char_traits<char> >(std::__1::ostreambuf_iterator<char, std::__1::char_traits<char> >, char const*, char const*, char const*, std::__1::ios_base&, char) in generated-b4252a.o
"std::__1::basic_ostream<char, std::__1::char_traits<char> >::sentry::operator bool() const", referenced from:
  std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::__put_character_sequence<char, std::__1::char_traits<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, char const*, unsigned long) in generated-b4252a.o
"std::__1::basic_ios<char, std::__1::char_traits<char> >::fill() const", referenced from:
  std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::__put_character_sequence<char, std::__1::char_traits<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, char const*, unsigned long) in generated-b4252a.o
"std::__1::basic_ios<char, std::__1::char_traits<char> >::rdbuf() const", referenced from:
  std::__1::ostreambuf_iterator<char, std::__1::char_traits<char> >::ostreambuf_iterator(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) in generated-b4252a.o
"std::__1::basic_ios<char, std::__1::char_traits<char> >::widen(char) const", referenced from:
  std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::endl<char, std::__1::char_traits<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) in generated-b4252a.o
"std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::basic_string(unsigned long, char)", referenced from:
  std::__1::ostreambuf_iterator<char, std::__1::char_traits<char> > std::__1::__pad_and_output<char, std::__1::char_traits<char> >(std::__1::ostreambuf_iterator<char, std::__1::char_traits<char> >, char const*, char const*, char const*, std::__1::ios_base&, char) in generated-b4252a.o
"std::__1::basic_ios<char, std::__1::char_traits<char> >::setstate(unsigned int)", referenced from:
  std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::__put_character_sequence<char, std::__1::char_traits<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, char const*, unsigned long) in generated-b4252a.o
ld: symbol(s) not found for architecture x86_64
clang-3.9: error: linker command failed with exit code 1 (use -v to see invocation)

Upvotes: 16

Views: 6686

Answers (1)

compor

Reputation: 2329

I wouldn't suspect a name mangling issue. C++ name mangling happens in the front end (i.e. Clang), and it is part of a well-defined, well-documented ABI standard (the Itanium C++ ABI).

Moreover, I don't think there is a spurious underscore, because a name with an extra underscore would not demangle back to a valid C++ name, and the mangled name in the pastebin link that you provided appears as:

_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEC1Emc

I'm not on macOS, but reproducing this with LLVM 3.8.1 on Linux (using --stdlib=libc++), with the same source and matching the IR line by line, I get the following symbol instead:

_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE6__initEmc

which demangles back to:

std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__init(unsigned long, char)

which I guess performs much the same kind of construction.

So, I believe that your linker picks up the wrong libc++ version.

You could check the symbols available in the libc++ that is tied to the clang/LLVM you are using. It is found in the directory given by llvm-config --libdir. You can also check the rpath entry of your toolchain binaries with readelf -d $(which lli) (that is the Linux tool; on macOS, otool -l $(which lli) lists the corresponding LC_RPATH load commands).

If there are multiple LLVM installations (e.g. a system one and one that you compiled from source yourself), you might have to play around with clang's -L option, which directs ld to add that path to its library search list. A quick alternative (that I wouldn't recommend for regular use) is to do this on the command line:

LD_LIBRARY_PATH=$(llvm-config --libdir) clang generated.s

Upvotes: 4
