Tinyden

Reputation: 574

gdb corefile generated by C++ segfault and pybinder with no symbols

Context: I have a program that runs on a server and segfaults several times a month. It is a Python program that uses a library implemented in C++ and exposed through pybinder.

I am able to capture the corefile on the server, and I have the source code (both the C++ and Python parts). How can I get the stack trace of the segfault?

Several things I have tried:

  1. I built the source code (C++ part) with the -g3 option. From my understanding, it should have the same binary and addresses as the one running on the server. The only difference should be the symbol table (and possibly several other sections in the ELF).

  2. I ran gdb -ex r bazel-bin/username/coredump/capture_corefile /tmp/test_coredump/corefile.python.3861066. Here bazel-bin/username/coredump/capture_corefile is the Python script, built together with the C++ part that has the symbol table, and /tmp/test_coredump/corefile.python.3861066 is the corefile I collected.

But it shows

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f58ca51332b in ?? ()
Starting program:  
No executable file specified.

  3. I tried to get the line of code directly with llvm-symbolizer. With the Python script as the object, it fails immediately:

desktop$ llvm-symbolizer --obj=bazel-bin/username/coredump/capture_corefile 0x00007f58ca51332b
LLVMSymbolizer: error reading file: The file was not recognized as a valid object file
??
??:0:0

For shared object, it also fails:

desktop$ llvm-symbolizer --obj=bazel-bin/username/coredump/coredump_pybind.so 0x00007f58ca51332b
_fini
??:0:0

I confirmed that the symbol table is not stripped:

file bazel-bin/username/coredump/coredump_pybind.so
bazel-bin/experimental/hjiang/coredump/coredump_pybind.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[md5/uuid]=baf3b4d9a8f7955b5db6b977843e2eb0, not stripped

Does anyone know how to get the stack trace with what I have?

Upvotes: 0

Views: 294

Answers (1)

Employed Russian

Reputation: 213375

build the source code (C++ part) with -g3 option. From my understanding, it should have the same binary and address as the one running on server.

This is by no means guaranteed, and actually pretty hard to achieve with GCC.

You didn't mention the compiler you use, but if it is Clang (implied by your later use of llvm-symbolizer), then note that Clang currently doesn't produce the same code with and without -g.

In addition, to make this work, you need to keep all the flags originally used (including all optimization flags) -- it's no good to replace -O2 with -g3 -- the binary will be vastly different.

You can check whether your rebuilt library is any good by running nm original.so and nm replacement.so, and comparing the addresses of any symbols which appear in both outputs. The replacement.so is usable if and only if the addresses of all common symbols match.
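
The nm comparison can be automated. Below is a minimal Python sketch, not a definitive tool: the helper names (parse_nm, mismatched, nm_symbols) and the file names in the usage comment are made up for illustration. It assumes the default nm output format of "address type name", and it keeps only one address per name, so duplicate local symbols with the same name are not handled. If the library on the server is stripped, plain nm will report no symbols; nm -D lists the dynamic symbols instead.

```python
import subprocess


def parse_nm(text):
    """Map symbol name -> address from `nm` output.

    Defined symbols are printed as "<hex address> <type> <name>".
    Undefined symbols ("U name") have no address column and are skipped.
    """
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        addr, _sym_type, name = parts
        try:
            symbols[name] = int(addr, 16)
        except ValueError:
            # Not a symbol line (e.g. a diagnostic message); ignore it.
            continue
    return symbols


def mismatched(original, replacement):
    """Return the sorted names of common symbols whose addresses differ."""
    common = original.keys() & replacement.keys()
    return sorted(name for name in common if original[name] != replacement[name])


def nm_symbols(path):
    """Run `nm` on a file and parse its output (requires nm on PATH)."""
    result = subprocess.run(["nm", path], check=True,
                            capture_output=True, text=True)
    return parse_nm(result.stdout)


# Usage (file names are placeholders):
#   bad = mismatched(nm_symbols("original.so"), nm_symbols("replacement.so"))
#   print("usable" if not bad else f"{len(bad)} symbols moved, e.g. {bad[:5]}")
```

An empty result from mismatched() means every symbol present in both files sits at the same address, which is the condition described above.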

The best practice here is to build the .so with optimization and debug info (e.g. gcc ... -g3 -O2 ...), keep that binary for future debugging, but send a stripped binary to the server. That way you are guaranteed to have the exact binary you need if/when the stripped binary crashes.


gdb -ex r bazel.../corefile

The above command asks gdb to run a core file, which makes no sense.

Whatever you tried to achieve here, that isn't the right way to do it.

Also, GDB (in general) can't help if you give it only the core -- for most tasks you also need the binary which produced that core.
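
For illustration, here is a rough Python sketch of loading the binary and the core together and printing a backtrace non-interactively. The function names are hypothetical. It also assumes that the executable which produced the core is the Python interpreter itself (the process that actually ran), not the script. The interpreter path in the usage comment is a placeholder and should be the same interpreter binary that ran on the server.

```python
import subprocess


def gdb_backtrace_cmd(executable, core):
    """Build a gdb command line that loads `executable` together with `core`
    and prints a backtrace of every thread, then exits (-batch)."""
    return ["gdb", "-batch", "-ex", "thread apply all bt", executable, core]


def backtrace(executable, core):
    """Run gdb (must be on PATH) and return whatever it printed to stdout."""
    result = subprocess.run(gdb_backtrace_cmd(executable, core),
                            capture_output=True, text=True)
    return result.stdout


# Usage (the interpreter path is an assumption, adjust to your server's Python):
#   print(backtrace("/usr/bin/python3",
#                   "/tmp/test_coredump/corefile.python.3861066"))
```

Note that there is no -ex r here: the core is a snapshot of a process that has already died, so it is only inspected, never run.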


Your first step should be to get a crash stack trace, as described e.g. here. Once you have something that looks like a reasonable stack trace, you could try swapping in the full-debug version of the .so.

Upvotes: 1
