Reputation: 10852
I'm trying to debug a nodejs core dump as an experimentation. My main target was simulating a production problem and seeing V8 stack trace in a core dump. Therefore I intentionally wrote a piece of code in NodeJS
that performs constant string concatenation in a forked NodeJS process, and at the end due to memory limits, it's been aborted and the core dump is taken.
So far so good, However after installing GDB
and providing the dump file, it's loaded but when I execute bt
, it only shows the following output.
$ gdb /usr/local/bin/node -c core.f8f32091796c.node.1700129772.28
GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/bin/node...
[New LWP 28]
[New LWP 30]
[New LWP 29]
[New LWP 33]
[New LWP 31]
[New LWP 32]
[New LWP 34]
Core was generated by `/usr/local/bin/node template.js'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fa1a8f5a3f2 in setjmp () from /lib/ld-musl-x86_64.so.1
[Current thread is 1 (LWP 28)]
(gdb) bt
#0 0x00007fa1a8f5a3f2 in setjmp () from /lib/ld-musl-x86_64.so.1
#1 0x00007fa1a8f5a54d in raise () from /lib/ld-musl-x86_64.so.1
#2 0x00007fa1a8f5b9a9 in ?? () from /lib/ld-musl-x86_64.so.1
#3 0x00007fa1a8faae98 in ?? () from /lib/ld-musl-x86_64.so.1
#4 0x0000000000000000 in ?? ()
(gdb) info sharedlibrary
From To Syms Read Shared Object Library
0x00007fa1a8e317d0 0x00007fa1a8ebab61 Yes (*) /usr/lib/libstdc++.so.6
0x00007fa1a8d5c2f0 0x00007fa1a8d6c551 Yes (*) /usr/lib/libgcc_s.so.1
0x00007fa1a8f29070 0x00007fa1a8f70761 Yes (*) /lib/ld-musl-x86_64.so.1
(*): Shared library is missing debugging information.
(gdb) info frame
Stack level 0, frame at 0x7ffce26c2038:
rip = 0x7fa1a8f5a3f2 in setjmp; saved rip = 0x7fa1a8f5a54d
called by frame at 0x7ffce26c2050
Arglist at 0x7ffce26c2030, args:
Locals at 0x7ffce26c2030, Previous frame's sp is 0x7ffce26c2040
Saved registers:
rip at 0x7ffce26c2038
If I load the same dump file with LLDB it works as expected and shows the following output.
(lldb) target create "/usr/local/bin/node" --core "core.f8f32091796c.node.1700129772.28"
Core file '/dump/core.f8f32091796c.node.1700129772.28' (x86_64) was loaded.
(lldb) bt
* thread #1, name = 'node', stop reason = signal SIGABRT
* frame #0: 0x00007fa1a8f5a3f2 ld-musl-x86_64.so.1`__setjmp + 118
frame #1: 0x00007fa1a8f5a54d ld-musl-x86_64.so.1`raise + 64
frame #2: 0x00007fa1a8f30f25 ld-musl-x86_64.so.1`abort + 14
frame #3: 0x00005641a6ef5e55 node`node::Abort() + 37
frame #4: 0x00005641a6e00d27 node`node::OnFatalError(char const*, char const*) + 283
frame #5: 0x00005641a70ed0e2 node`v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) + 82
frame #6: 0x00005641a70ed46f node`v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) + 847
frame #7: 0x00005641a72bf365 node`v8::internal::Heap::FatalProcessOutOfMemory(char const*) + 21
...
...
Both are running on the same container and here is the version information of all the tools used.
$ cat /etc/*release
3.14.3
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.14.3
PRETTY_NAME="Alpine Linux v3.14"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://bugs.alpinelinux.org/"
$ lldb --version
lldb version 11.1.0
$ gdb --version
GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ node --version
v16.13.0
I want to make it run with GDB, I know there is llnode
. Because of that, I went down the rabbit hole and explored all the related concepts like stack unwinding, .eh_frame, debug frames etc.
even compiled and installed musl-dbg from alpine repository but no success. Gdb complains about CRC Mismatch. But I don't think its the root cause, because LLDB
works without installing any debug packages.
I have upgraded the container to node:20.10-alpine3.18
and installed latest GDB version(13.1) available on that Alpine version. to make sure it wasn't a bug in GDB 10.2
. Reproduced a new core dump but nothing has changed, still GDB can't show the backtrace
but LLDB can.
Since I have upgraded the alpine
version, I thought I could find the matching musl-dbg
package to get rid of CRC mismatch
error, and gave it a last chance and installed the matching package finally. Surprise It's working up to some specific frame but can't show some frames like LLDB. Here is some part of the working GDB output.
(gdb) bt
#0 __restore_sigs (set=set@entry=0x7fffdc487220) at ./arch/x86_64/syscall_arch.h:40
#1 0x00007ff282f41e5c in raise (sig=sig@entry=6) at src/signal/raise.c:11
#2 0x00007ff282f14fa8 in abort () at src/exit/abort.c:11
#3 0x000055d41a8c3e86 in node::Abort() ()
#4 0x000055d41a7865cd in node::OOMErrorHandler(char const*, v8::OOMDetails const&) ()
#5 0x000055d41aaf8010 in v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) ()
#6 0x000055d41aaf82ff in v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) ()
#7 0x000055d41ad28917 in v8::internal::Heap::FatalProcessOutOfMemory(char const*) ()
#8 0x000055d41ad41d61 in v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) ()
...
...
It's good to see it works but the main question still remains, why can LLDB show all the frames without having a debug symbol but GDB can't?
Upvotes: 1
Views: 154
Reputation: 10852
I went down the rabbit hole and explored why it wasn't working. Basically, GDB
was using a prologue
based unwinding strategy but it was wrongly calculating the frames probably due to setjmp
function written purely in assembly
in musl-libc
. At the end, I wrote a custom unwinder for musl
frames and it worked. This is the code, also I have written a blog series to explain the problem and solution, unfortunately it's not in English.
import re
import gdb
from gdb.unwinder import Unwinder
def debug(pc, current_rsp, offset, addr, frame_id, func):
print('=============debug===========')
print('{:<20}:{:<8}'.format('function',func))
print('{:<20}:{:<8}'.format('pc', str(pc)))
print('{:<20}:{:<8}'.format('current_rsp', str(current_rsp)))
print('{:<20}:{:<8}'.format('offset', str(offset)))
print('{:<20}:{:<8}'.format('return address', hex(addr)))
print('{:<20}:{:<8}'.format('frame_id', str(frame_id)))
u64_ptr = gdb.lookup_type('unsigned long long').pointer()
class FrameID:
def __init__(self, sp, pc):
self.sp = sp
self.pc = pc
def __str__(self):
return f'sp: {self.sp}, pc: {self.pc}'
class MuslUnwinder(Unwinder):
def __init__(self):
super().__init__("musl_unwinder")
def is_musl_frame(self,pc):
obj = gdb.execute("info symbol 0x%x" % pc, False, True)
return "musl" in obj
def dereference(self,adr):
deref = gdb.parse_and_eval("0x%x" % adr).cast(u64_ptr).dereference()
return deref
def __call__(self, pending_frame):
pc = pending_frame.read_register("pc")
if not self.is_musl_frame(pc):
return None
asm = gdb.execute("disassemble 0x%x" % pc, False, True)
lines = asm.splitlines()
func = None
args_bytes = 0
locals_bytes = 0
rbp_bytes = 0
for line in lines:
m = re.match('Dump of assembler code for function (.*):', line)
if m:
func = m.group(1)
elif re.match('.*push[ ]*%', line):
args_bytes += 8
if "rbp" in line:
rbp_bytes += 8
elif m := re.match('.*sub[ ]*\\$0x([A-Fa-f0-9]+),%rsp', line):
locals_bytes = int(m.group(1), 16)
break
offset = locals_bytes + args_bytes
current_rsp = pending_frame.read_register("rsp")
current_rbp = pending_frame.read_register("rbp")
rsp = current_rsp + offset + 8
return_addr = self.dereference(current_rsp + offset)
frame_id = FrameID(rsp, pc)
unwind_info = pending_frame.create_unwind_info(frame_id)
unwind_info.add_saved_register("rsp", rsp)
unwind_info.add_saved_register("rip", return_addr)
if rbp_bytes > 0:
saved_rbp = self.dereference(current_rsp+locals_bytes+rbp_bytes)
unwind_info.add_saved_register("rbp", saved_rbp)
else:
unwind_info.add_saved_register("rbp", current_rbp)
if gdb.parameter("verbose"):
debug(pc, current_rsp, offset, return_addr, frame_id, func)
return unwind_info
gdb.execute('set disassembly-flavor att')
gdb.unwinder.register_unwinder(None, MuslUnwinder(), replace=True)
gdb.invalidate_cached_frames()
Upvotes: 0