I have been trying to figure out the root cause of a segmentation fault that I see while running my application with AddressSanitizer (ASAN) enabled. When I attach GDB and debug the application, I see the segfault being received right at the beginning of the method:
Minimal code:
int TimerScope::switchMode(Mode mode) {
    return doCapture(mode);
}

int TimerScope::doCapture(Mode captureMode) {  // <---- segfault here
    if (handle == -1)
        return 0;
    XLOG(TRACE, image(this));
    // ..
}
Note that I don't see the issue in a build without AddressSanitizer. I have looked at different aspects of this issue (looking for garbage addresses in variables, running Valgrind/UBSAN, etc.) without any luck. Currently I am looking into the assembly code to see if there are any clues there. In GDB, when I print the faulting address, this is what I get:
(gdb) p $_siginfo._sifields._sigfault.si_addr
$5 = (void *) 0x7fe4d3908fb8
Below is the assembly that executes as TimerScope::doCapture gets called:
  0x7fe69595f65e <_ZN7ts9TimerScope9doCaptureENS_8ModeE> endbr64
  0x7fe69595f662 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+4> push %rbp
  0x7fe69595f663 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+5> mov %rsp,%rbp
  0x7fe69595f666 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+8> push %r15
  0x7fe69595f668 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+10> push %r14
  0x7fe69595f66a <_ZN7ts9TimerScope9doCaptureENS_8ModeE+12> push %r13
  0x7fe69595f66c <_ZN7ts9TimerScope9doCaptureENS_8ModeE+14> push %r12
  0x7fe69595f66e <_ZN7ts9TimerScope9doCaptureENS_8ModeE+16> push %rbx
  0x7fe69595f66f <_ZN7ts9TimerScope9doCaptureENS_8ModeE+17> sub $0x1000,%rsp
  0x7fe69595f676 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+24> orq $0x0,(%rsp)
  0x7fe69595f67b <_ZN7ts9TimerScope9doCaptureENS_8ModeE+29> sub $0x1a8,%rsp
> 0x7fe69595f682 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+36> mov %rdi,-0x1198(%rbp)
  0x7fe69595f689 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+43> mov %esi,%eax
  0x7fe69595f68b <_ZN7ts9TimerScope9doCaptureENS_8ModeE+45> mov %al,-0x119c(%rbp)
  0x7fe69595f691 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+51> lea -0x1170(%rbp),%rax
  0x7fe69595f698 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+58> mov %rax,-0x11a8(%rbp)
  0x7fe69595f69f <_ZN7ts9TimerScope9doCaptureENS_8ModeE+65> mov %rax,-0x11c0(%rbp)
  0x7fe69595f6a6 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+72> mov 0x7b92943(%rip),%rax # 0x7fe69d4f1ff0
  0x7fe69595f6ad <_ZN7ts9TimerScope9doCaptureENS_8ModeE+79> cmpl $0x0,(%rax)
  0x7fe69595f6b0 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+82> je 0x7fe69595f6c8 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+106>
  0x7fe69595f6b2 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+84> mov $0x1120,%edi
  0x7fe69595f6b7 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+89> call 0x7fe6952d6510 <__asan_stack_malloc_7@plt>
  0x7fe69595f6bc <_ZN7ts9TimerScope9doCaptureENS_8ModeE+94> test %rax,%rax
  0x7fe69595f6bf <_ZN7ts9TimerScope9doCaptureENS_8ModeE+97> je 0x7fe69595f6c8 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+106>
  0x7fe69595f6c1 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+99> mov %rax,-0x11a8(%rbp)
  0x7fe69595f6c8 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+106> mov -0x11a8(%rbp),%rbx
  0x7fe69595f6cf <_ZN7ts9TimerScope9doCaptureENS_8ModeE+113> lea 0x1140(%rbx),%rax
In particular, this is the instruction that segfaults:
0x7fe69595f682 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+36> mov %rdi,-0x1198(%rbp)
What logic is being executed here? The value of register rbp is 0x7fe4d390a150, and the faulting address 0x7fe4d3908fb8 is exactly 0x7fe4d390a150 - 0x1198. Why would the address 0x7fe4d3908fb8 cause a segfault?
Below is the frame info:
(gdb) info frame
Stack level 0, frame at 0x7fe4d390a160:
rip = 0x7fe69595f682 in ts::TimerScope::doCapture (/tsmgr/src/TimerScope.cpp:142);
saved rip = 0x7fe69595ec93
called by frame at 0x7fe4d390a540
source language c++.
Arglist at 0x7fe4d390a150, args: this=0x0, mode=ts::Mode::None
Locals at 0x7fe4d390a150, Previous frame's sp is 0x7fe4d390a160
Saved registers:
rbx at 0x7fe4d390a128, rbp at 0x7fe4d390a150, r12 at 0x7fe4d390a130, r13 at 0x7fe4d390a138, r14 at 0x7fe4d390a140, r15 at 0x7fe4d390a148, rip at 0x7fe4d390a158
Another strange thing: if I detach the debugger at this point, the error message printed for the segfault shows a different faulting address (0x3e95c1f300086ab5):
*** Aborted at 1659061552 (Unix time, try 'date -d @1659061552') ***
*** Signal 11 (SIGSEGV) (0x3e95c1f300086ab5) received by PID 551605 (pthread TID 0x7fbe64166700) (linux TID 551964) (maybe from PID 551605, UID 1050001907) (code: -6), stack trace: ***
ASAN also reports the same address:
==551605==ERROR: AddressSanitizer: SEGV on unknown address 0x3e95c1f300086ab5 (pc 0x7fbf388386c4 bp 0x7fbe64128100 sp 0x7fbe641260d8 T358)
==551605==The signal is caused by a WRITE memory access.
Why would GDB report a different faulting address than the one printed by the signal handler and by ASAN?
In the backtrace seen upon the segfault, this and mode have not yet been stored by the called method (hence they show different values from those in frame #1):
#0 0x00007fe69595f682 in ts::TimerScope::doCapture(this=0x0, mode=ts::Mode::None)
at /tsmgr/src/TimerScope.cpp:142
#1 0x00007fe69595ec93 in ts::TimerScope::switchMode(this=0x612002750d40, mode=ts::Mode::Exclusive)
at /tsmgr/src/TimerScope.cpp:132
#2 0x00007fe6993b2c2b in ts::DataTimer::switchMode(this=0x6040021ac4e0, mode=ts::Mode::Exclusive)
at /tsmgr/src/DataTimer.hpp:84
#3 0x00007fe6993c47c6 in ts::DataTimerScope::switchMode(this=0x6030037d13d0, mode=ts::Mode::Exclusive)
at /tsmgr/src/DataTimerScope.cpp:49
#4 0x00007fe698e0a02a in ts::DataEntry::changeTimerMode (this=0x7fe29ba72700, mode=ts::Mode::Exclusive)
I am building the application with gcc/g++-10 and libasan6 and running it on Ubuntu 20.04.
I have only been able to provide snippets of the code, as there is a lot of other logic that would be hard to present in a sensible manner. Any pointers on how to approach the issue further would be helpful. I will keep updating the question as more information is requested.
Edit #1: At the point of the segfault, the difference between the stack pointer in frame 0 and the one at the base of the stack (frame 76) is 199568 bytes. The stack size is set to 8 MB (the default).
For the faulting address:
(gdb) p $_siginfo._sifields._sigfault.si_addr
$2 = (void *) 0x7f442a630c68
And rbp points to 0x7f442a632150.
Using info proc mappings, I see the following matching address ranges:
Start Addr End Addr Size Offset
0x7f4429e71000 0x7f442a631000 0x7c0000 0x0
0x7f442a631000 0x7f442a671000 0x40000 0x0
I also use AddressSanitizer builds in a multithreaded application, and I have also seen cases where the AddressSanitizer build crashed with a segmentation fault in code that was otherwise fine. In my case, the root cause of the segfaults was too small a stack size on specific threads.
AddressSanitizer builds sometimes need up to 3x more stack memory.
Here are all the limitations listed for the Clang compiler: https://clang.llvm.org/docs/AddressSanitizer.html#limitations