vaibhav kumar

Reputation: 985

Segmentation fault at the very beginning of the method

I have been trying to find the root cause of a segmentation fault that occurs when I run my application with AddressSanitizer (ASan) enabled. When I attach GDB and debug the application, the segfault is raised right at the beginning of the method:

Minimal code:

    int TimerScope::switchMode() {
        return doCapture(mode);
    }

    int TimerScope::doCapture(Mode captureMode) {  // <---- segfault here
        if (handle == -1)
            return 0;

        XLOG(TRACE, image(this));
        // ...
    }

Note that I don't see the issue in a build without AddressSanitizer. I have investigated several aspects of the issue (checking for garbage addresses in variables, running Valgrind and UBSan, etc.) without any luck. I am currently reading the assembly to look for clues. In GDB, printing the faulting address gives:

    (gdb) p $_siginfo._sifields._sigfault.si_addr
    $5 = (void *) 0x7fe4d3908fb8

The assembly executed on entry to TimerScope::doCapture is given below:

      0x7fe69595f65e <_ZN7ts9TimerScope9doCaptureENS_8ModeE>          endbr64
      0x7fe69595f662 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+4>        push   %rbp
      0x7fe69595f663 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+5>        mov    %rsp,%rbp
      0x7fe69595f666 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+8>        push   %r15
      0x7fe69595f668 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+10>       push   %r14
      0x7fe69595f66a <_ZN7ts9TimerScope9doCaptureENS_8ModeE+12>       push   %r13
      0x7fe69595f66c <_ZN7ts9TimerScope9doCaptureENS_8ModeE+14>       push   %r12
      0x7fe69595f66e <_ZN7ts9TimerScope9doCaptureENS_8ModeE+16>       push   %rbx
      0x7fe69595f66f <_ZN7ts9TimerScope9doCaptureENS_8ModeE+17>       sub    $0x1000,%rsp
      0x7fe69595f676 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+24>       orq    $0x0,(%rsp)
      0x7fe69595f67b <_ZN7ts9TimerScope9doCaptureENS_8ModeE+29>       sub    $0x1a8,%rsp
    > 0x7fe69595f682 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+36>       mov    %rdi,-0x1198(%rbp)
      0x7fe69595f689 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+43>       mov    %esi,%eax
      0x7fe69595f68b <_ZN7ts9TimerScope9doCaptureENS_8ModeE+45>       mov    %al,-0x119c(%rbp)
      0x7fe69595f691 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+51>       lea    -0x1170(%rbp),%rax
      0x7fe69595f698 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+58>       mov    %rax,-0x11a8(%rbp)
      0x7fe69595f69f <_ZN7ts9TimerScope9doCaptureENS_8ModeE+65>       mov    %rax,-0x11c0(%rbp)
      0x7fe69595f6a6 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+72>       mov    0x7b92943(%rip),%rax        # 0x7fe69d4f1ff0
      0x7fe69595f6ad <_ZN7ts9TimerScope9doCaptureENS_8ModeE+79>       cmpl   $0x0,(%rax)
      0x7fe69595f6b0 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+82>       je     0x7fe69595f6c8 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+106>
      0x7fe69595f6b2 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+84>       mov    $0x1120,%edi
      0x7fe69595f6b7 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+89>       call   0x7fe6952d6510 <__asan_stack_malloc_7@plt>
      0x7fe69595f6bc <_ZN7ts9TimerScope9doCaptureENS_8ModeE+94>       test   %rax,%rax
      0x7fe69595f6bf <_ZN7ts9TimerScope9doCaptureENS_8ModeE+97>       je     0x7fe69595f6c8 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+106>
      0x7fe69595f6c1 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+99>       mov    %rax,-0x11a8(%rbp)
      0x7fe69595f6c8 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+106>      mov    -0x11a8(%rbp),%rbx
      0x7fe69595f6cf <_ZN7ts9TimerScope9doCaptureENS_8ModeE+113>      lea    0x1140(%rbx),%rax

In particular, the following line is the one that segfaults:

    0x7fe69595f682 <_ZN7ts9TimerScope9doCaptureENS_8ModeE+36>       mov    %rdi,-0x1198(%rbp)

What is this code doing? The value of register rbp is 0x7fe4d390a150, and the faulting address 0x7fe4d3908fb8 is exactly rbp minus 0x1198. Why would a write to 0x7fe4d3908fb8 cause a segfault?
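The address arithmetic above can be double-checked with a short sketch. The helper names (`storeTarget`, `checkFaultAddress`) are made up for illustration; only the numeric values come from the GDB output.

```cpp
#include <cassert>
#include <cstdint>

// Effective address of a store such as `mov %rdi,-0x1198(%rbp)`:
// the frame base minus the displacement.
constexpr std::uint64_t storeTarget(std::uint64_t rbp, std::uint64_t displacement) {
    return rbp - displacement;
}

// Values copied from the GDB session in the question.
constexpr std::uint64_t kRbp = 0x7fe4d390a150ULL;
constexpr std::uint64_t kFaultAddr = 0x7fe4d3908fb8ULL;

// Hypothetical helper: asserts that si_addr equals rbp - 0x1198.
inline void checkFaultAddress() {
    assert(storeTarget(kRbp, 0x1198) == kFaultAddr);
}
```

This only confirms that the faulting access is the store of `this` into its stack slot; it says nothing about why that slot is not writable.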

Below is the frame info:

    (gdb) info frame
    Stack level 0, frame at 0x7fe4d390a160:
     rip = 0x7fe69595f682 in ts::TimerScope::doCapture (/tsmgr/src/TimerScope.cpp:142);
        saved rip = 0x7fe69595ec93
     called by frame at 0x7fe4d390a540
     source language c++.
     Arglist at 0x7fe4d390a150, args: this=0x0, mode=ts::Mode::None
     Locals at 0x7fe4d390a150, Previous frame's sp is 0x7fe4d390a160
     Saved registers:
      rbx at 0x7fe4d390a128, rbp at 0x7fe4d390a150, r12 at 0x7fe4d390a130, r13 at 0x7fe4d390a138, r14 at 0x7fe4d390a140, r15 at 0x7fe4d390a148, rip at 0x7fe4d390a158

Another strange thing: if I detach the debugger at this point, the error message printed for the segfault shows a different faulting address (0x3e95c1f300086ab5):

    *** Aborted at 1659061552 (Unix time, try 'date -d @1659061552') ***
    *** Signal 11 (SIGSEGV) (0x3e95c1f300086ab5) received by PID 551605 (pthread TID 0x7fbe64166700) (linux TID 551964) (maybe from PID 551605, UID 1050001907) (code: -6), stack trace: ***

ASAN also reports the same address:

    ==551605==ERROR: AddressSanitizer: SEGV on unknown address 0x3e95c1f300086ab5 (pc 0x7fbf388386c4 bp 0x7fbe64128100 sp 0x7fbe641260d8 T358)
    ==551605==The signal is caused by a WRITE memory access.

Why would GDB report a different faulting address than the one printed by the signal handler and ASan?

In the backtrace at the time of the segfault, this and mode have not yet been stored in frame #0 (hence they show different values from those in frame #1):

    #0  0x00007fe69595f682 in ts::TimerScope::doCapture(this=0x0, mode=ts::Mode::None)
        at /tsmgr/src/TimerScope.cpp:142
    #1  0x00007fe69595ec93 in ts::TimerScope::switchMode(this=0x612002750d40, mode=ts::Mode::Exclusive)
        at /tsmgr/src/TimerScope.cpp:132
    #2  0x00007fe6993b2c2b in ts::DataTimer::switchMode(this=0x6040021ac4e0, mode=ts::Mode::Exclusive)
        at /tsmgr/src/DataTimer.hpp:84
    #3  0x00007fe6993c47c6 in ts::DataTimerScope::switchMode(this=0x6030037d13d0, mode=ts::Mode::Exclusive)
        at /tsmgr/src/DataTimerScope.cpp:49
    #4  0x00007fe698e0a02a in ts::DataEntry::changeTimerMode (this=0x7fe29ba72700, mode=ts::Mode::Exclusive)

I am using gcc/g++-10 with libasan6 to build the application and running it on Ubuntu 20.04.

I have only been able to provide snippets of the code, as there is a lot of other logic that would be hard to present sensibly. Any pointers on how to approach the issue further would be helpful. I will keep updating the question as more information is requested.


Edit #1: At the point of the segfault, the difference between the stack pointer in frame 0 and the one at the base of the stack (frame 76) is 199568 bytes. The stack size is set to 8M (the default).

For the faulting address:

    (gdb) p $_siginfo._sifields._sigfault.si_addr
    $2 = (void *) 0x7f442a630c68

And rbp points to 0x7f442a632150.

Using info proc mappings, I see the following mappings that match these addresses:

  0x7f4429e71000     0x7f442a631000   0x7c0000        0x0
  0x7f442a631000     0x7f442a671000    0x40000        0x0
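To make the relationship between these mappings and the two addresses explicit, here is a minimal sketch. The `Mapping` struct and helper names are hypothetical; the ranges and addresses are copied from the output above, treating each mapping as the half-open range [start, end).

```cpp
#include <cassert>
#include <cstdint>

// One row of `info proc mappings`, as the half-open range [start, end).
struct Mapping {
    std::uint64_t start;
    std::uint64_t end;
};

constexpr bool contains(const Mapping& m, std::uint64_t addr) {
    return addr >= m.start && addr < m.end;
}

// Values copied from the GDB output in Edit #1.
constexpr Mapping kLower{0x7f4429e71000ULL, 0x7f442a631000ULL};
constexpr Mapping kUpper{0x7f442a631000ULL, 0x7f442a671000ULL};
constexpr std::uint64_t kFaultAddr = 0x7f442a630c68ULL;
constexpr std::uint64_t kRbp = 0x7f442a632150ULL;

// Hypothetical helper: how far the faulting address lies below the
// boundary shared by the two mappings.
inline std::uint64_t bytesBelowBoundary() {
    assert(contains(kLower, kFaultAddr));
    return kUpper.start - kFaultAddr;
}
```

With these values, rbp lies in the second (0x40000-byte) mapping, while the faulting address lies 0x398 bytes below its start, inside the first mapping. The output shown does not include permissions, so whether the first mapping is writable cannot be read off from it.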

Upvotes: 3

Views: 957

Answers (1)

KrysGo

Reputation: 61

I also use AddressSanitizer builds in a multithreaded application, and I have also had cases where the AddressSanitizer build segfaulted in code that was otherwise fine. In my case, the root cause of the segfaults was a stack size that was too small on specific threads.

AddressSanitizer builds sometimes need up to 3x more stack memory.

The limitations for the Clang compiler are listed here: https://clang.llvm.org/docs/AddressSanitizer.html#limitations
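If the affected threads are created directly with pthreads, one way to test this theory is to give them a larger stack explicitly. The following is only a sketch under that assumption: `runWithStackSize` and `worker` are hypothetical names, and the 8 MiB figure is an illustrative value, not a recommendation.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>

// Hypothetical helper: runs fn(arg) on a new thread whose stack is
// stackBytes large and waits for it to finish.
// Returns 0 on success, otherwise the pthread error code.
int runWithStackSize(void* (*fn)(void*), void* arg, std::size_t stackBytes) {
    assert(fn != nullptr);
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;

    rc = pthread_attr_setstacksize(&attr, stackBytes);
    if (rc == 0) {
        pthread_t tid;
        rc = pthread_create(&tid, &attr, fn, arg);
        if (rc == 0) rc = pthread_join(tid, nullptr);
    }
    pthread_attr_destroy(&attr);
    return rc;
}

// Example worker with a large stack frame (256 KiB of locals).
void* worker(void* arg) {
    volatile char buf[256 * 1024];
    buf[0] = 1;
    buf[sizeof(buf) - 1] = 1;
    *static_cast<int*>(arg) = buf[0] + buf[sizeof(buf) - 1];
    return nullptr;
}
```

A call such as `runWithStackSize(worker, &result, 8u * 1024 * 1024)` would run the worker with an 8 MiB stack. If the crash disappears once the thread's stack is enlarged, that points to stack exhaustion rather than memory corruption.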

Upvotes: 4
