hiddenbit

Reputation: 383

SIGSEGV signal not caught by handler in multi threaded process

I am working on a Vulkan layer that intercepts all Vulkan calls and writes them to a file. For device memory that is mapped into user space, the layer detects reads and writes by the user by protecting the memory regions with mprotect.

The idea is that when the user calls vkMapMemory, the layer allocates a shadow region using mmap with the PROT_READ | PROT_WRITE flags set, sets up a signal handler for SIGSEGV with sigaction, mprotects the region against both reading and writing (mask 0x0 in the logs below) and returns the base pointer to the user. Any access then triggers a SIGSEGV and the handler takes care of the rest. So far so good.
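For readers who want to reproduce the mechanism outside the layer, here is a minimal single-threaded sketch of the scheme described above. It is not the layer's actual code: the names guard_handler, guard_demo and the g_* globals are made up for illustration, it tracks only one region, and it calls mprotect from the handler, which works on Linux in practice but is not on the POSIX list of async-signal-safe functions.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical globals describing the single guarded region. */
static char *g_base;
static size_t g_size;
static long g_page;
static volatile sig_atomic_t g_faults;

static void guard_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    char *addr = (char *)info->si_addr;
    if (addr < g_base || addr >= g_base + g_size) {
        /* Not our region: restore the default action and return, so the
           faulting instruction re-executes and crashes normally. */
        signal(SIGSEGV, SIG_DFL);
        return;
    }
    /* Unprotect only the page that was touched and count the fault. */
    char *page = (char *)((uintptr_t)addr & ~((uintptr_t)g_page - 1));
    mprotect(page, (size_t)g_page, PROT_READ | PROT_WRITE);
    g_faults++;
}

/* Returns the number of faults handled, or -1 on setup failure. */
int guard_demo(void)
{
    g_page = sysconf(_SC_PAGESIZE);
    g_size = 4 * (size_t)g_page;
    g_base = mmap(NULL, g_size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (g_base == MAP_FAILED) return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = guard_handler;
    sa.sa_flags = SA_SIGINFO;          /* needed to receive si_addr */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) return -1;

    /* Guard the whole region (PROT_NONE is mask 0x0). */
    if (mprotect(g_base, g_size, PROT_NONE) != 0) return -1;

    volatile char *p = g_base;
    p[0] = 1;        /* first page: faults once */
    p[1] = 2;        /* same page, already unprotected: no fault */
    p[g_page] = 3;   /* second page: faults once */
    return (int)g_faults;
}
```

Calling guard_demo() from main should report two handled faults, one per page touched. The sketch only works if SIGSEGV is not blocked in the thread that touches the region.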

The problem I'm dealing with is a case where two threads allocate and access such memory regions. The moment one of the threads accesses one of its regions, the generated SIGSEGV is not delivered to the handler but instead terminates the application with a segmentation fault. The other thread works as expected.

Below is an extract of what's going on before the crash:

[59588] AddExceptionHandler() -> sigaction SIGSEGV *******************
[59588] SetMemoryProtection() mprotect: ptr: 0x7fec0819d000 - 0x7fec0839d000 size: 2097152 mask: 0x0
[59588] MapMemory <-- *ppData: 0x7fec0819d000
[59588] PageGuardExceptionHandler()
  [59588] HandleGuardPageViolation() address: 0x7fec0819d000 is_write: 1 clear_guard: 1
  [59588] 1107:HandleGuardPageViolation() -> SetMemoryProtection()
  [59588] SetMemoryProtection() mprotect: ptr: 0x7fec0819d000 - 0x7fec0819e000 size: 4096 mask: 0x3
[59588] PageGuardExceptionHandler() <-- (handled: true)
[59588] MapMemory(memory: 0x5575f2d5cb20 size: 2097152)
[59588] mmap: 0x7fec013fb000 - 0x7fec015fb000 size: 2097152
[59588] SetMemoryProtection() mprotect: ptr: 0x7fec013fb000 - 0x7fec015fb000 size: 2097152 mask: 0x0
[59588] MapMemory <-- *ppData: 0x7fec013fb000
[59588] PageGuardExceptionHandler()
  [59588] HandleGuardPageViolation() address: 0x7fec013fb000 is_write: 1 clear_guard: 1
  [59588] 1107:HandleGuardPageViolation() -> SetMemoryProtection()
  [59588] SetMemoryProtection() mprotect: ptr: 0x7fec013fb000 - 0x7fec013fc000 size: 4096 mask: 0x3
[59588] PageGuardExceptionHandler() <-- (handled: true)
[59588] AllocateMemory(size: 4194304) *pMemory: 0x5575f2f5c080
[59588] AllocateMemory(size: 537600) *pMemory: 0x5575f2f5c710

Test case 'dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.dynamically_uniform.geometry.samplercubearray'..
[59607] AllocateMemory(size: 2097152) *pMemory: 0x7febe80016c0
[59607] MapMemory(memory: 0x7febe80016c0 size: 2097152)
[59607] mmap: 0x7fec007fa000 - 0x7fec009fa000 size: 2097152
[59588] PageGuardExceptionHandler()
  [59588] HandleGuardPageViolation() address: 0x7fec0143b9e0 is_write: 1 clear_guard: 1
  [59588] 1107:HandleGuardPageViolation() -> SetMemoryProtection()
  [59588] SetMemoryProtection() mprotect: ptr: 0x7fec0143b000 - 0x7fec0143c000 size: 4096 mask: 0x3
[59588] PageGuardExceptionHandler() <-- (handled: true)
[59588] PageGuardExceptionHandler()
  [59588] HandleGuardPageViolation() address: 0x7fec0829d000 is_write: 1 clear_guard: 1
  [59588] 1107:HandleGuardPageViolation() -> SetMemoryProtection()
  [59588] SetMemoryProtection() mprotect: ptr: 0x7fec0829d000 - 0x7fec0829e000 size: 4096 mask: 0x3
[59588] PageGuardExceptionHandler() <-- (handled: true)
[59607] SetMemoryProtection() mprotect: ptr: 0x7fec007fa000 - 0x7fec009fa000 size: 2097152 mask: 0x0
[59607] MapMemory <-- *ppData: 0x7fec007fa000
[59607] util_copy_rect() 1 src: 0x5575f2e6c468 dst: 0x7fec007fa000 size: 4
Segmentation fault (core dumped)

So at the end you can see that the second thread (59607) comes in and requests one of these memory regions. Its output is interleaved with the first thread's signal handler, which is handling SIGSEGVs raised in the first thread (59588). The second thread then continues, mprotects the new region and accesses it, but the handler is not called.

The order of events before the crash is slightly different when running under Valgrind:

[59840] PageGuardExceptionHandler()
  [59840] HandleGuardPageViolation() address: 0x14e08000 is_write: 1 clear_guard: 1
  [59840] 1107:HandleGuardPageViolation() -> SetMemoryProtection()
  [59840] SetMemoryProtection() mprotect: ptr: 0x14e08000 - 0x14e09000 size: 4096 mask: 0x3
[59840] PageGuardExceptionHandler() <-- (handled: true)
[59840] PageGuardExceptionHandler()
  [59840] HandleGuardPageViolation() address: 0x14ac8000 is_write: 1 clear_guard: 1
  [59840] 1107:HandleGuardPageViolation() -> SetMemoryProtection()
  [59840] SetMemoryProtection() mprotect: ptr: 0x14ac8000 - 0x14ac9000 size: 4096 mask: 0x3
[59840] PageGuardExceptionHandler() <-- (handled: true)
[59881] AllocateMemory(size: 2097152) *pMemory: 0xe695c80
[59881] MapMemory(memory: 0xe695c80 size: 2097152)
[59881] mmap: 0x161c9000 - 0x163c9000 size: 2097152
[59881] SetMemoryProtection() mprotect: ptr: 0x161c9000 - 0x163c9000 size: 2097152 mask: 0x0
[59881] MapMemory <-- *ppData: 0x161c9000
[59881] util_copy_rect() 1 src: 0xec7a0d8 dst: 0x161c9000 size: 4
==59840== 
==59840== Process terminating with default action of signal 11 (SIGSEGV)
==59840==  Bad permissions for mapped region at address 0x161C9000
==59840==    at 0x4842B33: memmove (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==59840==    by 0x5EFB4CE: util_copy_box (u_surface.c:78)
==59840==    by 0x611CA0B: u_default_texture_subdata (u_transfer.c:71)
==59840==    by 0x5F05BCC: tc_call_texture_subdata (u_threaded_context.c:2529)
==59840==    by 0x5F01031: tc_batch_execute (u_threaded_context.c:213)
==59840==    by 0x584334B: util_queue_thread_func (u_queue.c:313)
==59840==    by 0x5842F5A: impl_thrd_routine (threads_posix.h:87)
==59840==    by 0x49A7608: start_thread (pthread_create.c:477)
==59840==    by 0x4E51132: clone (clone.S:95)

The only difference in this case is that the second thread is not interrupted.

Running the process with taskset 1, which pins it to a single CPU and essentially makes everything sequential, makes the problem go away.

I don't understand why the second thread's SIGSEGV is not being caught by the handler.

Edit: An important thing I forgot to mention is that the Mesa library is also in the picture. The logs are from a GLES application (a CTS test) running on zink, which translates it into Vulkan calls. The multiple threads are created, if I'm not mistaken, by Mesa.

Upvotes: 2

Views: 469

Answers (1)

hiddenbit

Reputation: 383

Kinda figured out my problem. Somehow SIGSEGV gets blocked in the faulting thread. I'm not sure how this happens, as a gdb breakpoint on sigprocmask never triggers.

Edit: It is blocked with pthread_sigmask

Re-enabling it each time with:

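#include <signal.h>

sigset_t x;
sigemptyset(&x);
sigaddset(&x, SIGSEGV);              /* set containing only SIGSEGV */
sigprocmask(SIG_UNBLOCK, &x, NULL);  /* on Linux this changes the calling thread's mask */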

solves my issue (and possibly creates another one).

Upvotes: 1
