Dolda2000

Reputation: 25855

Detaching Gdb without resuming the inferior

Gdb, like any other program, isn't perfect, and every now and then I encounter bugs that render the current Gdb instance unusable. At this point, if I have a debugging session with a lot of valuable state in the inferior, I'd like to be able to just start a new Gdb session on it. That is, detach, quit Gdb and start a new Gdb instance to restart where I left off.

However, when detaching Gdb, it resumes the inferior so that it continues running where it was, which ruins the point of the whole exercise. Therefore, I'm wondering if it's possible to detach in such a state that the inferior is as if it had been sent a SIGSTOP, basically.

I've tried simply killing Gdb, but interestingly, that seems to take the inferior with it. Not sure how that works.

Upvotes: 2

Views: 2165

Answers (1)

Employed Russian

Reputation: 213799

"when detaching Gdb, it resumes the inferior"

GDB doesn't; the kernel does (assuming Linux).

"I've tried simply killing Gdb, but interestingly, that seems to take the inferior with it"

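The kernel sends it SIGHUP, which by default terminates the inferior. You can prevent that either by having the inferior set SIGHUP to SIG_IGN, or simply from GDB with (gdb) call signal(1, 1), which on Linux is signal(SIGHUP, SIG_IGN) written with the numeric values.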

After that, you can detach and quit GDB, but the kernel will resume the inferior with SIGCONT (see Update below), so you are back to square one.

However, there is a solution. Consider the following program:

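#include <stdio.h>   /* printf, fflush */
#include <unistd.h>  /* sleep */

int main()
{
  while (1) {
    /* Print a dot once per second so it is obvious when the process is running. */
    printf("."); fflush(0); sleep(1);
  }
}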

gdb -q ./a.out
(gdb) run
Starting program: /tmp/a.out 
.....^C
Program received signal SIGINT, Interrupt.
0x00007ffff7ad5de0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:81
81  ../sysdeps/unix/syscall-template.S: No such file or directory.

We want the program to not run away on detach, so we send it SIGSTOP:

(gdb) signal SIGSTOP
Continuing with signal SIGSTOP.

Program received signal SIGSTOP, Stopped (signal).
0x00007ffff7ad5de0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:81
81  in ../sysdeps/unix/syscall-template.S
(gdb) detach
Detaching from program: /tmp/a.out, process 25382

Note that at this point, gdb is detached (but still alive), and the program is not running (stopped).
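If you want to see what "stopped" looks like from outside GDB, here is a minimal sketch, assuming Linux with /proc mounted. It uses a throwaway sleep process as a hypothetical stand-in for the inferior (it is not the a.out above), stops it with SIGSTOP, and reads the state field from /proc/PID/status. Running the same awk line against the real inferior's PID after the detach should report T as well.

```shell
# Hypothetical stand-in for the inferior: a background sleep.
sleep 30 &
pid=$!

# SIGSTOP cannot be caught or ignored, so the process stops unconditionally.
kill -STOP "$pid"
sleep 1   # give the kernel a moment to deliver the signal

# The State: line looks like "State:  T (stopped)"; keep only the letter.
state=$(awk '/^State:/ {print $2}' "/proc/$pid/status")
echo "state: $state"

# Clean up the stand-in process.
kill -KILL "$pid"
wait "$pid" 2>/dev/null || true
```

T is the state for a process stopped by a signal. A process that is merely sleeping in nanosleep, like the dot printer while it runs freely, shows S instead.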

Now in a different terminal:

gdb -q -ex 'set prompt (gdb2) ' -p 25382
0x00007ffff7ad5de0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:81
81  ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb2) c
Continuing.

Program received signal SIGSTOP, Stopped (signal).
0x00007ffff7ad5de0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:81
81  in ../sysdeps/unix/syscall-template.S
(gdb2) sig 0
Continuing with no signal.

The program continues running, printing dots in the first terminal.

Update:

"SIGHUP -- Interesting. By what mechanism, though?"

Good question. I didn't know, but this appears to be the answer:

From the setpgid man page:

If the exit of the process causes a process group to become orphaned,
and if any member of the newly orphaned process group is stopped,
then a SIGHUP signal followed by a SIGCONT signal will be sent to
each process in the newly orphaned process group.

I have verified that if I detach and quit GDB without stopping the inferior, it doesn't get SIGHUP and continues running without dying.

If I do send it SIGSTOP and arrange for SIGHUP to be ignored, then I see both SIGHUP and SIGCONT being sent in strace, so that matches the man page exactly:

(gdb) detach
Detaching from program: /tmp/a.out, process 41699

In another window: strace -p 41699. Back to GDB:

(gdb) quit

strace output:

--- stopped by SIGSTOP ---
--- SIGHUP {si_signo=SIGHUP, si_code=SI_KERNEL} ---
--- SIGCONT {si_signo=SIGCONT, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = 0
write(1, ".", 1.)                        = 1
...
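The effect of ignoring SIGHUP can also be checked outside GDB. The following is only a sketch under assumptions: it uses a throwaway sleep process as a hypothetical stand-in for the inferior, and it ignores SIGHUP with the shell's trap builtin instead of call signal(1, 1). It relies on the fact that an ignored signal disposition is inherited across exec.

```shell
# Hypothetical stand-in for the inferior: sleep started with SIGHUP ignored.
# trap '' HUP sets the disposition to "ignore" in the subshell, and exec
# replaces the subshell with sleep, which keeps that disposition.
( trap '' HUP; exec sleep 30 ) &
pid=$!
sleep 1   # let the subshell install the trap and exec

# By default SIGHUP would terminate sleep; here it should be discarded.
kill -HUP "$pid"
sleep 1

# kill -0 sends no signal, it only tests whether the process still exists.
if kill -0 "$pid" 2>/dev/null; then survived=yes; else survived=no; fi
echo "survived SIGHUP: $survived"

# Clean up the stand-in process.
kill -KILL "$pid"
wait "$pid" 2>/dev/null || true
```

This covers only the SIGHUP half of the problem: a process that ignores SIGHUP still gets resumed by the SIGCONT that follows it, which is why the answer stops the inferior and attaches a second GDB rather than quitting the first one.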

Upvotes: 3
