osgx
osgx

Reputation: 94165

ptrace'ing multithread application

I have a "debugger"-like application, named hyper-ptrace. It starts user_appl3 which is multithreaded with NPTL.

Main loop of hyper-ptrace is:

wait3(&status, FLAGS, &u);
// find a pid of child, which has a signal
switch (signal = WSTOPSIG(status))
{
  case SIGTRAP:
    do_some_analysis_of_the_child(pid, &status) // up to several ms
    break;
}
ptrace(PTRACE_CONT, pid); // discard signal, user_appl3 doesn't know anything 
                          //about this SIGTRAP

The SIGTRAP is generated for user_appl3 by hardware at some periodic interval for each thread and it is delivered to some of thread. Interval can be 100..1 ms or even less. It is a sort of per-CPU clocks with interrupts. Each threads runs on only its CPU (binded with affinity).

So there is the question1:

If thread1 got TRAP and debugger enters to do_some_analysis_of_the_child, (so debugger does not do a wait3 for second thread), and a bit time later thread2 gots TRAP too, what will be done by Linux kernel?

In my opinion: thread1 will be stopped because its get a signal and there is a waiting debugger. But thread2 continues to run (is it?). When thread2 gets a signal, there will be no a waiting debugger, so TRAP can be delivered to the thread2 itself, effectively killing it. Am I right?

And there is the second question, question2:

For this case, how should I rewrite the main loop of hyper-ptrace to lower the chances of delivering signal through to the user's thread, over the debugger? Nor trap-generating hardware neither user application can't be changed. Stopping the second thread is not a variant too.

I need analysis of both threads. Some it parts can be done only when thread is stopped.

Thanks in advance!

Upvotes: 3

Views: 2779

Answers (1)

caf
caf

Reputation: 239011

No, the signal isn't delivered to the application. The child application will stop when the signal happens, and your ptracing process will be notified about it next time it calls wait().

You're right - the tracing stop only applies to the main thread.

To get the behaviour you want, suspend the entire child process (every thread) immediately after the traced thread stops by sending a SIGSTOP to the process PID, and resume it with a SIGCONT when you're done:

wait3(&status, FLAGS, &u);

if (WIFSTOPPED(status))
    kill(pid, SIGSTOP);  /* Signal entire child process to stop */

switch (signal = WSTOPSIG(status))
{
  case SIGTRAP:
    do_some_analysis_of_the_child(pid, &status) // up to several ms
    break;
}

ptrace(PTRACE_CONT, pid, 0, 0); // discard signal, user_appl3 doesn't know anything about this SIGTRAP
kill(pid, SIGCONT);  /* Signal entire child process to resume */

Upvotes: 6

Related Questions