DrP3pp3r

Reputation: 857

Linux, forked process hangs immediately

I have a problem with fork that occurs only sporadically. It works almost all the time, but fails every once in a while on a test system.

My research didn't turn up anybody else mentioning a similar problem.

The problem occurs on an embedded Linux system. There is no swap partition available.

The running process has all signals blocked in all threads and handles them via sigtimedwait in a dedicated thread.

If I start a child process via fork, the child sometimes hangs immediately: its first log message never appears.

Pseudo code showing the problem:

const pid_t childPid = fork();
if(0 == childPid) {
    // child process
    LOG_MSG("Child process started."); // <- This never shows up in the syslog.

    // do some stuff

} else if(-1 == childPid) {
    // error
    LOG_MSG("Parent process: Error starting child process!");
    result = false;
} else {
    // parent process
    LOG_MSG("Parent process: Child process started. PID: %d.", childPid); // <- This shows up in the syslog.

    // do some stuff
    int status = 0;
    const int options = 0;
    const auto waitResult = waitpid(childPid, &status, options);
    // more stuff
}

Questions:

  1. What could cause this hanging child process?
  2. What would happen if the new process ran out of memory in the LOG_MSG call that leads to syslog? Would this raise a signal (that could not be delivered because it is blocked)?

Upvotes: 4

Views: 4213

Answers (1)

DrP3pp3r

Reputation: 857

I took the sample from Adrien Descamps' link (see also comments above) and C++-ified and modified it a little:

#include <atomic>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <thread>

#include <sys/wait.h>
#include <syslog.h>
#include <unistd.h>


std::atomic<bool> go(true);


void syslogBlaster() {
   int j = 0;
   while(go) {
      for(int i = 0; i < 100; ++i) {
         syslog(LOG_INFO, "syslogBlaster: %d@%d", i, j);
      }
      ++j;

      std::this_thread::sleep_for(std::chrono::milliseconds(30));
   }
}

int main() {
   std::thread blaster(syslogBlaster);

   for(int i = 0; i < 1000; ++i) {
      const auto forkResult = fork();
      if(0 == forkResult) {
         // Child: syslog() is not async-signal-safe; this is the call that can hang.
         syslog(LOG_INFO, "Child process: '%d'.", static_cast<int>(getpid()));
         exit(0);
      } else if(forkResult < 0) {
         std::cout << "fork() failed!" << std::endl;
      } else {
         syslog(LOG_INFO, "Parent process.");
         std::cout << "Waiting #" << i << "!" << std::endl;
         int status = 0;
         const int options = 0;
         const auto waitResult = waitpid(forkResult, &status, options);
         if(-1 == waitResult) {
            std::cout << "waitpid() failed!" << std::endl;
         } else {
            std::cout << "Bye zombie #" << i << "!" << std::endl;
         }
      }

      std::this_thread::sleep_for(std::chrono::milliseconds(28));
   }

   go = false;
   blaster.join();

   std::cout << "Wow, we survived!" << std::endl;
}

Running this sample, the process gets stuck (on my device) between the first and the fifth try.

Explanation

syslog is the problem!

In general: non async-signal-safe functions are the problem!

As stated by Damian Pietras (see linked page):

"calling any function that is not async-safe (man 7 signal) in child process after fork() call in a multi-threaded program has undefined behaviour"

Technically, the undefined behavior arises in one of two ways. Either data guarded by a critical section is left inconsistent in the child, because a thread other than the forking one was in the middle of modifying it during the fork. Or, as in this case, a mutex that was locked in the parent stays locked forever in the child, because fork() copies only the calling thread and the thread that held the lock does not exist there to release it.
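
A minimal sketch of a child branch that avoids the problem, assuming the child only needs to announce itself and then exit. The helper name forkWithSafeChild is made up for illustration and is not part of the original sample. The child restricts itself to write() and _exit(), both of which are listed as async-signal-safe in signal-safety(7), and all syslog() logging stays in the parent:

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <syslog.h>
#include <unistd.h>

// Hypothetical helper (not from the original sample): forks once and keeps
// the child restricted to async-signal-safe calls. Returns the child's exit
// code, or -1 if fork() or waitpid() failed or the child did not exit normally.
int forkWithSafeChild() {
   const pid_t forkResult = fork();
   if(0 == forkResult) {
      // Child: write() and _exit() are async-signal-safe;
      // syslog(), printf() and exit() are not.
      static const char msg[] = "Child process started.\n";
      if(write(STDERR_FILENO, msg, sizeof(msg) - 1) < 0) {
         _exit(1);
      }
      _exit(0);
   }
   if(forkResult < 0) {
      return -1;
   }

   // Parent: all of its threads are still running, so syslog() is fine here.
   syslog(LOG_INFO, "Parent process: child PID %d.", static_cast<int>(forkResult));

   int status = 0;
   if(-1 == waitpid(forkResult, &status, 0)) {
      return -1;
   }
   return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling forkWithSafeChild() in place of the fork/syslog/exit branch in the loop above should leave the child with no lock to wait on. If the child has real work to do, the usual route is to call one of the exec functions right after fork(), so the new program starts with fresh state.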

Credit for this answer goes to Adrien Descamps for finding the root cause (syslog), but also to PSkocik and Jan Spurny for detecting the source (LOG_MSG).

Upvotes: 4
