Reputation: 857
I have a problem with fork() that occurs only sporadically. It works almost all the time, but fails every once in a while on a test system.
My research didn't turn up anybody else mentioning a similar problem.
The problem occurs on an embedded Linux system. There is no swap partition available.
The process running has all signals blocked in all threads and handles them via sigtimedwait in a dedicated thread.
If I start a child process via fork, the parent's log message shows up in the syslog but the child's sometimes never does.
Pseudo code showing the problem:
const pid_t childPid = fork();
if(0 == childPid) {
    // child process
    LOG_MSG("Child process started."); // <- This never shows up in the syslog.
    // do some stuff
} else if(-1 == childPid) {
    // error
    LOG_MSG("Parent process: Error starting child process!");
    result = false;
} else {
    // parent process
    LOG_MSG("Parent process: Child process started. PID: %d.", childPid); // <- This shows up in the syslog.
    // do some stuff
    int status = 0;
    const int options = 0;
    const auto waitResult = waitpid(childPid, &status, options);
    // more stuff
}
Questions:
Upvotes: 4
Views: 4213
Reputation: 857
I took the sample from Adrien Descamps' link (see also comments above) and C++-ified and modified it a little:
#include <thread>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <atomic>
#include <unistd.h>
#include <syslog.h>
#include <sys/wait.h>

std::atomic<bool> go(true);

// Background thread that keeps calling syslog() while main() forks.
void syslogBlaster() {
    int j = 0;
    while(go) {
        for(int i = 0; i < 100; ++i) {
            syslog(LOG_INFO, "syslogBlaster: %d@%d", i, j);
        }
        ++j;
        std::this_thread::sleep_for(std::chrono::milliseconds(30));
    }
}

int main() {
    std::thread blaster(syslogBlaster);
    for(int i = 0; i < 1000; ++i) {
        const auto forkResult = fork();
        if(0 == forkResult) {
            // Child: this syslog() call is the one that can hang.
            syslog(LOG_INFO, "Child process: '%d'.", static_cast<int>(getpid()));
            exit(0);
        } else if(forkResult < 0) {
            std::cout << "fork() failed!" << std::endl;
        } else {
            syslog(LOG_INFO, "Parent process.");
            std::cout << "Waiting #" << i << "!" << std::endl;
            int status = 0;
            const int options = 0;
            // Blocks forever if the child is stuck.
            const auto waitResult = waitpid(forkResult, &status, options);
            if(-1 == waitResult) {
                std::cout << "waitpid() failed!" << std::endl;
            } else {
                std::cout << "Bye zombie #" << i << "!" << std::endl;
            }
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(28));
    }
    go = false;
    blaster.join();
    std::cout << "Wow, we survived!" << std::endl;
}
Running this sample, the process gets stuck (on my device) between the first and the fifth try.
Explanation
syslog is the problem!
In general: non async-signal-safe functions are the problem!
As stated by Damian Pietras (see linked page)
calling any function that is not async-safe (man 7 signal) in child process after fork() call in a multi-threaded program has undefined behaviour
Technically, the undefined behavior has two possible sources. Data protected by a critical section can be inconsistent in the child, because a thread other than the forking one was right in the middle of that section during the fork. Or, as in this case, a mutex that was locked in the parent at the time of the fork stays locked forever in the child, because the thread that held it does not exist there.
Credit for this answer goes to Adrien Descamps for finding the root cause (syslog), but also to PSkocik and Jan Spurny for detecting the source (LOG_MSG).
Upvotes: 4