Reputation: 181
for (; 1;) {
    if (fork() == 0) break;
    int sig = 0;
    for (; 1; usleep(10000)) {
        pid_t wpid = waitpid(g->pid[1], &sig, WNOHANG);
        if (wpid > 0) break;
        if (wpid < 0) print("wait error: %s\n", strerror(errno));
    }
}
waitpid should return the pid of the child process immediately once it exits!
But waitpid only returned the pid after about 90 seconds:
cube 28139 0.0 0.0 70576 900 ? Ss 04:24 0:07 ./daemon -d
cube 28140 9.3 0.0 0 0 ? Zl 04:24 106:19 [daemon] <defunct>
strace -p 28139
Process 28139 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 0
wait4(28140, 0x7fff08a2681c, WNOHANG, NULL) = 0
nanosleep({0, 10000000}, NULL) = 0
wait4(28140, 0x7fff08a2681c, WNOHANG, NULL) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
restart_syscall(<... resuming interrupted call ...>) = 0
wait4(28140, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL}], WNOHANG, NULL) = 28140
Upvotes: 5
Views: 6850
Reputation: 181
Tracing deeper with lsof, I finally found that there were some fd leaks.
After the fd leaks were fixed, the problem was gone.
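For anyone chasing a similar leak without lsof at hand, here is a minimal in-process sketch. It is not what I actually ran; the helper name count_open_fds and the scan limit are made up for illustration. It relies only on fcntl(fd, F_GETFD) failing for descriptors that are not open.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: count descriptors in [0, limit) that are
 * currently open in this process. fcntl(fd, F_GETFD) returns -1
 * (errno EBADF) for a descriptor that is not open. */
static int count_open_fds(int limit)
{
    int count = 0;
    for (int fd = 0; fd < limit; fd++) {
        if (fcntl(fd, F_GETFD) != -1)
            count++;
    }
    return count;
}
```

Logging count_open_fds(1024) at a few points in the daemon (the 1024 is an arbitrary assumption) should show whether the number keeps growing between restarts of the child.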
Upvotes: 3
Reputation: 1418
It looks to me like waitpid is not returning the child pid immediately simply because that process is not yet available to be reaped.
Furthermore, it looks like you actually want your code to do this, because you call waitpid() with the WNOHANG option, which prevents blocking and allows the parent to move on if the child pid is not available yet.
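To illustrate the three possible results of a WNOHANG call, here is a small sketch. The helpers poll_child and spawn_sleeper are hypothetical names of mine, not part of your code, and the sleeping child only stands in for your daemon.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: poll one child without blocking.
 * Returns 1 if the child was reaped (status stored in *status),
 * 0 if it has not changed state yet,
 * -1 on error (errno set by waitpid, e.g. ECHILD). */
static int poll_child(pid_t pid, int *status)
{
    pid_t r = waitpid(pid, status, WNOHANG);
    if (r == pid) return 1;
    if (r == 0) return 0;
    return -1;
}

/* Hypothetical helper: fork a child that sleeps for `ms` milliseconds
 * (ms must be below 1000 for usleep) and then exits with `code`.
 * Returns the child's pid in the parent, or -1 if fork failed. */
static pid_t spawn_sleeper(unsigned ms, int code)
{
    pid_t pid = fork();
    if (pid == 0) {
        usleep(ms * 1000);
        _exit(code);
    }
    return pid;
}
```

While the child is still running, poll_child returns 0 and the parent is free to do other work; only after the child has terminated does it return 1.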
Maybe your process is using something you didn't expect? Can you trace its activity to see if you can find the bottleneck?
Here is a pretty useful link that might help you: http://infohost.nmt.edu/~eweiss/222_book/222_book/0201433079/ch08lev1sec6.html
Upvotes: 1
Reputation: 15121
You could simply use
for (;;) {
    pid_t wpid = waitpid(-1, &sig, 0);
    if (wpid > 0) break;
    if (wpid < 0) print("wait error: %s\n", strerror(errno));
}
instead of sleeping for a while and trying again.
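A slightly more defensive variant of the blocking wait, as a sketch only: wait_for_child is a name I made up, and it retries when a signal interrupts the call rather than looping on every error. Passing -1 as pid waits for any child, as in the loop above.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: block until the child changes state.
 * Retries if a signal interrupts the call (EINTR).
 * Returns the pid of the reaped child, or -1 with errno set
 * (for example ECHILD when there is no such child). */
static pid_t wait_for_child(pid_t pid, int *status)
{
    pid_t r;
    do {
        r = waitpid(pid, status, 0);
    } while (r == -1 && errno == EINTR);
    return r;
}
```

Returning on errors other than EINTR avoids spinning forever if, say, the child has already been reaped elsewhere.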
Upvotes: 1