Reputation: 2393
I'm working on a shell lab for a Systems course, and I've been having some really weird race condition bugs that I've been trying to solve since Friday night and can't seem to pin down.
My current code: http://buu700.com/tsh
Everything before START OF MY CODE
and after END OF MY CODE
is provided by the course instructors, so none of that should be the source of the issues.
We have a test script as well; here is the output of my current test results: http://buu700.com/sdriver
/*****************
* Signal handlers
*****************/
/*
* sigchld_handler - The kernel sends a SIGCHLD to the shell whenever
* a child job terminates (becomes a zombie), or stops because it
* received a SIGSTOP, SIGTSTP, SIGTTIN or SIGTTOU signal. The
* handler reaps all available zombie children, but doesn't wait
* for any other currently running children to terminate.
*/
void
sigchld_handler(int sig)
{
pid_t pid;
int status, termsig;
struct job_t *job;
sigset_t s;
sigemptyset(&s);
sigaddset(&s, SIGCHLD);
sigaddset(&s, SIGINT);
sigaddset(&s, SIGTSTP);
sigprocmask(SIG_BLOCK, &s, NULL);
while ((pid = waitpid(-1, &status, WNOHANG | WUNTRACED)) > 0) {
if (WIFEXITED(status)) {
deletejob(job_list, pid);
}
if ((termsig = WTERMSIG(status))) {
deletejob(job_list, pid);
safe_printf("Job [%i] (%i) %s by signal %i\n",
pid2jid(pid), pid, "terminated", termsig);
}
if (WIFSTOPPED(status)) {
job = getjobpid(job_list, pid);
job->state = ST;
safe_printf("Job [%i] (%i) %s by signal %i\n",
pid2jid(pid), pid, "stopped", SIGTSTP);
}
}
if (errno != ECHILD)
unix_error("waitpid error");
sigprocmask(SIG_UNBLOCK, &s, NULL);
return;
}
/*
* sigint_handler - The kernel sends a SIGINT to the shell whenver the
* user types ctrl-c at the keyboard. Catch it and send it along
* to the foreground job.
*/
void
sigint_handler(int sig)
{
sigset_t s;
sigemptyset(&s);
sigaddset(&s, SIGCHLD);
sigaddset(&s, SIGINT);
sigaddset(&s, SIGTSTP);
sigprocmask(SIG_BLOCK, &s, NULL);
kill(-1, sig);
sigprocmask(SIG_UNBLOCK, &s, NULL);
return;
}
/*
* sigtstp_handler - The kernel sends a SIGTSTP to the shell whenever
* the user types ctrl-z at the keyboard. Catch it and suspend the
* foreground job by sending it a SIGTSTP.
*/
void
sigtstp_handler(int sig)
{
sigset_t s;
sigemptyset(&s);
sigaddset(&s, SIGCHLD);
sigaddset(&s, SIGINT);
sigaddset(&s, SIGTSTP);
sigprocmask(SIG_BLOCK, &s, NULL);
kill(-1, sig);
sigprocmask(SIG_UNBLOCK, &s, NULL);
return;
}
Upvotes: 1
Views: 2796
Reputation: 104040
I have a suspicion that the bug comes from your racing against other signals in your signal handlers:
sigint_handler(int sig)
{
sigset_t s;
sigemptyset(&s);
sigaddset(&s, SIGCHLD);
sigaddset(&s, SIGINT);
sigaddset(&s, SIGTSTP);
sigprocmask(SIG_BLOCK, &s, NULL);
When you register the signal handler with sigaction(2)
, you can provide an sa_mask
that the kernel will use to block signals for you while your signal handler is running. This is done atomically and requires no extra work on your part. Simply populate this mask once when registering the signal handler.
Another possibility comes from the SIGTSTP
signal; page 350 of my copy of APUE, 2nd edition says, in part:
Only a job-control shell should reset the disposition of [
SIGTSTP
,SIGTTIN
,SIGTTOU
] toSIG_DFL
.
What's left unsaid in those paragraphs, but I think is fair to assume, is that the shell should set the signal disposition to SIG_DFL
for the child processes -- it still needs to do something to handle the signals itself.
Did you properly handle the signal dispositions in the children?
Upvotes: 1