N. Hunt

Reputation: 151

Problems with ptrace(PTRACE_TRACEME,...) and subsequent wait

I am porting a debugger, 'pi' ('process inspector') to Linux and am working on the code for fork/exec of a child to inspect it. I am following standard procedure (I believe) but the wait is hanging. 'hang' is the procedure which does the work, the 'cmd' argument being the name of the binary (a.out) to trace:

int Hostfunc::hang(char *cmd){
    char *argv[10], *cp;
    int i;
    Localproc *p;
    struct exec exec;
    struct rlimit rlim;
    
    i = strlen(cmd);
    if (++i > sizeof(procbuffer)) {
        i = sizeof(procbuffer) - 1;
        procbuffer[i] = 0;
    }
    bcopy(cmd, procbuffer, i);
    argv[0] = cp = procbuffer;
    for(i = 1;;) {
        while(*cp && *cp != ' ')
            cp++;
        if (!*cp) {
            argv[i] = 0;
            break;
        } else {
            *cp++ = 0;
            while (*cp == ' ')
                cp++;
            if (*cp)
                argv[i++] = cp;
        }
    }
    hangpid = fork();
    if (!hangpid){
        int fd, nfiles = 20;
        if (getrlimit(RLIMIT_NOFILE, &rlim) == 0)  /* getrlimit returns 0 on success */
            nfiles = rlim.rlim_cur;
        for( fd = 0; fd < nfiles; ++fd )
            close(fd);
        open("/dev/null", 2);
        dup2(0, 1);
        dup2(0, 2);
        setpgid(0, 0);
        ptrace(PTRACE_TRACEME, 0, 0, 0);
        execvp(argv[0], argv);
        exit(0);
    }
    if (hangpid < 0)
        return 0;
    p = new Localproc;
    if (!p) {
        kill(hangpid, SIGKILL);  /* kill takes (pid, sig), in that order */
        return 0;
    }
    p->sigmsk = sigmaskinit();
    p->pid = hangpid;
    if (!procwait(p, 0)) {
        delete p;
        return 0;
    }
    if (p->state.state == UNIX_BREAKED)
        p->state.state = UNIX_HALTED;
    p->opencnt = 0;
    p->next = phead;
    phead = p;
    return hangpid;
}

I had put an 'abort()' in to catch a non-zero return from ptrace (it is not shown in the listing above), but it never fires. Calling 'raise' after the ptrace call seems to be common practice, but a cursory look at gdb's code reveals it is not used there. In any case it makes no difference to the outcome. 'procwait' is as follows:

int Hostfunc::procwait(Localproc *p, int flag){
    int tstat;
    int cursig;

again:
    if (p->pid != waitpid(p->pid, &tstat, (flag&WAIT_POLL)? WNOHANG: 0))
        return 0;
    if (flag & WAIT_DISCARD)
        return 1;
    if (WIFSTOPPED(tstat)) {
        cursig = WSTOPSIG(tstat);
        if (cursig == SIGSTOP)
            p->state.state = UNIX_HALTED;
        else if (cursig == SIGTRAP)
            p->state.state = UNIX_BREAKED;
        else {
            if (p->state.state == UNIX_ACTIVE &&
                !(p->sigmsk&bit(cursig))) {
                ptrace(PTRACE_CONT, p->pid, 1, cursig);
                goto again;
            }
            else {
                p->state.state = UNIX_PENDING;
                p->state.code = cursig;
            }
        }
    } else {
        p->state.state = UNIX_ERRORED;
        p->state.code = WEXITSTATUS(tstat) & 0xFFFF;
    }
    return 1;
}

The 'waitpid' in 'procwait' just hangs. If I run the program with the above code, and run a 'ps', I can see that 'pi' has forked but hasn't yet called exec, because the command line is still 'pi', and not the name of the binary I am forking. I discovered that if I remove the 'raise', 'pi' still hangs but 'ps' now shows that the forked program has the name of the binary being examined, which suggests it has performed the exec.

So, as far as I can see, I am following documented procedures to take control of a forked process but it isn't working.

Noel Hunt

Upvotes: 0

Views: 1573

Answers (1)

N. Hunt

Reputation: 151

I have found the problem (it was in my own code, as Nate pointed out), but the cause was obscure until I ran 'strace pi'. The trace made it clear that there was a SIGCHLD handler, and that it was executing a wait. The parent enters wait, SIGCHLD is delivered, and the handler's wait reaps the status of the child; the parent's wait is then restarted and hangs because there is no further change of state to report. The SIGCHLD handler makes sense because 'pi' wants to be informed of state changes in the child. The first version of 'pi' I got working was the Solaris version, which uses /proc for process control, so there was no use of 'wait' to get child status and I never saw this problem there.
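One possible way to keep such a handler from consuming the status is to block SIGCHLD from before the fork until the main-line waitpid has returned. The following is only a minimal sketch under that assumption, not the change actually made to 'pi'; the names reap_children and start_traced are hypothetical, and the handler is a stand-in for the kind of handler the strace output revealed.

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for the problematic handler: it reaps whatever
// child status is available, which starves a main-line waitpid.
static void reap_children(int) {
    int saved = errno;
    while (waitpid(-1, nullptr, WNOHANG) > 0)
        ;
    errno = saved;
}

// Fork a traced child and return the signal of its first stop, or -1.
// SIGCHLD stays blocked from before the fork until waitpid has returned,
// so the handler cannot collect the status first.
static int start_traced(char *const argv[], pid_t *child) {
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);

    pid_t pid = fork();
    if (pid == 0) {
        sigprocmask(SIG_SETMASK, &old, nullptr);  // child restores the mask
        ptrace(PTRACE_TRACEME, 0, nullptr, nullptr);
        execvp(argv[0], argv);
        _exit(127);  // only reached if exec failed
    }

    int status = 0;
    pid_t got = (pid > 0) ? waitpid(pid, &status, 0) : -1;
    // Restoring the mask lets the pending SIGCHLD through; the handler
    // then finds no unreported state change and returns.
    sigprocmask(SIG_SETMASK, &old, nullptr);

    if (got != pid || !WIFSTOPPED(status))
        return -1;
    *child = pid;
    return WSTOPSIG(status);
}
```

With reap_children installed via signal(SIGCHLD, reap_children), a call such as start_traced(argv, &pid) should return with the child stopped at its exec rather than hanging. An alternative design is to have the handler only record that something changed and leave every wait to the main line.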

Upvotes: 2
