gcq0409

Reputation: 371

Waiting on a child process fails with error: 'pid is not a child of this shell'

I wrote a script that fetches data from HDFS in parallel and then waits for the child processes in a for loop. Sometimes wait fails with "pid is not a child of this shell"; at other times it works fine, which is puzzling. I used jobs -l to list all the jobs running in the background, and I am sure these PIDs belong to child processes of the shell process. I also used ps aux to make sure the PIDs had not been reassigned to other processes. Here is my script.

PID=()
FILE=()
let serial=0

while read index_tar
do
        echo $index_tar | grep index > /dev/null 2>&1

        if [[ $? -ne 0 ]]
        then
                continue
        fi

        suffix=`printf '%03d' $serial`
        mkdir input/output_$suffix
        $HADOOP_HOME/bin/hadoop fs -cat $index_tar | tar zxf - -C input/output_$suffix \
                && mv input/output_$suffix/index_* input/output_$suffix/index &

        PID[$serial]=$!
        FILE[$serial]=$index_tar

        let serial++

done < file.list

for((i=0;i<$serial;i++))
do
        wait ${PID[$i]}

        if [[ $? -ne 0 ]]
        then
                LOG "get ${FILE[$i]} failed, PID:${PID[$i]}"
                exit -1
        else
                LOG "get ${FILE[$i]} success, PID:${PID[$i]}"
        fi
done

Upvotes: 30

Views: 38399

Answers (3)

jhfrontz

Reputation: 1385

If you're running this in a container of some sort, the condition can apparently be caused by a bug in bash that is easier to hit in a containerized environment.

From my reading of the bash source (see in particular the comments around RECYCLES_PIDS and CHILD_MAX in bash-4.2/jobs.c), it looks like the optimizations for tracking background jobs leave bash vulnerable to PID aliasing (where a new process might obscure the status of an old one). To mitigate that, bash prunes its background process history (apparently as mandated by POSIX?). If you then try to wait on a pruned process, the shell can't find it in the history and assumes that it never knew about it (i.e., that it "is not a child of this shell").
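If the pruning described here is indeed the cause, one way to reduce exposure might be to keep the number of finished-but-not-yet-waited jobs small, by starting work in batches and collecting every exit status before launching more. The sketch below is only an illustration under that assumption: run_in_batches is a hypothetical helper (it does not come from bash or from the question), and it prints the number of failed commands.

```shell
# Hypothetical helper: run_in_batches BATCH_SIZE CMD...
# Each CMD is a command line run in the background via `sh -c`.
# At most BATCH_SIZE jobs are outstanding before their statuses are collected.
run_in_batches() {
    local batch_size pids count failed cmd pid
    batch_size=$1
    shift
    pids=""
    count=0
    failed=0

    for cmd in "$@"; do
        sh -c "$cmd" &
        pids="$pids $!"
        count=$((count + 1))

        # Batch is full: wait for every job in it before starting more.
        if [ "$count" -ge "$batch_size" ]; then
            for pid in $pids; do
                wait "$pid" || failed=$((failed + 1))
            done
            pids=""
            count=0
        fi
    done

    # Collect whatever is left in the final, partial batch.
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done

    echo "$failed"
}
```

For example, run_in_batches 4 'cmd1' 'cmd2' ... would never leave more than four jobs un-waited. The trade-off is that a slow job holds up the start of the next batch.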

Upvotes: 11

Parvinder Singh

Reputation: 495

Find the process ID of the process you want to wait for and put it in place of 12345 in the script below. Further changes can be made as required.

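#!/bin/sh
PID=12345
# Poll procfs (e.g. on Linux) until the process's /proc entry disappears.
while [ -e "/proc/$PID" ]
do
    echo "Process: $PID is still running" >> /home/parv/waitAndRun.log
    sleep .6
done
echo "Process $PID has finished" >> /home/parv/waitAndRun.log

# Run the follow-up script once the process has exited.
/usr/bin/waitingScript.sh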

http://iamparv.blogspot.in/2013/10/unix-wait-for-running-process-not-child.html
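On systems without /proc, the same polling idea can be sketched with kill -0, which sends no signal and only checks whether the PID can be signalled. This is a variation on the script above, not part of the original answer; wait_for_pid is a hypothetical name. Like the /proc loop, it cannot return the process's exit status, and it treats a process owned by another user (where kill -0 is denied) as already gone.

```shell
# Hypothetical helper: wait_for_pid PID [INTERVAL_SECONDS]
# Polls until PID can no longer be signalled by the current user.
wait_for_pid() {
    local pid interval
    pid=$1
    interval=${2:-1}

    # kill -0 succeeds while the process exists and we may signal it.
    while kill -0 "$pid" 2>/dev/null; do
        sleep "$interval"
    done
}
```

Usage would look like wait_for_pid 12345 followed by the command to run afterwards.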

Upvotes: 33

sehe

Reputation: 393849

Either your while loop or the for loop runs in a subshell, which is why you cannot wait on a child of the (parent, outer) shell.

Edit: this might happen if the while loop or for loop is actually

(a) in a {...} block
(b) part of a pipeline (e.g. for ... done | somepipe)
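The pipeline case can be reproduced with a small sketch, assuming default shell settings (in bash, lastpipe not enabled). The function names are made up for illustration. In the first function the loop is fed by a pipe, so the background job is started by a subshell and the outer shell's wait fails with status 127; in the second the loop reads from a redirection, stays in the current shell, and wait succeeds.

```shell
# Loop on the receiving end of a pipe: it runs in a subshell, so the
# background job is not a child of the shell that later calls wait.
demo_pipe_loop() {
    local pidfile pid
    pidfile=$(mktemp)
    echo x | while read -r line; do
        sleep 1 &
        echo "$!" > "$pidfile"      # smuggle the PID out of the subshell
    done
    pid=$(cat "$pidfile")
    rm -f "$pidfile"
    wait "$pid"                     # reports that the pid is not a child
    echo "$?"
}

# Loop fed by a redirection: it runs in the current shell, so the
# background job is a real child and wait can collect its status.
demo_redirect_loop() {
    local pid
    while read -r line; do
        sleep 1 &
        pid=$!
    done <<EOF
x
EOF
    wait "$pid"
    echo "$?"
}
```

The script in the question already uses the second form (done < file.list), so this only applies if the real script wraps the loops differently from what is shown.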

Upvotes: 8
