Reputation: 371
I write a script to get data from HDFS parallel, then I wait these child processes in a for loop, but sometimes it returns pid is not a child of this shell
. sometimes, it works well. It's so puzzled. I use jobs -l
to show all the jobs run in the background. I am sure these pid is the child process of the shell process, and I use ps aux
to make sure these pids is note assign to other process. Here is my script.
PID=()
FILE=()
let serial=0
while read index_tar
do
echo $index_tar | grep index > /dev/null 2>&1
if [[ $? -ne 0 ]]
then
continue
fi
suffix=`printf '%03d' $serial`
mkdir input/output_$suffix
$HADOOP_HOME/bin/hadoop fs -cat $index_tar | tar zxf - -C input/output_$suffix \
&& mv input/output_$suffix/index_* input/output_$suffix/index &
PID[$serial]=$!
FILE[$serial]=$index_tar
let serial++
done < file.list
for((i=0;i<$serial;i++))
do
wait ${PID[$i]}
if [[ $? -ne 0 ]]
then
LOG "get ${FILE[$i]} failed, PID:${PID[$i]}"
exit -1
else
LOG "get ${FILE[$i]} success, PID:${PID[$i]}"
fi
done
Upvotes: 30
Views: 38399
Reputation: 1385
If you're running this in a container of some sort, the condition apparently can be caused by a bug in bash that is easier to encounter in a containerized envrionment.
From my reading of the bash source (specifically see comments around RECYCLES_PIDS
and CHILD_MAX
in bash-4.2/jobs.c
), it looks like in their effort to optimize their tracking of background jobs, they leave themselves vulnerable to PID aliasing (where a new process might obscure the status of an old one); to mitigate that, they prune their background process history (apparently as mandated by POSIX?). If you should happen to want to wait
on a pruned process, the shell can't find it in the history and assumes this to mean that it never knew about it (i.e., that it "is not a child of this shell").
Upvotes: 11
Reputation: 495
Just find the process id of the process you want to wait for and replace that with 12345 in below script. Further changes can be made as per your requirement.
#!/bin/sh
PID=12345
while [ -e /proc/$PID ]
do
echo "Process: $PID is still running" >> /home/parv/waitAndRun.log
sleep .6
done
echo "Process $PID has finished" >> /home/parv/waitAndRun.log
/usr/bin/waitingScript.sh
http://iamparv.blogspot.in/2013/10/unix-wait-for-running-process-not-child.html
Upvotes: 33
Reputation: 393849
Either your while loop or the for loop runs in a subshell, which is why you cannot await a child of the (parent, outer) shell.
Edit this might happen if the while loop or for loop is actually
(a) in a {...}
block
(b) participating in a piper (e.g. for....done|somepipe
)
Upvotes: 8