Reputation: 643
I'm running Zeppelin 0.8.1 and have configured it to submit Spark jobs to a YARN 2.7.5 cluster, with interpreters in both cluster mode (i.e. the application master runs on YARN, not on the driver host) and client mode.
The YARN applications started in client mode are killed immediately after I stop the Zeppelin server. The jobs started in cluster mode, however, become zombie-like and start hogging all the resources in the YARN cluster (there is no dynamic resource allocation).
Is there a way to make Zeppelin kill those jobs when it exits? Or anything else that solves this problem?
Upvotes: 1
Views: 547
Reputation: 32660
Starting from version 0.8, Zeppelin provides a way to shut down idle interpreters by setting zeppelin.interpreter.lifecyclemanager.timeout.threshold.
See Interpreter Lifecycle Management in the Zeppelin documentation.
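Using the timeout also requires selecting the timeout-based lifecycle manager. A minimal sketch for conf/zeppelin-site.xml (property names as in the 0.8 docs; the values below are example settings, in milliseconds):

<!-- Enable the timeout-based interpreter lifecycle manager -->
<property>
  <name>zeppelin.interpreter.lifecyclemanager.class</name>
  <value>org.apache.zeppelin.interpreter.lifecycle.TimeoutLifecycleManager</value>
</property>
<!-- How often to check for idle interpreters (example: every minute) -->
<property>
  <name>zeppelin.interpreter.lifecyclemanager.timeout.checkinterval</name>
  <value>60000</value>
</property>
<!-- Shut down interpreters idle for longer than this (example: 1 hour) -->
<property>
  <name>zeppelin.interpreter.lifecyclemanager.timeout.threshold</name>
  <value>3600000</value>
</property>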
Before that was available, I used a simple shell script that checks the applications running on YARN and kills any Zeppelin interpreter that has been running for more than an hour:
#!/bin/bash
max_life_in_mins=60

# Collect the IDs of all running Zeppelin Spark interpreter applications
zeppelinApps=$(yarn application -list 2>/dev/null | grep "RUNNING" | grep "Zeppelin Spark Interpreter" | awk '{print $1}')

for jobId in $zeppelinApps
do
  # Finish-Time is 0 while the application is still running
  finish_time=$(yarn application -status "$jobId" 2>/dev/null | grep "Finish-Time" | awk '{print $NF}')
  if [ "$finish_time" -ne 0 ]; then
    echo "App $jobId is not running"
    continue
  fi

  # Start-Time is reported in milliseconds since the epoch
  start_time=$(yarn application -status "$jobId" 2>/dev/null | grep "Start-Time" | awk '{print $NF}')
  time_diff_in_mins=$(( ($(date +%s) - start_time / 1000) / 60 ))

  if [ "$time_diff_in_mins" -gt "$max_life_in_mins" ]; then
    echo "Killing app $jobId"
    yarn application -kill "$jobId"
  fi
done
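The script only helps if it runs periodically, for example from cron. An illustrative crontab entry, assuming the script is saved as /opt/scripts/kill-idle-zeppelin.sh (a hypothetical path), running it every 10 minutes:

# Check for long-running Zeppelin interpreters every 10 minutes
*/10 * * * * /opt/scripts/kill-idle-zeppelin.sh >> /var/log/kill-idle-zeppelin.log 2>&1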
The same can also be done through the YARN REST API.
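For instance, an application can be killed with a PUT to the ResourceManager's Cluster Application State API. A sketch, assuming an unsecured cluster; rm-host:8088 and the application ID are placeholders:

# Kill a YARN application via the ResourceManager REST API
curl -X PUT -H "Content-Type: application/json" \
     -d '{"state": "KILLED"}' \
     "http://rm-host:8088/ws/v1/cluster/apps/application_1234567890123_0001/state"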
Upvotes: 3