Reputation: 8885
I have been experimenting and googling for many hours, with no luck.
I have a Spark Streaming app that runs fine in a local Spark cluster. Now I need to deploy it on Cloudera 5.4.4. I need to be able to start it, have it run continually in the background, and be able to stop it.
I tried this:
$ spark-submit --master yarn-cluster --class MyMain my.jar myArgs
But it just prints these lines endlessly.
15/07/28 17:58:18 INFO Client: Application report for application_1438092860895_0012 (state: RUNNING)
15/07/28 17:58:19 INFO Client: Application report for application_1438092860895_0012 (state: RUNNING)
Question number 1: since it is a streaming app, it needs to run continuously. So how do I run it in a "background" mode? All the examples I can find of submitting spark jobs on yarn seem to assume that the application will do some work and terminate, and therefore that you would want to run it in the foreground. But that is not the case for streaming.
Next up... at this point the app does not seem to be functioning. I figure it could be a bug or misconfiguration on my part, so I tried to look in the logs to see what's happening:
$ yarn logs -applicationId application_1438092860895_0012
But it tells me:
/tmp/logs/hdfs/logs/application_1438092860895_0012 does not have any log files.
So question number 2: If the application is RUNNING, why does it have no log files?
So eventually I just had to kill it:
$ yarn application -kill application_1438092860895_0012
That brings up question number 3: assuming I can eventually get the app launched and running in the background, is "yarn application -kill" the preferred way of stopping it?
Upvotes: 16
Views: 14582
Reputation: 5891
The last piece of the puzzle is how to stop a Spark Streaming application deployed on YARN in a graceful way. The standard method for stopping (or rather killing) a YARN application is the command yarn application -kill [applicationId]. This command stops the Spark Streaming application, but it could happen in the middle of a batch. So if the job reads data from Kafka, saves the processing results on HDFS, and finally commits the Kafka offsets, you should expect duplicated data on HDFS when the job is stopped just before committing the offsets.
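To make that failure mode concrete, here is a rough sketch of the read-save-commit pattern using the spark-streaming-kafka-0-10 API (the helper name, output path, and stream definition are illustrative, not from the original post):

import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

// `stream` is assumed to come from KafkaUtils.createDirectStream[String, String](...)
def saveThenCommit(stream: InputDStream[ConsumerRecord[String, String]]): Unit = {
  stream.foreachRDD { (rdd, batchTime) =>
    val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
    // 1. Save this batch's results to HDFS (placeholder path)...
    rdd.map(_.value).saveAsTextFile(s"hdfs:///path/to/output/${batchTime.milliseconds}")
    // 2. ...then commit the offsets. A kill that lands between steps 1 and 2
    //    means the batch is reprocessed on restart, duplicating data on HDFS.
    stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
  }
}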
The first attempt to solve the graceful shutdown issue was to call the Spark streaming context's stop method in a shutdown hook.
sys.addShutdownHook {
    streamingContext.stop(stopSparkContext = true, stopGracefully = true)
}
Disappointingly, a shutdown hook is called too late to finish the started batch, and the Spark application is killed almost immediately. Moreover, there is no guarantee that a shutdown hook will be called by the JVM at all.
At the time of writing this blog post, the only confirmed way to gracefully shut down a Spark Streaming application on YARN is to somehow notify the application about the planned shutdown, and then stop the streaming context programmatically (but not from a shutdown hook). The command yarn application -kill should be used only as a last resort, if the notified application did not stop within a defined timeout.
The application can be notified about the planned shutdown using a marker file on HDFS (the easiest way), or using a simple Socket/HTTP endpoint exposed on the driver (the sophisticated way).
Because I like the KISS principle, below you can find shell-script pseudo-code for starting/stopping a Spark Streaming application using a marker file:
start() {
    # Create the marker file before launching the job.
    hdfs dfs -touchz /path/to/marker/my_job_unique_name
    spark-submit ...
}

stop() {
    # Removing the marker file signals the application to shut down gracefully.
    hdfs dfs -rm /path/to/marker/my_job_unique_name
    force_kill=true
    application_id=$(yarn application -list | grep -oe "application_[0-9]*_[0-9]*")
    # Wait up to 10 minutes for the application to stop on its own.
    for i in $(seq 1 10); do
        application_status=$(yarn application -status ${application_id} | grep "State : \(RUNNING\|ACCEPTED\)")
        if [ -n "$application_status" ]; then
            sleep 60s
        else
            force_kill=false
            break
        fi
    done
    # Kill only as a last resort, if the application did not stop in time.
    $force_kill && yarn application -kill ${application_id}
}
In the Spark Streaming application, a background thread should monitor the marker file, and when the file disappears, stop the context by calling
streamingContext.stop(stopSparkContext = true, stopGracefully = true)
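A minimal sketch of such a monitor thread, assuming the marker path from the script above (the helper name, poll interval, and use of the driver's Hadoop configuration are illustrative, not from the original post):

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.streaming.StreamingContext

// Hypothetical helper: polls HDFS until the marker file disappears,
// then stops the streaming context gracefully (not from a shutdown hook).
def startMarkerFileWatcher(ssc: StreamingContext, markerPath: String): Unit = {
  val monitor = new Thread {
    override def run(): Unit = {
      val fs = FileSystem.get(ssc.sparkContext.hadoopConfiguration)
      while (fs.exists(new Path(markerPath))) {
        Thread.sleep(10000) // poll interval is an arbitrary choice
      }
      // Marker file removed: finish the in-flight batch, then shut down.
      ssc.stop(stopSparkContext = true, stopGracefully = true)
    }
  }
  monitor.setDaemon(true)
  monitor.start()
}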
You can also refer to http://blog.parseconsulting.com/2017/02/how-to-shutdown-spark-streaming-job.html
Upvotes: 0
Reputation: 1033
I finally figured out a way to safely close a Spark Streaming job.
package xxx.xxx.xxx

import java.io.{BufferedReader, InputStreamReader}
import java.net.{ServerSocket, Socket}

import org.apache.spark.streaming.StreamingContext

object KillServer {

  class NetworkService(port: Int, ssc: StreamingContext) extends Runnable {
    val serverSocket = new ServerSocket(port)

    def run() {
      Thread.currentThread().setName("Zhuangdy | Waiting for graceful stop at port " + port)
      while (true) {
        val socket = serverSocket.accept()
        (new Handler(socket, ssc)).run()
      }
    }
  }

  class Handler(socket: Socket, ssc: StreamingContext) extends Runnable {
    def run() {
      val reader = new InputStreamReader(socket.getInputStream)
      val br = new BufferedReader(reader)
      if (br.readLine() == "kill") {
        ssc.stop(true, true)
      }
      br.close()
    }
  }

  def run(port: Int, ssc: StreamingContext): Unit = {
    (new NetworkService(port, ssc)).run()
  }
}
In your main method, where you start the streaming context, add the following code:
ssc.start()
KillServer.run(11212, ssc)
ssc.awaitTermination()
Write a spark-submit command to submit the job to YARN, and redirect the output to a file which you will use later:
spark-submit --class "com.Mainclass" \
    --conf "spark.streaming.stopGracefullyOnShutdown=true" \
    --master yarn-cluster --queue "root" \
    --deploy-mode cluster \
    --executor-cores 4 --num-executors 8 --executor-memory 3G \
    hdfs:///xxx.jar > output 2>&1 &
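Note that spark.streaming.stopGracefullyOnShutdown=true asks Spark to stop the streaming context gracefully when the JVM shuts down; here it serves as a safety net alongside the explicit kill endpoint used below.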
#!/bin/bash
# Extract the driver's IP address from the spark-submit output, send the
# "kill" command to the KillServer port, then clean up the YARN application.
driver=`cat output | grep ApplicationMaster | grep -Po '\d+.\d+.\d+.\d+'`
echo "kill" | nc $driver 11212
driverid=`yarn application -list 2>&1 | grep ad.Stat | grep -Po 'application_\d+_\d+'`
yarn application -kill $driverid
Upvotes: 2
Reputation: 2406
You can close the spark-submit console; the job is already running in the background once it reports the RUNNING state.
yarn application -kill is probably the best way to stop a Spark Streaming application, but it's not perfect. It would be better to do a graceful shutdown that stops all stream receivers and stops the streaming context, but I personally don't know how to do it.
Upvotes: 8
Reputation: 324
Upvotes: 1