Lalit Yadav

Reputation: 1

Spark Streaming: What are things we should monitor to keep the streaming running?

I have a Spark project running on a 4-core, 16 GB instance (both master and worker). Can anyone tell me everything I should keep monitoring so that my cluster/jobs never go down?

I have created a small list of items to watch; please extend the list if you know of more:

  1. Monitor the Spark master/worker for failures
  2. Monitor HDFS for filling up or going down (see the sketch after this list)
  3. Monitor network connectivity between master and worker
  4. Monitor Spark jobs for getting killed
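
As an illustration for item 2, here is a minimal sketch of checking HDFS usage programmatically, assuming the Hadoop 2.x FileSystem API; the 80% threshold and the alerting mechanism are hypothetical and would depend on your monitoring setup:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileSystem

    object HdfsCapacityCheck {
      def main(args: Array[String]): Unit = {
        // Connects to the HDFS configured in core-site.xml / hdfs-site.xml.
        val fs = FileSystem.get(new Configuration())
        val status = fs.getStatus

        val usedPct = 100.0 * status.getUsed / status.getCapacity
        println(f"HDFS used: $usedPct%.1f%% of ${status.getCapacity} bytes")

        // Hypothetical threshold: warn when more than 80% of HDFS is used.
        if (usedPct > 80.0) {
          System.err.println("WARNING: HDFS is over 80% full")
        }
      }
    }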

Upvotes: 0

Views: 646

Answers (1)

Tathagata Das

Reputation: 1808

That's a good list. But in addition to those, I would also monitor the status of the receivers of the streaming application (assuming you are using some non-HDFS source of data), that is, whether they are connected or not. To be honest, this was tricky to do with older versions of Spark Streaming, as the instrumentation to get the receiver status didn't quite exist. However, with Spark 1.0 (to be released very soon), you can use the org.apache.spark.streaming.StreamingListener interface to get events regarding the status of the receivers.
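
For illustration, a minimal sketch of such a listener, assuming a Spark 1.x-style API (in released versions the trait and its events live in the org.apache.spark.streaming.scheduler package, and the exact contents of ReceiverInfo vary between versions):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.scheduler._

    // Logs receiver lifecycle events so a failed or disconnected
    // receiver becomes visible in the driver logs.
    class ReceiverMonitor extends StreamingListener {
      override def onReceiverStarted(started: StreamingListenerReceiverStarted): Unit =
        println(s"Receiver started: ${started.receiverInfo}")

      override def onReceiverError(error: StreamingListenerReceiverError): Unit =
        println(s"Receiver error: ${error.receiverInfo}")

      override def onReceiverStopped(stopped: StreamingListenerReceiverStopped): Unit =
        println(s"Receiver stopped: ${stopped.receiverInfo}")
    }

    // Register the listener on the StreamingContext before starting it.
    val conf = new SparkConf().setAppName("receiver-monitoring")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.addStreamingListener(new ReceiverMonitor)

From the error callback you could trigger whatever alerting your cluster already uses, rather than just printing.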

A sneak peek at the to-be-released Spark 1.0 docs is available at http://people.apache.org/~tdas/spark-1.0.0-rc10-docs/streaming-programming-guide.html

Upvotes: 1
