Reputation: 454
I'm looking for an API that gives access to the Spark Streaming statistics shown in the "Streaming" tab of the history server.
I'm mainly interested in the batch processing time, but it isn't directly available via the REST API, at least according to the documentation: https://spark.apache.org/docs/latest/monitoring.html#rest-api
Any ideas on how to get the kind of information shown in the "Streaming" tab, or for a running job, in the history server?
Upvotes: 3
Views: 1976
Reputation: 102
Spark 2.2.0 was released in July, one month after your post, so I guess your link refers to Spark 2.1.0. Apparently the REST API was extended for Spark Streaming in Spark 2.2.0.
So if you can still upgrade your Spark version, I recommend doing that. You can then retrieve data for all batches from the endpoint:
/applications/[app-id]/streaming/batches
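As a rough sketch of how that endpoint could be queried from Python: the `/api/v1` prefix comes from the monitoring documentation, while `base_url`, `app_id`, and the `batchId` / `processingTime` field names are assumptions here and should be checked against the JSON your Spark version actually returns.

```python
import json
from urllib.request import urlopen


def fetch_batches(base_url, app_id):
    """Fetch the batch list from the streaming REST endpoint (Spark 2.2.0+).

    base_url is hypothetical, e.g. "http://localhost:4040" for a running
    driver or "http://localhost:18080" for a history server.
    """
    url = f"{base_url}/api/v1/applications/{app_id}/streaming/batches"
    with urlopen(url) as resp:
        return json.load(resp)


def processing_times(batches):
    """Map batch id -> processing time, skipping batches that lack one.

    The "batchId" and "processingTime" keys are assumed field names.
    """
    return {
        b["batchId"]: b["processingTime"]
        for b in batches
        if b.get("processingTime") is not None
    }
```

Typical use would be `processing_times(fetch_batches(base_url, app_id))`, where both arguments are placeholders for your own deployment.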
Upvotes: 0
Reputation: 37435
There's a metrics endpoint available on the same port as the Spark UI on the driver node.
http://<host>:<sparkUI-port>/metrics/json/
Streaming-related metrics have .StreamingMetrics in their name.
Sample from a local test job:
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_processingDelay: {
value: 30
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_processingEndTime: {
value: 1498124090031
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_processingStartTime: {
value: 1498124090001
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_schedulingDelay: {
value: 1
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_submissionTime: {
value: 1498124090000
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_totalDelay: {
value: 31
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastReceivedBatch_processingEndTime: {
value: 1498124090031
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastReceivedBatch_processingStartTime: {
value: 1498124090001
}
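To get the processing time, subtract StreamingMetrics.streaming.lastCompletedBatch_processingStartTime from StreamingMetrics.streaming.lastCompletedBatch_processingEndTime.
In the sample above that is 1498124090031 - 1498124090001 = 30 ms, which matches the lastCompletedBatch_processingDelay value.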
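A minimal sketch of that subtraction in Python, assuming the `/metrics/json/` response groups these entries under a top-level `"gauges"` key (the code falls back to a flat mapping like the sample if it does not). The function names and the `host` / `port` parameters are made up for illustration.

```python
import json
from urllib.request import urlopen

# Metric name suffixes taken from the sample output; the prefix
# (app id, "driver", app name) differs per job, so match on the suffix.
START = "StreamingMetrics.streaming.lastCompletedBatch_processingStartTime"
END = "StreamingMetrics.streaming.lastCompletedBatch_processingEndTime"


def fetch_metrics(host, port):
    """Download and parse the metrics JSON served on the Spark UI port."""
    with urlopen(f"http://{host}:{port}/metrics/json/") as resp:
        return json.load(resp)


def gauge_value(gauges, suffix):
    """Return the value of the first metric whose name ends with suffix."""
    for name, body in gauges.items():
        if name.endswith(suffix):
            return body["value"]
    raise KeyError(suffix)


def last_batch_processing_ms(metrics):
    """Processing time of the last completed batch: end time minus start time."""
    # Assumption: gauges live under "gauges"; otherwise treat input as flat.
    gauges = metrics.get("gauges", metrics)
    return gauge_value(gauges, END) - gauge_value(gauges, START)
```

Calling `last_batch_processing_ms(fetch_metrics(host, port))` against the driver should then give the same number as the difference computed by hand.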
Upvotes: 2