Marcin

Reputation: 454

API for Spark Streaming Statistics

I'm looking for an API that gives access to the Spark Streaming statistics shown in the "Streaming" tab of the history server.

I'm mainly interested in the batch processing time, but it's not directly available via the REST API, at least according to the documentation: https://spark.apache.org/docs/latest/monitoring.html#rest-api


Any ideas on how to get the information shown in the "Streaming" tab, either for a running job or from the history server?

Upvotes: 3

Views: 1976

Answers (2)

its_a_paddo

Reputation: 102

Since Spark 2.2.0 was released in July, one month after your post, I guess your link refers to Spark 2.1.0. Apparently the REST API was extended for Spark Streaming in Spark 2.2.0.

So if you still have the option to upgrade your Spark version, I recommend doing that. You can then retrieve data for all batches from the endpoint:

/applications/[app-id]/streaming/batches
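
A minimal Python sketch of how that endpoint could be queried. The base URL (for example http://localhost:4040/api/v1 for a running application) and the helper names are assumptions for illustration, and the `batchId` and `processingTime` field names should be checked against the monitoring documentation for your Spark version:

```python
import json
from urllib.request import urlopen


def batches_url(base_url, app_id):
    """Build the batches URL; base_url is assumed to end in /api/v1."""
    return f"{base_url.rstrip('/')}/applications/{app_id}/streaming/batches"


def fetch_batches(base_url, app_id, timeout=10):
    """Fetch the list of batches as parsed JSON (needs a reachable Spark UI)."""
    with urlopen(batches_url(base_url, app_id), timeout=timeout) as resp:
        return json.load(resp)


def processing_times(batches):
    """Map batchId -> processingTime, skipping batches without a value yet.

    The field names are assumed; verify them against your Spark version.
    """
    return {
        b["batchId"]: b["processingTime"]
        for b in batches
        if b.get("processingTime") is not None
    }
```

With a live application, something like `processing_times(fetch_batches("http://localhost:4040/api/v1", app_id))` would give the per-batch processing times, assuming the response is a JSON list of batch objects.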

Upvotes: 0

maasg

Reputation: 37435

There's a metrics endpoint available on the same port as the Spark UI on the driver node: http://<host>:<sparkUI-port>/metrics/json/

Streaming-related metrics have .StreamingMetrics in their name.

Sample from a local test job:

local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_processingDelay: {
value: 30
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_processingEndTime: {
value: 1498124090031
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_processingStartTime: {
value: 1498124090001
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_schedulingDelay: {
value: 1
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_submissionTime: {
value: 1498124090000
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_totalDelay: {
value: 31
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastReceivedBatch_processingEndTime: {
value: 1498124090031
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastReceivedBatch_processingStartTime: {
value: 1498124090001
} 

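To get the processing time, we need to compute the difference StreamingMetrics.streaming.lastCompletedBatch_processingEndTime - StreamingMetrics.streaming.lastCompletedBatch_processingStartTime. In the sample above that is 1498124090031 - 1498124090001 = 30, which matches lastCompletedBatch_processingDelay.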
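
Here is a rough Python sketch of that calculation. It assumes the JSON returned by /metrics/json/ keeps these entries under a top-level "gauges" key, each with a "value" field; the helper names are made up for illustration. Metrics are matched by suffix because the prefix (application id, driver, app name) changes from job to job:

```python
import json
from urllib.request import urlopen

END = "StreamingMetrics.streaming.lastCompletedBatch_processingEndTime"
START = "StreamingMetrics.streaming.lastCompletedBatch_processingStartTime"


def gauge_value(gauges, suffix):
    """Return the value of the first gauge whose name ends with suffix."""
    for name, gauge in gauges.items():
        if name.endswith(suffix):
            return gauge["value"]
    raise KeyError(suffix)


def last_batch_processing_time(gauges):
    """Processing time of the last completed batch: end time minus start time."""
    return gauge_value(gauges, END) - gauge_value(gauges, START)


def fetch_gauges(host, port, timeout=10):
    """Fetch the gauges from the driver (assumes a top-level "gauges" key)."""
    with urlopen(f"http://{host}:{port}/metrics/json/", timeout=timeout) as resp:
        return json.load(resp)["gauges"]
```

Against a running driver this would be called as `last_batch_processing_time(fetch_gauges(host, port))`, with the host and port of the Spark UI.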

Upvotes: 2
