Cherry

Reputation: 33544

How does Spark report / collect metrics?

The documentation defines several sinks, metrics, and so on. But how are they collected?

  1. Let's say that I added JmxSink to the metrics.properties file and enabled metrics for all instances (master, applications, worker, executor, driver, shuffleService, applicationMaster).
  2. Let's say that the JMX port is set.
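For context, a setup like the one described above could look roughly like the sketch below. This is an assumption-laden illustration, not a verified config from the question: the port number and the use of JVM options to expose JMX remotely are examples.

```properties
# metrics.properties (sketch): enable JmxSink for all instances
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink

# Exposing JMX remotely is done via JVM options, not this file, e.g. on
# the driver/executors (port 9999 is an arbitrary example):
#   -Dcom.sun.management.jmxremote
#   -Dcom.sun.management.jmxremote.port=9999
#   -Dcom.sun.management.jmxremote.authenticate=false
#   -Dcom.sun.management.jmxremote.ssl=false
```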

Where should I collect metrics from: should I connect to all cluster nodes, or only to the driver node?

Upvotes: 1

Views: 1026

Answers (1)

Sivasonai

Reputation: 71

Spark metrics do not need to be pulled from individual nodes. If the sink host is configured in the metrics properties file, metrics are pushed to it at the configured interval. Our setup uses a GraphiteSink to collect the metrics; the required configuration is detailed below (alongside the instances you mentioned):

  1. Prepare a metrics configuration properties file pointing at the Graphite server endpoint:
    *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
    *.sink.graphite.host=<graphite-server-host>
    *.sink.graphite.port=<graphite-server-port>
    *.sink.graphite.period=10
    *.sink.graphite.prefix=dev
  2. Make sure the metrics properties file is passed via the --files option in the spark-submit job script, so that executor nodes also receive it and can send metrics.
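Putting step 2 together, a spark-submit invocation could look like the sketch below. The master, file names, and application jar are placeholders, not taken from the answer; `spark.metrics.conf` is set so Spark picks up the shipped file from each container's working directory.

```shell
# Sketch of a spark-submit call shipping the metrics config to all nodes
# (paths, master, and jar name are assumptions for illustration)
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files metrics.properties \
  --conf spark.metrics.conf=metrics.properties \
  my-spark-app.jar
```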

Upvotes: 1

Related Questions