Krish

Reputation: 159

Understanding the metrics approach for Spark

I was going through the metrics/monitoring documentation page for Spark.

What I understand
Spark writes events to event logs as configured by spark.eventLog.enabled=true and spark.eventLog.dir=hdfs://path. These are accessible in 3 ways (a minimal configuration sketch follows the list):

  1. Spark History Server
  2. REST API
  3. SparkListener
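
For context, this is a sketch of how those two settings can be passed when building a session; the app name and HDFS directory below are placeholders, not values from my actual setup:

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: enable event logging so the History Server and
    // REST API can read the persisted logs later. Names are placeholders.
    val spark = SparkSession.builder()
      .appName("event-log-demo")                            // placeholder app name
      .config("spark.eventLog.enabled", "true")             // turn on event logging
      .config("spark.eventLog.dir", "hdfs:///spark-events") // placeholder HDFS dir
      .getOrCreate()

The same settings are more commonly set in spark-defaults.conf or via --conf flags on spark-submit.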

What I don't understand

  1. Can a SparkListener receive all the events that are written to spark.eventLog.dir?
  2. Does Spark's metrics system just calculate metrics from the generated event logs, or is it a disparate set of metrics not visible in the Spark event logs?

Upvotes: 0

Views: 241

Answers (1)

Ged

Reputation: 18013

Base and derived metrics exist 1) during a Job, and 2) may be persisted to HDFS for subsequent consumption, e.g. by the History Server.

  • The Spark Listener runs during the life of the Spark App. It is a class that listens to execution events from Spark’s DAGScheduler – the main part of the execution engine in Spark. It does not consume from spark.eventLog.dir; that setting may not even have been set on spark-submit, etc. A minimal listener sketch follows this list.
  • The 2nd question is hard to follow. There are always base metrics and derived metrics in any system. As stated, they may not be coming from the event logs. Given that metrics can be shown as, say, a median, some of them must be derived, i.e. computed rather than read verbatim from the event logs, I suspect.
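
To illustrate the first point, a minimal sketch of a listener that consumes live scheduler events in-process; the class name and the metrics printed are just examples:

    import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerTaskEnd}

    // Sketch of a listener receiving live DAGScheduler events in-process;
    // nothing here reads from spark.eventLog.dir.
    class DemoListener extends SparkListener {
      override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit =
        println(s"Job ${jobEnd.jobId} ended with ${jobEnd.jobResult}")

      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        val m = taskEnd.taskMetrics // base metrics for one finished task
        if (m != null) println(s"Task run time: ${m.executorRunTime} ms")
      }
    }

    // Register it on the driver, e.g.:
    //   spark.sparkContext.addSparkListener(new DemoListener)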

Upvotes: 1
