jk1

Reputation: 623

Is there a way to get info at runtime about the Spark metrics configuration?

I added a metrics.properties file to the resources directory (Maven project) with a CSV sink. Everything is fine when I run the Spark app locally - metrics appear. But when I submit the same fat jar to Amazon EMR, I don't see any attempt to write metrics to the CSV sink. So I want to check at runtime which settings the Spark metrics subsystem has actually loaded. Is there any way to do this? I looked into SparkEnv.get.metricsSystem but didn't find anything there.
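Roughly, this is the kind of check I have in mind - just a sketch that dumps the metrics-related entries visible as Spark properties on the driver (it doesn't show which metrics.properties file the metrics subsystem actually resolved, which is the part I'm unsure how to get at; the app name is just a placeholder):

    import org.apache.spark.sql.SparkSession

    // Sketch only: prints every "spark.metrics.*" property the driver knows about.
    object MetricsConfDump {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("metrics-conf-dump")
          .getOrCreate()

        spark.sparkContext.getConf.getAll
          .filter { case (key, _) => key.startsWith("spark.metrics") }
          .foreach { case (key, value) => println(s"$key = $value") }

        spark.stop()
      }
    }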

Upvotes: 0

Views: 317

Answers (1)

N_C

Reputation: 992

That is basically because Spark on EMR is not picking up your custom metrics.properties file from the resources dir of the fat jar.

For EMR, the preferred way to configure this is through the EMR Configurations API, where you pass the classification and the properties as embedded JSON.

  • For the Spark metrics subsystem, here is an example that modifies a couple of metrics settings:
  [
    {
      "Classification": "spark-metrics",
      "Properties": {
        "*.sink.csv.class": "org.apache.spark.metrics.sink.CsvSink",
        "*.sink.csv.period": "1"
      }
    }
  ]

You can use this JSON when creating the EMR cluster through the Amazon Console or through the SDK.
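If you create the cluster programmatically, a rough sketch with the AWS SDK for Java (v1 EMR client, called here from Scala) could look like the one below. The cluster name, release label, instance type, and IAM role names are placeholders, and builder methods can differ between SDK versions, so double-check them against the SDK you actually use.

    import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder
    import com.amazonaws.services.elasticmapreduce.model.{Application, Configuration, JobFlowInstancesConfig, RunJobFlowRequest}

    object CreateClusterWithCsvSink {
      def main(args: Array[String]): Unit = {
        val emr = AmazonElasticMapReduceClientBuilder.defaultClient()

        // Same settings as the JSON above, expressed as an EMR Configuration object.
        val sparkMetricsProps = new java.util.HashMap[String, String]()
        sparkMetricsProps.put("*.sink.csv.class", "org.apache.spark.metrics.sink.CsvSink")
        sparkMetricsProps.put("*.sink.csv.period", "1")

        val sparkMetrics = new Configuration()
          .withClassification("spark-metrics")
          .withProperties(sparkMetricsProps)

        val request = new RunJobFlowRequest()
          .withName("spark-metrics-demo")           // placeholder cluster name
          .withReleaseLabel("emr-5.30.0")           // placeholder release label
          .withApplications(new Application().withName("Spark"))
          .withConfigurations(sparkMetrics)
          .withServiceRole("EMR_DefaultRole")       // placeholder IAM role
          .withJobFlowRole("EMR_EC2_DefaultRole")   // placeholder IAM role
          .withInstances(new JobFlowInstancesConfig()
            .withInstanceCount(1)
            .withMasterInstanceType("m5.xlarge")    // placeholder instance type
            .withKeepJobFlowAliveWhenNoSteps(true))

        println(emr.runJobFlow(request).getJobFlowId)
      }
    }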

Upvotes: 1
