juejmt

Reputation: 41

Spark streaming: expose spark_streaming_* metrics

We run Spark on Kubernetes (using the SparkOperator) and use Prometheus to expose the application's metrics. The application is a Spark Streaming app (NOT Structured Streaming). It used to run on an image with Spark 2.4.7 and was later migrated to Spark 3.1.2. After this migration, all spark_streaming_* metrics disappeared, for example spark_streaming_driver_totalreceivedrecords (as defined here: https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/StreamingSource.scala#L53).

In general the Prometheus setup seems to work: when you curl the Prometheus port you can still see a bunch of other metrics, just none of the Spark Streaming metrics. The Spark image contains the Prometheus JMX Java agent, and in the Helm chart of the streaming app the monitoring spec is configured to use it:

monitoring:
    exposeDriverMetrics: true
    exposeExecutorMetrics: true
    prometheus:
      jmxExporterJar: "/prometheus/jmx_prometheus_javaagent-0.11.0.jar"
      port: 8888

as described in the spark-operator documentation: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#monitoring. This is also how the setup worked with Spark 2.4.7.

Are these metrics gone in Spark 3? Or are we just missing some configuration? A side note: when you check the metrics via <spark-driver>:<ui-port>/metrics/json, you can still see the desired metrics.

Upvotes: 1

Views: 630

Answers (1)

juejmt

Reputation: 41

OK, it seems the spark-operator currently does not work properly with the metrics as Spark 3 exposes them. The issue can be fixed by provisioning the Spark image with a fixed Prometheus configuration file and then referencing it in your Helm chart:

monitoring:
    exposeDriverMetrics: true
    exposeExecutorMetrics: true
    prometheus:
      jmxExporterJar: "/prometheus/jmx_prometheus_javaagent-0.11.0.jar"
      port: 8888
      configFile: PATH_TO_THE_CONFIG

More information and a working Prometheus config can be found here: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/1117
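
To give a rough idea of what the file behind configFile might contain, here is a minimal sketch of a JMX exporter rule for the streaming gauges. It is not the config from the linked issue. It assumes that with Spark 3 the JMX bean names carry an extra ", type=gauges" suffix and that the bean name has the form <metrics-namespace>.driver.<app-name>.StreamingMetrics.streaming.<metric>; the app_id label is also just an illustrative choice. Check the bean names your driver actually registers before relying on it.

```yaml
# Hypothetical sketch of a JMX exporter config file, not the one from the linked issue.
lowercaseOutputName: true   # e.g. totalReceivedRecords -> totalreceivedrecords
attrNameSnakeCase: true
rules:
  # Assumption: Spark 3 streaming gauges show up as
  #   metrics<name=<namespace>.driver.<app-name>.StreamingMetrics.streaming.<metric>, type=gauges>
  # \S+ does not match whitespace, so an app name containing spaces would need a different pattern.
  - pattern: metrics<name=(\S+)\.driver\.(\S+)\.StreamingMetrics\.streaming\.(\S+), type=gauges><>Value
    name: spark_streaming_driver_$3
    labels:
      app_id: "$1"   # assumed: the metrics namespace (by default the Spark application ID)
```

If the assumption about the bean names holds, a rule like this should bring back names such as spark_streaming_driver_totalreceivedrecords on the Prometheus port. Executor and other driver metrics would need their own rules.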

Upvotes: 1
