Reputation: 41
We are using spark on kubernetes (using the SparkOperator) and Prometheus to expose the metrics of the application. The application is a spark streaming app (NOT structured streaming).
The application used to run on an image with spark version 2.4.7 and was later migrated to spark 3.1.2.
Because of this migration all spark_streaming_*
metrics disappeared, like spark_streaming_driver_totalreceivedrecords
(as defined here https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/StreamingSource.scala#L53)
In general the Prometheus setup seems to work because when you curl the prometheus port you can still see a bunch of other metrics - just none of the spark streaming metrics. The spark-image contains the prometheus-java agent and in the helm chart of the stream-app the monitoring spec is configured to use it
monitoring:
exposeDriverMetrics: true
exposeExecutorMetrics: true
prometheus:
jmxExporterJar: "/prometheus/jmx_prometheus_javaagent-0.11.0.jar"
port: 8888
as described in the docu of the spark-operator https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#monitoring. This is also how the setup used to work with spark 2.4.7
Are this metrics gone in spark3? Or are we maybe just missing some configuration?
Another side note: when you check the metrics via <spark-driver>:<ui-port>/metrics/json
you can see the desired metrics
Upvotes: 1
Views: 630
Reputation: 41
ok it seems the spark-operator currently is not working properly with the current metrics in Spark3. The issue could be fixed by provisioning the spark-image with a fixed prometheus configuration file and then use it in your helm chart
monitoring:
exposeDriverMetrics: true
exposeExecutorMetrics: true
prometheus:
jmxExporterJar: "/prometheus/jmx_prometheus_javaagent-0.11.0.jar"
port: 8888
configFile: PATH_TO_THE_CONFIG
more infos and a working prometheus config can be found here: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/1117
Upvotes: 1