Matthias Seiler

Reputation: 43

Can't expose Flink metrics to Prometheus

I'm trying to expose Flink's built-in metrics to Prometheus, but Prometheus doesn't recognize either target: neither the JMX reporter nor the PrometheusReporter.

The scraping defined in prometheus.yml looks like this:

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']

  - job_name: 'kafka-server'
    static_configs:
      - targets: ['localhost:7071']

  - job_name: 'flink-jmx'
    static_configs:
      - targets: ['localhost:8789']

  - job_name: 'flink-prom'
    static_configs:
      - targets: ['localhost:9249']

And my flink-conf.yml has the following lines:

#metrics.reporters: jmx, prom
metrics.reporters: jmx, prometheus

#metrics.reporter.jmx.factory.class: org.apache.flink.metrics.jmx.JMXReporterFactory
metrics.reporter.jmx.class: org.apache.flink.metrics.jmx.JMXReporter
metrics.reporter.jmx.port: 8789

metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249

However, both Flink targets are down when I run a WordCount job.

According to the Flink docs, JMX needs no additional dependencies, and the Prometheus reporter only needs a copy of the provided flink-metrics-prometheus-1.10.0.jar in flink/lib/.

What am I doing wrong? What is missing?

Upvotes: 0

Views: 3873

Answers (1)

David Anderson

Reputation: 43439

That particular job is going to run to completion pretty quickly, I believe. Once you get the setup working there may be no interesting metrics because the job doesn't run long enough for anything to show up.

When you run with a mini-cluster (as java -jar ...), the flink-conf.yaml file isn't loaded (unless you've done something rather special in your job to get it loaded). Note also that this file normally has a .yaml extension; I'm not sure whether it works if .yml is used instead.
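Once the file is actually being picked up, its contents can also be sanity-checked mechanically. The following is only a rough sketch, not part of Flink: it assumes the config is the usual flat list of `key: value` lines, and it assumes (as I read the Flink docs) that `metrics.reporters` acts as an include list of reporter names, so each listed name should have matching `metrics.reporter.<name>.*` keys. The function names are made up for illustration.

```python
def parse_flat_conf(text):
    """Parse flat 'key: value' lines, ignoring blank lines and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        conf[key.strip()] = value.strip()
    return conf


def reporter_name_mismatches(conf):
    """Compare names listed in metrics.reporters with names used in
    metrics.reporter.<name>.* keys.

    Returns (listed_but_not_configured, configured_but_not_listed).
    Assumption: metrics.reporters is an include list of reporter names.
    """
    listed = {
        name.strip()
        for name in conf.get("metrics.reporters", "").split(",")
        if name.strip()
    }
    prefix = "metrics.reporter."
    configured = {
        key[len(prefix):].split(".", 1)[0]
        for key in conf
        if key.startswith(prefix)
    }
    return sorted(listed - configured), sorted(configured - listed)


if __name__ == "__main__":
    # The reporter lines from the question, copied as posted.
    question_conf = """
    metrics.reporters: jmx, prometheus
    metrics.reporter.jmx.class: org.apache.flink.metrics.jmx.JMXReporter
    metrics.reporter.jmx.port: 8789
    metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
    metrics.reporter.prom.port: 9249
    """
    missing, unlisted = reporter_name_mismatches(parse_flat_conf(question_conf))
    print("listed but not configured:", missing)
    print("configured but not listed:", unlisted)
```

Run against the lines in the question, this flags that `prometheus` is listed while the keys are written for `prom`, which may be worth lining up.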

You can check the job manager and task manager logs to make sure that the reporters are being loaded.
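Besides reading the logs, it can help to ask the reporter port directly, without Prometheus in between. Here is a minimal sketch using only the Python standard library. It assumes the Prometheus reporter serves the text exposition format over HTTP at `/metrics` on the configured port (9249 in the question); the helper names are hypothetical.

```python
import http.client
import urllib.error
import urllib.request


def scrape(host, port, timeout=2.0):
    """Return the body of http://host:port/metrics, or None if nothing usable answers."""
    url = f"http://{host}:{port}/metrics"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return None


def metric_names(body):
    """Extract the distinct metric names from Prometheus text exposition output."""
    names = set()
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line is either 'name{labels} value' or 'name value'.
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return sorted(names)


if __name__ == "__main__":
    port = 9249  # the port configured for the prom reporter in the question
    body = scrape("localhost", port)
    if body is None:
        print(f"nothing answering on localhost:{port}; the reporter is probably not loaded")
    else:
        print(f"localhost:{port} exposes {len(metric_names(body))} metric names")
```

If this reports nothing on the port while the job is running, the problem is on the Flink side rather than in prometheus.yml.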

FWIW, the last time I did this I used this setup, so that I could scrape from multiple processes:

# flink-conf.yaml

metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9250-9260

# prometheus.yml

scrape_configs:
  - job_name: 'flink'
    static_configs:
      - targets: ['localhost:9250', 'localhost:9251']
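The two targets above correspond to two Flink processes on the same host (for example, one job manager and one task manager), each taking a port from the range. As a rough sketch of how the target list relates to the range, assuming the processes end up on the lowest ports in ascending order and nothing else already occupies them, the list can be derived like this (the function names are illustrative only):

```python
def expand_port_range(spec):
    """Expand a spec such as '9250-9260' (or a single '9249') into a list of ints."""
    spec = spec.strip()
    if "-" in spec:
        low, high = (int(part) for part in spec.split("-", 1))
    else:
        low = high = int(spec)
    if low > high:
        raise ValueError(f"invalid port range: {spec!r}")
    return list(range(low, high + 1))


def prometheus_targets(spec, processes, host="localhost"):
    """Build 'host:port' targets for the first `processes` ports of the range.

    Assumption: each Flink process on the host binds the next free port,
    starting from the low end of the range.
    """
    ports = expand_port_range(spec)
    if processes > len(ports):
        raise ValueError("more processes than ports in the range")
    return [f"{host}:{port}" for port in ports[:processes]]


if __name__ == "__main__":
    # One job manager plus one task manager on the same machine.
    print(prometheus_targets("9250-9260", processes=2))
```

With more task managers on the same machine, the target list in prometheus.yml needs to grow accordingly, and the range has to be wide enough to cover all of them.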

Upvotes: 2
