Reputation: 683
I configured Prometheus to monitor several Spring Boot applications. These applications can have multiple instances deployed across an ensemble of five servers. Some applications are deployed on every node, others are not. There is no way to know in advance whether applicationOne runs on nodeOne (Portainer is in charge of placement), so I configured Prometheus to list as targets all the possible IPs an application can be deployed on.
- job_name: 'production-diagnostic'
  metrics_path: '/actuator/prometheus'
  scrape_interval: 5s
  static_configs:
    - targets: ['1.1.1.1:9003', '1.1.1.2:9003', '1.1.1.3:9003', '1.1.1.4:9003', '1.1.1.5:9003']
This specific application is configured to run on only one server at a time; it will switch to another node only when a human requests a redeploy. Prometheus misbehaves: it apparently reads metrics from four nodes for this application, even though it is deployed on only one of them. The same happens with the other applications.
Example:
jvm_memory_used_bytes{application="localization_vehicles_diagnostic",area="heap",id="G1 Eden Space",instance="1.1.1.1:9003",job="production-diagnostic"} 299892736
jvm_memory_used_bytes{application="localization_vehicles_diagnostic",area="heap",id="G1 Eden Space",instance="1.1.1.2:9003",job="production-diagnostic"} 296747008
jvm_memory_used_bytes{application="localization_vehicles_diagnostic",area="heap",id="G1 Eden Space",instance="1.1.1.3:9003",job="production-diagnostic"} 294649856
jvm_memory_used_bytes{application="localization_vehicles_diagnostic",area="heap",id="G1 Eden Space",instance="1.1.1.4:9003",job="production-diagnostic"} 295698432
Is there something wrong in my configuration? Do I have to add some other parameter? Or is there perhaps an issue between Prometheus and Portainer?
Upvotes: 1
Views: 1740
Reputation: 6863
Locating the targets to scrape is the whole point of service discovery: out-of-band information is used to determine where the services are running. In your case you set up a static config, which tells Prometheus that all the listed targets are running services.
If you don't have a readily available service discovery system, you can fake one using file-based service discovery.
Modify your job to use file discovery, pointing Prometheus at a file it will read to get the list of targets:
- job_name: 'production-diagnostic'
  metrics_path: '/actuator/prometheus'
  scrape_interval: 5s
  file_sd_configs:
    - files:
      - 'discovered.json'
Then regularly run a script that polls the services (testing each candidate URL) and writes the discovered targets to the file. Prometheus watches the file and reloads it automatically when it detects a change. It will contain, for example:
[ { "targets": [ "1.1.1.1:9003" ] }]
NOTE: Prometheus is very reactive, and if your script takes too long writing the file, you will get transient errors because Prometheus reads a partially written file. I solved that by writing to a temporary file and then moving it over the target file.
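A minimal sketch of such a poller, assuming Python is available where Prometheus runs; the candidate targets and the /actuator/prometheus path come from the question, while the discovered.json name simply matches the config above:

#!/usr/bin/env python3
# Sketch of a file-based service discovery poller.
# Probes each candidate target and writes the reachable ones to
# discovered.json, using a temp file plus an atomic rename so that
# Prometheus never reads a half-written file.
import json
import os
import tempfile
import urllib.request

CANDIDATES = ['1.1.1.1:9003', '1.1.1.2:9003', '1.1.1.3:9003',
              '1.1.1.4:9003', '1.1.1.5:9003']
OUTPUT = 'discovered.json'  # path referenced by file_sd_configs

def is_up(target):
    """Return True if the actuator endpoint answers within 2 seconds."""
    try:
        with urllib.request.urlopen(
                'http://%s/actuator/prometheus' % target, timeout=2) as resp:
            return resp.status == 200
    except OSError:
        return False

def main():
    alive = [t for t in CANDIDATES if is_up(t)]
    sd = [{'targets': alive}] if alive else []
    # Write to a temp file in the same directory, then rename atomically.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(OUTPUT) or '.')
    with os.fdopen(fd, 'w') as f:
        json.dump(sd, f)
    os.replace(tmp, OUTPUT)

if __name__ == '__main__':
    main()

Run it from cron (or a systemd timer) at whatever cadence matches how quickly you need Prometheus to notice a redeploy.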
You can define labels in the config or in the file to uniquely identify a service when it moves from server to server.
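For example (the application label value here is just an illustration borrowed from the question's metrics), the discovery file can carry a stable label alongside the moving target:

[
  {
    "targets": [ "1.1.1.3:9003" ],
    "labels": { "application": "localization_vehicles_diagnostic" }
  }
]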
Upvotes: 1
Reputation: 13456
This is the way Prometheus works. You set a list of targets in scrape_config. Prometheus will try to get metrics from those targets at the /actuator/prometheus endpoint at the given interval, no matter whether they exist or not.
You can use the auto-generated metric named up to isolate your required metrics from the others. You can easily determine which metric sources are offline from the up metric:
up{job="<job-name>", instance="<instance-id>"}: 1 if the instance is healthy, i.e. reachable, or 0 if the scrape failed.
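For example, to list the offline targets for the question's job, or to keep only the metrics coming from instances whose scrape succeeded, PromQL queries along these lines should work (a sketch using the job and metric names from the question):

up{job="production-diagnostic"} == 0

jvm_memory_used_bytes{job="production-diagnostic"}
  and on(instance) up{job="production-diagnostic"} == 1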
Upvotes: 1