maopuppets

Reputation: 470

Prometheus blackbox probe helpful metrics

I have around 1000 targets that are probed using HTTP.

job="http_2xx", env="prod", instance="x.x.x.x"
job="http_2xx", env="test", instance="y.y.y.y"
job="http_2xx", env="dev", instance="z.z.z.z"

I want to know for the targets:

  1. Rate of failure by env in last 10 minutes.
  2. Increase in rate of failure by env in last 10 minutes.
  3. What the following two queries actually do:
sum(increase(probe_success{job="http_2xx"}[10m]))

rate(probe_success{job="http_2xx", env="prod"}[5m]) * 100

The closest I have got is the following, which gives the percentage of operational targets for one env over the last 10 minutes:

avg(avg_over_time(probe_success{job="http_2xx", env="prod"}[10m]) * 100)

Upvotes: 1

Views: 1929

Answers (1)

Petar Nikolov

Reputation: 321

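  1. Rate of failure by env in last 10 minutes. probe_success is a gauge that is 1 for a successful probe and 0 for a failed one, so rate(), which is meant for counters, does not give a meaningful result here. Averaging the gauge over the window is the easiest way to do it:

    100 - avg(avg_over_time(probe_success{job="http_2xx"}[10m])) by (env) * 100

    This returns the percentage of failed probes per env. Drop the leading "100 -" to get the percentage of successful probes instead.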

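  2. Calculating a rate over 10m and the increase of that rate over the same 10m seems redundant, and wrapping the query above in increase() didn't work for me. increase() takes a range vector of a counter, so it cannot wrap an aggregated query directly, and probe_success is a 0/1 gauge rather than a counter. If you want to see how the failure percentage changed, compare the current window with an earlier one using the offset modifier.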

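  3. The first query does not count successful probes. probe_success is a gauge that is either 0 or 1, and increase() treats it as a counter, so the result is roughly the number of times targets flipped from failing to succeeding within 10m. To count failed probes per env instead, filter with == 0 inside a subquery and count the samples:

    sum(count_over_time((probe_success{job="http_2xx"} == 0)[10m:])) by (env)

    The subquery is evaluated at the default resolution, so the count depends on that interval rather than on the scrape interval, and an env with no failures returns no series rather than 0. Your second query has the same problem as the first: rate() on a 0/1 gauge is not the percentage of successful probes over 5m for the prod environment. The avg_over_time query you already have is the right approach for that.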
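For point 2, one way to read "increase in rate of failure" is the difference between the failure percentage in the last 10 minutes and in the 10 minutes before that. Here is a rough sketch of that idea, not a tested dashboard query. It assumes probe_success is the usual blackbox exporter gauge (1 on success, 0 on failure) and that every target carries the env label; the 10m comparison window is my choice, not something from your setup.

```promql
# Change in failure percentage per env:
# (failure % in the last 10m) minus (failure % in the 10m before that).
# Failure % is 100 * (1 - success ratio), so the difference reduces to
# (previous success ratio - current success ratio) * 100.
# Positive values mean more failures than in the previous window.
(
  avg by (env) (avg_over_time(probe_success{job="http_2xx"}[10m] offset 10m))
  -
  avg by (env) (avg_over_time(probe_success{job="http_2xx"}[10m]))
) * 100
```

A result of 5 for env="prod" would mean the failure percentage is five points higher than in the previous 10 minutes. An env that has no data in either window drops out of the result, because the subtraction only matches label sets present on both sides.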

Upvotes: 1
