Reputation: 3333
I have a service (named sdk-backend) written in Scala with Cats Effect.
Kamon.io is used to publish response time metrics to Prometheus.
The service exposes an API for token retrieval.
If I make two API calls, the following metrics are tracked:
# TYPE api_v1_sdk_token_seconds histogram
api_v1_sdk_token_seconds_bucket{le="0.005"} 0.0
api_v1_sdk_token_seconds_bucket{le="0.01"} 0.0
api_v1_sdk_token_seconds_bucket{le="0.025"} 0.0
api_v1_sdk_token_seconds_bucket{le="0.05"} 0.0
api_v1_sdk_token_seconds_bucket{le="0.075"} 0.0
api_v1_sdk_token_seconds_bucket{le="0.1"} 0.0
api_v1_sdk_token_seconds_bucket{le="0.25"} 0.0
api_v1_sdk_token_seconds_bucket{le="0.5"} 0.0
api_v1_sdk_token_seconds_bucket{le="0.75"} 0.0
api_v1_sdk_token_seconds_bucket{le="1.0"} 0.0
api_v1_sdk_token_seconds_bucket{le="2.5"} 0.0
api_v1_sdk_token_seconds_bucket{le="5.0"} 0.0
api_v1_sdk_token_seconds_bucket{le="7.5"} 1.0
api_v1_sdk_token_seconds_bucket{le="10.0"} 2.0
api_v1_sdk_token_seconds_bucket{le="+Inf"} 2.0
api_v1_sdk_token_seconds_count 2.0
api_v1_sdk_token_seconds_sum 16.978542592
api_v1_sdk_token_seconds_count means there were 2 requests to the API, and api_v1_sdk_token_seconds_sum means they took 16.97 sec in total (yes, the API is quite slow).
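As a rough illustration of how these two series relate, the sketch below parses the count and sum lines shown above and divides one by the other to get the mean time per request. The helper names (`parse_samples`, `mean_seconds`) are made up for this example and are not part of Kamon or Prometheus.

```python
# Two sample lines copied from the exposition output above.
EXPOSITION = """\
api_v1_sdk_token_seconds_count 2.0
api_v1_sdk_token_seconds_sum 16.978542592
"""


def parse_samples(text):
    """Parse unlabelled 'name value' lines into a dict (hypothetical helper)."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '# TYPE' / '# HELP' comments
        name, value = line.rsplit(" ", 1)
        samples[name] = float(value)
    return samples


def mean_seconds(samples, metric):
    """Mean observed duration: <metric>_sum divided by <metric>_count."""
    count = samples[metric + "_count"]
    total = samples[metric + "_sum"]
    return total / count if count else float("nan")


mean = mean_seconds(parse_samples(EXPOSITION), "api_v1_sdk_token_seconds")
print(f"{mean:.3f} s per request")
```

For the two calls above this gives about 8.49 seconds per request, which is the number the sum alone does not show.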
The metrics are published to Prometheus without issues.
Next, I'd like to visualize the metrics in Grafana.
The expression I'm using to show response time over time is as follows:
avg by(app) (sum by(app) (increase(api_v1_sdk_token_seconds_sum{app="sdk-backend"}[$__rate_interval])))
The spikes in the graph are the result of a load test I ran.
The load testing report looks like this:
As you can see from the report, the mean response time is 1338 ms.
What I'd like to see in Grafana at the peak is a value around the mean response time (1.3 sec), rather than the ~3000 sec that is currently shown.
Moreover, 44467 requests were made during the load test, with a mean of 148.23 requests per second.
Questions:
Is this the correct formula for displaying mean response time over time?
avg by(app) (sum by(app) (increase(api_v1_sdk_token_seconds_sum{app="sdk-backend"}[$__rate_interval])))/avg by(app) (sum by(app) (increase(api_v1_sdk_token_seconds_count{app="sdk-backend"}[$__rate_interval])))
How do I write a formula for displaying requests per second? (api_v1_sdk_token_seconds_count basically stands for the number of requests that have been made.)
Upvotes: 0
Views: 576
Reputation: 13431
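Please note that avg by(app) applied after sum by(app) does nothing: sum by(app) already returns a single series per app, so averaging that one series leaves it unchanged.
Additionally, your initial query doesn't take the number of requests into account. The increase of the _sum counter grows with both latency and request count, so on its own it shows total time spent, not time per request.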
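To see why the first query peaks near 3000 sec, here is a back-of-the-envelope sketch using the load test figures from the question. The 15 second window is an assumption about what $__rate_interval resolved to on that dashboard; the question does not state it.

```python
REQUESTS_PER_SECOND = 148.23  # mean rate from the load test report
MEAN_LATENCY_SECONDS = 1.338  # mean response time from the load test report
WINDOW_SECONDS = 15.0         # ASSUMPTION: value of $__rate_interval


def expected_sum_increase(rps, mean_latency, window):
    """Approximate increase of the _sum counter over one window."""
    return rps * mean_latency * window


def expected_count_increase(rps, window):
    """Approximate increase of the _count counter over one window."""
    return rps * window


total_seconds = expected_sum_increase(
    REQUESTS_PER_SECOND, MEAN_LATENCY_SECONDS, WINDOW_SECONDS
)
requests = expected_count_increase(REQUESTS_PER_SECOND, WINDOW_SECONDS)

print(f"increase of _sum over the window: ~{total_seconds:.0f} s")
print(f"divided by increase of _count:    ~{total_seconds / requests:.3f} s")
```

Under that assumption the sum alone increases by roughly 2975 seconds per window, close to the ~3000 sec on the graph, while dividing by the request count brings it back to about 1.338 seconds.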
Is this the correct formula for displaying mean response time over time?
avg by(app) (sum by(app) (increase(api_v1_sdk_token_seconds_sum{app="sdk-backend"}[$__rate_interval])))/avg by(app) (sum by(app) (increase(api_v1_sdk_token_seconds_count{app="sdk-backend"}[$__rate_interval])))
It is not ideal (you should remove the useless avg by), but it should return the correct result. The simplified query is:
sum by(app) (increase(api_v1_sdk_token_seconds_sum{app="sdk-backend"}[$__rate_interval]))
/ sum by(app) (increase(api_v1_sdk_token_seconds_count{app="sdk-backend"}[$__rate_interval]))
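For intuition, the division above can be sketched outside Prometheus as the change in the sum counter divided by the change in the count counter between two scrapes. The first sample is taken from the question; the second sample is hypothetical. This simplification ignores counter resets and the extrapolation that increase() performs.

```python
def mean_latency_over_window(sum_start, sum_end, count_start, count_end):
    """Mean seconds per request between two samples of the same counters.

    Returns None when no requests were recorded in the window, where the
    PromQL division would yield no usable value either.
    """
    delta_sum = sum_end - sum_start
    delta_count = count_end - count_start
    if delta_count <= 0:
        return None
    return delta_sum / delta_count


# (sum, count) at the start of the window: values from the question.
start = (16.978542592, 2.0)
# (sum, count) at the end of the window: HYPOTHETICAL values, chosen so that
# 10 more requests took 13.38 seconds in total.
end = (30.358542592, 12.0)

window_mean = mean_latency_over_window(start[0], end[0], start[1], end[1])
print(f"mean latency in window: {window_mean:.3f} s")
```

The two slow initial calls do not affect the result, because only the change within the window is used.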
How do I write a formula for displaying requests per second?
This can be done with the rate function:
rate(api_v1_sdk_token_seconds_count[$__rate_interval])
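Conceptually, rate is the increase of the counter divided by the length of the window in seconds. A minimal sketch with the load test totals from the question follows; the 300 second duration is an assumption, derived from 44467 requests at 148.23 requests per second, and the function name is made up for this example.

```python
def per_second_rate(count_start, count_end, window_seconds):
    """Average per-second growth of a counter over a window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return (count_end - count_start) / window_seconds


TOTAL_REQUESTS = 44467.0       # from the load test report
TEST_DURATION_SECONDS = 300.0  # ASSUMPTION: about 44467 / 148.23

rps = per_second_rate(0.0, TOTAL_REQUESTS, TEST_DURATION_SECONDS)
print(f"{rps:.2f} requests per second")
```

This lands close to the 148.23 requests per second in the report. In Grafana the same calculation is done per $__rate_interval window rather than over the whole test.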
Upvotes: 1