Reputation: 2569
I want to find the total number of alerts over a 10-minute window for all pods whose names start with "sendsms".
I can do this on an instant vector with label_replace(). It does not work when I want to cover 10 minutes of data, because label_replace() only accepts an instant vector.
Here is an example of the problem:
ALERTS{alertname="CPUThrottlingHigh",pod="sendsms-dbed"} 10
ALERTS{alertname="CPUThrottlingHigh",pod="sendsms-ebed"} 20
ALERTS{alertname="CPUThrottlingHigh",pod="sendsms-fbed"} 30
ALERTS{alertname="CPUThrottlingHigh",pod="sendmail-gbed"} 60
ALERTS{alertname="CPUThrottlingHigh",pod="sendmail-hbed"} 70
ALERTS{alertname="CPUThrottlingHigh",pod="sendmail-ibed"} 80
Using label_replace() I can add a new label from a regex, then group by that label to get the results.
label_replace(ALERTS{alertname="CPUThrottlingHigh"}, "podname", "$1", "pod", "([a-z-A-Z]+)-.*")
ALERTS{alertname="CPUThrottlingHigh",pod="sendsms-dbed", podname="sendsms"} 10
ALERTS{alertname="CPUThrottlingHigh",pod="sendsms-ebed", podname="sendsms"} 20
How can I do this for ALERTS over 10 minutes and calculate the sum?
I want a result like this for the last 10 minutes:
ALERTS{alertname="CPUThrottlingHigh",podname="sendsms"} 60
ALERTS{alertname="CPUThrottlingHigh",podname="sendmail"} 210
Objective: find the pods that generated the most alerts in the last week.
Upvotes: 6
Views: 6656
Reputation: 2569
I solved this by aggregating over the time range first, which yields an instant vector, then applying label_replace() to that result and summing again by the new label.
Query
sort_desc(
  sum by (pod_set) (
    label_replace(
      sort_desc(
        sum by (namespace, pod) (
          avg_over_time(
            ALERTS{
              alertname=~"(KubeDeploymentReplicasMismatch|KubePodNotReady|KubePodCrashLooping|KubeJobFailed)",
              alertstate="firing"
            }[1w]
          )
        )
      ), "pod_set", "$1", "pod", "([a-z-A-Z]+)-.*"
    )
  )
)
Result
{pod_set="sendsms"} 62
{pod_set="emailspreprocessor"} 32
{pod_set="sendmail"} 21
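The same pattern could be applied to the 10-minute window from the question. This is only a sketch under assumptions: it uses count_over_time() (number of samples per series in the window) instead of the avg_over_time() used above, and it reuses the question's podname label and CPUThrottlingHigh alert. Neither choice comes from the original query, so adjust them to what "number of alerts" should mean in your setup.

```promql
# Assumption: count_over_time is the desired range aggregation.
# It returns an instant vector, so label_replace can be applied to it.
sum by (alertname, podname) (
  label_replace(
    count_over_time(
      ALERTS{alertname="CPUThrottlingHigh", alertstate="firing"}[10m]
    ),
    # Copy the part of "pod" before the last "-<suffix>" into "podname".
    "podname", "$1", "pod", "([a-zA-Z-]+)-.*"
  )
)
```

The count depends on how often the alerting rules are evaluated, since each evaluation while an alert is firing writes one sample. Series whose pod label does not match the regex are left unchanged by label_replace() and end up grouped under an empty podname.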
Upvotes: 9