lovecraft66

Reputation: 11

Prometheus/Grafana alerting on a pod stuck in the Pending state

I'm new to running Prometheus and Grafana. I want to create an alert that fires when a Kubernetes pod has been in the Pending state for more than 15 minutes. The PromQL query I'm using is:

kube_pod_status_phase{exported_namespace="mynamespace", phase="Pending"} > 0

What I haven't been able to figure out is how to construct an alert based on how long the pod has been in that state. I've tried a few permutations of alert conditions in Grafana along the lines of:

WHEN avg() OF query (A, 15m, now) IS ABOVE 1

These all fire an alert based on the number of pods in the state rather than the duration.

How can an alert be constructed based upon the time in the state?

Please & Thank You

Upvotes: 1

Views: 4977

Answers (2)

Rotem jackoby

Reputation: 22208

I agree with @avis's comment that this might be a more stable alert:

      - alert: KubernetesPodNotHealthy
        expr: sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"}) > 0
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: Kubernetes Pod not healthy ({{ $labels.namespace }}/{{ $labels.pod }})
          description: Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-running state for longer than 15 minutes.
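For reference, a rule like this is loaded by Prometheus from a rule file, where it sits inside a rule group. A minimal sketch of such a file (the file path and group name below are assumptions, not from the original answer):

```yaml
# pod-health-rules.yml (hypothetical file, referenced via rule_files: in prometheus.yml)
groups:
  - name: kubernetes-pod-health
    rules:
      - alert: KubernetesPodNotHealthy
        # Condition must hold continuously for 15m before the alert fires
        expr: sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"}) > 0
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: Kubernetes Pod not healthy ({{ $labels.namespace }}/{{ $labels.pod }})
          description: Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-running state for longer than 15 minutes.
```

The file can be validated with `promtool check rules pod-health-rules.yml` before deploying.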

Upvotes: 1

dansl1982

Reputation: 1128

    - alert: KubernetesPodNotHealthy
      expr: min_over_time(sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[15m:1m]) > 0
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Kubernetes Pod not healthy (instance {{ $labels.instance }})
        description: "Pod has been in a non-ready state for longer than 15 minutes.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
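The `min_over_time(...[15m:1m])` form asks whether the minimum of the query over the last 15 minutes (sampled every minute) was still above zero, so the pod must have been unhealthy at every sample in the window. A small Python sketch (illustrative only, not PromQL; the sample data is hypothetical) of that behaviour:

```python
# Illustrative sketch of min_over_time over a 15-sample window, applied to
# hypothetical per-minute values of the pending-pod query (1 = pending, 0 = not).

def min_over_time(samples, window):
    """Minimum of the last `window` samples, like min_over_time(expr[15m:1m])."""
    return min(samples[-window:])

# Pod pending for the full 15 minutes: every sample is 1, so the minimum is 1
# and the alert condition `> 0` holds.
steady_pending = [1] * 15
print(min_over_time(steady_pending, 15))  # 1 -> alert fires

# Pod that briefly left Pending inside the window: one sample is 0, so the
# minimum is 0 and the condition is false.
flapping = [1] * 7 + [0] + [1] * 7
print(min_over_time(flapping, 15))  # 0 -> no alert
```

This is effectively the same 15-minute requirement as `for: 15m` in the other answer, expressed inside the query itself, which is why `for:` can be set to `0m` here.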

Upvotes: 1
