user2399020

Reputation: 137

Error when trying to save GCP Alert Policy

I'm struggling with a GCP MQL alert policy that I built up in the GUI. When I try to save it I keep getting an error message:

"Error: Unable to save alerting policy. Request contains an invalid argument."

The query appears valid, in the sense that there are no issues reported in the query editor and I can 'Run' the query to display the output without problem.

This is the JSON view generated by the policy creator:

{
  "displayName": "kube_cronjob_job_failed",
  "userLabels": {},
  "conditions": [
    {
      "displayName": "kube_cronjob_job_failed",
      "conditionMonitoringQueryLanguage": {
        "duration": "0s",
        "trigger": {
          "count": 1
        },
        "query": "fetch kubernetes.io/anthos/kube_job_status_failed | add[job_name:  re_extract(metric.job_name,'(^\\\\D*)([0-9]*)','\\\\1'), job_start_time: string_to_int64(re_extract(metric.job_name,'(^\\\\D*)([0-9]*)','\\\\2'))] | top_by [job_name], 1, job_start_time | group_by 1m, max(val()) | condition val() > 0"
      }
    }
  ],
  "alertStrategy": {
    "autoClose": "604800s"
  },
  "combiner": "OR",
  "enabled": true,
  "notificationChannels": [
    "projects/xxxxxxxxxx/notificationChannels/xxxxxxxxxxx"
  ]
}

And the query, just to show it more clearly:

fetch kubernetes.io/anthos/kube_job_status_failed
| add
    [job_name: re_extract(metric.job_name, '(^\\D*)([0-9]*)', '\\1'),
     job_start_time:
       string_to_int64(re_extract(metric.job_name, '(^\\D*)([0-9]*)', '\\2'))]
| top_by [job_name], 1, job_start_time
| group_by 1m, max(val())
| condition val() > 0

The query tries to determine the status of the most recent Job created by a Kubernetes CronJob.
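To make the intent of the two re_extract calls concrete, here is a minimal Python sketch of what the pattern captures. It assumes Python's re module behaves like MQL's regex engine for this simple pattern, and the job names are hypothetical examples, not taken from my cluster.

```python
import re

# Pattern from the query, after MQL string unescaping ('\\D' becomes \D).
PATTERN = re.compile(r"(^\D*)([0-9]*)")


def split_job_name(job_name):
    """Mimic the two re_extract calls: group 1 is job_name, group 2 is job_start_time."""
    match = PATTERN.search(job_name)  # always matches, since both groups may be empty
    prefix, digits = match.group(1), match.group(2)
    return prefix, int(digits) if digits else None


# Hypothetical job names, for illustration only.
print(split_job_name("nightly-backup-27912345"))
print(split_job_name("etl2-sync-27912345"))
```

Note that the first group keeps the trailing hyphen, and a name that contains a digit before the numeric suffix (the second example) is cut at that first digit, so this pattern only behaves as intended for CronJob names without digits.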

Upvotes: 0

Views: 308

Answers (2)

user2399020

Reputation: 137

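I managed to find a solution. The issue seemed to be the additional columns created by the add operation. Moving the group_by operation ahead of top_by and adding a drop operation to remove those columns did the job. The regular expression in this version also changed: it now splits the job name on a hyphen followed by eight digits.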

fetch kubernetes.io/anthos/kube_job_status_failed
| add
    [job_name: re_extract(metric.job_name, '(.+)-(\\d{8})', r'\1'),
     job_start_time:
       string_to_int64(re_extract(metric.job_name, '(.+)-(\\d{8})', r'\2'))]
| group_by 1m, max(val())
| top_by [job_name], 1, job_start_time
| drop [job_name, job_start_time]
| condition val() > 0
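If you want to submit the policy as JSON rather than through the GUI, the backslashes in the query have to be escaped once more for JSON. A minimal sketch, assuming the same policy fields as in the question (the notification channel is left out because its ID is redacted there); build_policy is a hypothetical helper name:

```python
import json

# MQL text exactly as written above; a raw Python string keeps every backslash.
QUERY = r"""
fetch kubernetes.io/anthos/kube_job_status_failed
| add
    [job_name: re_extract(metric.job_name, '(.+)-(\\d{8})', r'\1'),
     job_start_time:
       string_to_int64(re_extract(metric.job_name, '(.+)-(\\d{8})', r'\2'))]
| group_by 1m, max(val())
| top_by [job_name], 1, job_start_time
| drop [job_name, job_start_time]
| condition val() > 0
"""


def build_policy(query):
    """Return the alert policy as a dict, with the query collapsed to one line."""
    one_line = " ".join(part.strip() for part in query.strip().splitlines())
    return {
        "displayName": "kube_cronjob_job_failed",
        "conditions": [
            {
                "displayName": "kube_cronjob_job_failed",
                "conditionMonitoringQueryLanguage": {
                    "duration": "0s",
                    "trigger": {"count": 1},
                    "query": one_line,
                },
            }
        ],
        "combiner": "OR",
        "enabled": True,
    }


# json.dumps doubles each backslash, so '\\d' in MQL becomes '\\\\d' in the JSON text.
payload = json.dumps(build_policy(QUERY), indent=2)
print(payload)
```

Letting json.dumps do the escaping avoids counting backslashes by hand, which is easy to get wrong when the regex is already escaped once for MQL.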

Upvotes: 0

Dion V

Reputation: 824

According to Sai Chandra Gadde, some MQL table operations require aligned input. If they receive unaligned input, MQL aligns it implicitly, and that implicit alignment can cause problems in an alerting query.

Their fix was to add

| window 30s

after the operation that implicitly aligns the data.

You can refer to the sample query provided by Sai Chandra Gadde:

fetch istio_canonical_service
| metric 'istio.io/service/server/request_count'
| { filter (metric.response_code < 499); ident }
| group_by [metric.destination_service_namespace]
| ratio
| fraction_less_than(0.50)
| condition val() > 0.20
| window 30s # correctly sets the window to 30s

For reference, see the previous post or the documentation.

Upvotes: 0
