Reputation: 15
I am trying to create a Datadog alert using Terraform that fires when multiple hosts (1 or more) are at >= 95% CPU usage. So far, with the code I have, the alert triggers any time a single host exceeds the threshold, which is a little too noisy. Would you happen to know how to write the logic so that both conditions must be satisfied before the alert triggers? (Alert when multiple hosts are at 95% CPU or higher.)
resource "datadog_monitor" "worker_high_disk_usage" {
  type = "metric alert"
  name = "worker high disk usage"

  message = <<-EOT
    {{#is_alert}}
    @slack_channel {{system}} {{env}} host {{host.name}} device {{device}} has had disk usage
    over {{threshold}} of available disk space for the last 30m
    {{/is_alert}}
    {{#is_recovery}}
    @pagerduty
    {{system}} {{env}} host {{host.name}} device {{device}} high disk usage resolved.
    {{/is_recovery}}
  EOT

  # The query must stay on a single line; a quoted HCL string cannot span lines.
  query = "min(last_30m):avg:system.disk.in_use{env:prod,system:worker,team:team} by {host,device} > 0.95"

  thresholds = {
    critical = 0.95
  }

  timeout_h           = 1
  require_full_window = false

  lifecycle {
    ignore_changes = [silenced]
  }

  tags = ["disk"]
}
Upvotes: 0
Views: 1304
Reputation: 51
Not sure if this will work, but you can give it a try:
{{^is_exact_match a.value b.value }}
@[email protected] Alert: 2 hosts have passed the threshold
{{/is_exact_match}}
If the two values are the same, the block is skipped and nothing is sent.
The problem is that you might still get 2 alerts at the same time.
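Another option that might be worth trying is to move the "how many hosts" logic into the query itself instead of the message template. This is only a sketch, not something I have run against your account: it assumes the CPU metric is `system.cpu.user` (reported as 0-100), reuses the tags from your question, and assumes a provider version that uses the `monitor_thresholds` block. The resource name `worker_multi_host_high_cpu` is made up. The idea is that `cutoff_min` drops values below 95, `count_nonzero` counts the host series that are left, and the monitor alerts when that count reaches 2.

```hcl
# Hypothetical resource name; metric, tags, and host count are assumptions.
resource "datadog_monitor" "worker_multi_host_high_cpu" {
  type = "query alert"
  name = "worker high CPU on multiple hosts"

  message = <<-EOT
    {{#is_alert}}
    @slack_channel {{value}} worker hosts have been at or above 95% CPU for the last 30m
    {{/is_alert}}
    {{#is_recovery}}
    @slack_channel worker multi-host high CPU resolved.
    {{/is_recovery}}
  EOT

  # cutoff_min(..., 95) removes points below 95, so only hot hosts remain.
  # count_nonzero(...) then counts how many host series still have a value.
  # No trailing "by {host}" here: the result is one series, so one alert.
  query = "min(last_30m):count_nonzero(cutoff_min(avg:system.cpu.user{env:prod,system:worker,team:team} by {host}, 95)) >= 2"

  monitor_thresholds {
    # Must match the number on the right-hand side of the query.
    critical = 2
  }

  require_full_window = false

  tags = ["cpu"]
}
```

Because the monitor is no longer grouped by host, `{{host.name}}` will not be available in the message; `{{value}}` should give the number of hosts over the threshold instead.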
Upvotes: 0