Joe.Zeppy

Reputation: 340

Creating a CloudWatch Event Rule for failed ECS tasks

I have an ECS task that fails from time to time with different exit codes. I would like to create a CloudWatch Events rule that is triggered on such failures. My current rule, shown below, is triggered only on exit code 1. I would like to be notified of every non-zero exit code.

{
  "source": [
    "aws.ecs"
  ],
  "detail-type": [
    "ECS Task State Change"
  ],
  "detail": {
    "lastStatus": [
      "STOPPED"
    ],
    "stoppedReason": [
      "Essential container in task exited"
    ],
    "containers": {
      "exitCode": [
        "1"
      ]
    }
  }
}

Upvotes: 5

Views: 5688

Answers (2)

Ntwobike

Reputation: 2741

You can now use "anything-but" in CloudWatch Events rule patterns to match any exit code other than 0:

{
  "source": [
    "aws.ecs"
  ],
  "detail-type": [
    "ECS Task State Change"
  ],
  "detail": {
    "lastStatus": [
      "STOPPED"
    ],
    "containers": {
      "exitCode": [
        {
          "anything-but": 0
        }
      ]
    }
  }
}

Upvotes: 13

Stigz

Reputation: 53

Event patterns have no negation operator; matching is exact only. From the documentation:

It is important to remember the following about event pattern matching:

  • For a pattern to match an event, the event must contain all the field names listed in the pattern. The field names must appear in the event with the same nesting structure.

  • Other fields of the event not mentioned in the pattern are ignored; effectively, there is a "*": "*" wildcard for fields not mentioned.

  • The matching is exact (character-by-character), without case-folding or any other string normalization.

  • The values being matched follow JSON rules: Strings enclosed in quotes, numbers, and the unquoted keywords true, false, and null.

  • Number matching is at the string representation level. For example, 300, 300.0, and 3.0e2 are not considered equal.

If you have a known set of exit codes, you can list them in an array in your rule, as described in the documentation. Something like:

{
  "source": [
    "aws.ecs"
  ],
  "detail-type": [
    "ECS Task State Change"
  ],
  "detail": {
    "lastStatus": [
      "STOPPED"
    ],
    "stoppedReason": [
      "Essential container in task exited"
    ],
    "containers": {
      "exitCode": [1, 2, 3, and so on...]
    }
  }
}

Since there is a finite number of exit codes (0-255), you could enter all of the non-zero ones into the array.
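If you do go the enumeration route, the array does not have to be typed by hand. A minimal Python sketch, assuming the same pattern fields as the rule above (the function name build_pattern is only illustrative):

```python
import json


def build_pattern():
    """Build the event pattern above with every non-zero exit code (1-255) listed."""
    return {
        "source": ["aws.ecs"],
        "detail-type": ["ECS Task State Change"],
        "detail": {
            "lastStatus": ["STOPPED"],
            "stoppedReason": ["Essential container in task exited"],
            # 0 is deliberately left out so that successful exits do not match
            "containers": {"exitCode": list(range(1, 256))},
        },
    }


if __name__ == "__main__":
    # Paste the printed JSON into the rule's event pattern
    print(json.dumps(build_pattern(), indent=2))
```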

Obviously, that's going to look very silly. For a more elegant (and arguably more robust) solution, you would need to modify your event rule to trigger on any "STOPPED" event and create a custom Lambda function.

  1. Create a Lambda function that inspects the exit code using the negation exitCode != 0. If it is true, send a notification (SNS, SES, whatever you are using...).
  2. Reconfigure event pattern to trigger on any STOPPED event.
  3. Reconfigure event rule to send to your lambda function created in step 1.
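
Step 1 could look roughly like the following Python sketch. It assumes the event has the shape of an ECS Task State Change event (a detail.containers list whose entries may carry an exitCode), that notifications go to an SNS topic, and that the topic ARN is supplied through a hypothetical TOPIC_ARN environment variable. The function names are illustrative as well, and containers that report no exit code at all are skipped here, which is a choice you may want to change.

```python
import json
import os


def failed_containers(event):
    """Return the containers in the event whose exit code is present and non-zero."""
    containers = event.get("detail", {}).get("containers", [])
    return [c for c in containers if c.get("exitCode") not in (None, 0)]


def handler(event, context):
    """Lambda entry point: notify only when some container exited with exitCode != 0."""
    failed = failed_containers(event)
    if not failed:
        return {"notified": False}

    # Imported lazily so the filtering logic can be exercised without AWS access
    import boto3

    message = {
        "taskArn": event.get("detail", {}).get("taskArn"),
        "failed": [
            {"name": c.get("name"), "exitCode": c["exitCode"]} for c in failed
        ],
    }
    boto3.client("sns").publish(
        TopicArn=os.environ["TOPIC_ARN"],  # hypothetical variable, set on the function
        Subject="ECS task failed",
        Message=json.dumps(message),
    )
    return {"notified": True}
```

Keeping the check in failed_containers separate from the SNS call makes it easy to try the filter locally against a saved sample event before wiring the rule to the function.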

Upvotes: 1
