Reputation: 3645
I'm trying to set up an autoscaling Fargate cluster for GitHub self-hosted runners. The high-level design looks like this: a custom CloudWatch COUNT metric is published with a value of 1 if the request is for a new workflow job, and -1 for a completed or cancelled one. The metric includes the repo owner (REPO_OWNER), the repo name (REPO_NAME), the event type (EVENT_TYPE, which I know will always be workflow_job) and the workflow run ID (ID) as dimensions. CloudWatch alarms on this metric then trigger step scaling policies that adjust the ecs:service:DesiredCount
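For reference, here's a minimal sketch of how the webhook handler publishes this metric, assuming the AWS SDK v3 for JavaScript (the GitHubRunners namespace, the handler name and the event shape are illustrative, not my real code):

// Publishes COUNT with +1 for queued jobs and -1 for completed/cancelled ones.
import { CloudWatchClient, PutMetricDataCommand, StandardUnit } from '@aws-sdk/client-cloudwatch'

const cloudwatch = new CloudWatchClient({})

async function publishRunnerMetric(event: {
  owner: string
  repo: string
  runId: string
  action: 'queued' | 'completed' | 'cancelled'
}): Promise<void> {
  await cloudwatch.send(new PutMetricDataCommand({
    Namespace: 'GitHubRunners', // illustrative namespace
    MetricData: [{
      MetricName: 'COUNT',
      Value: event.action === 'queued' ? 1 : -1, // +1 for new jobs, -1 otherwise
      Unit: StandardUnit.Count,
      Dimensions: [
        { Name: 'REPO_OWNER', Value: event.owner },
        { Name: 'REPO_NAME', Value: event.repo },
        { Name: 'EVENT_TYPE', Value: 'workflow_job' },
        { Name: 'ID', Value: event.runId },
      ],
    }],
  }))
}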
dimension based on the value of the custom metric. The CDKTF setup for the autoscaling target and policies looks like this:
const autoscalingTarget = new AppautoscalingTarget(this, `appautoscaling-target-${environment}`, {
  serviceNamespace: 'ecs',
  resourceId: `service/${ecsCluster.awsEcsClusterClusterNameOutput}/${ecsService.awsEcsServiceServiceNameOutput}`,
  scalableDimension: 'ecs:service:DesiredCount',
  minCapacity: 0,
  maxCapacity: options.maxClusterSize,
})
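// Step scaling policies: the alarms below invoke these to change DesiredCount by +/-1.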
const scaleUpPolicy = new AppautoscalingPolicy(this, `autoscale-up-policy-${environment}`, {
  dependsOn: [autoscalingTarget],
  name: `autoscale-up-policy-${environment}`,
  serviceNamespace: 'ecs',
  resourceId: `service/${ecsCluster.awsEcsClusterClusterNameOutput}/${ecsService.awsEcsServiceServiceNameOutput}`,
  scalableDimension: 'ecs:service:DesiredCount',
  stepScalingPolicyConfiguration: {
    adjustmentType: 'ChangeInCapacity',
    cooldown: 30,
    metricAggregationType: 'Maximum',
    stepAdjustment: [{
      metricIntervalLowerBound: '1',
      scalingAdjustment: 1,
    }],
  },
})
const scaleDownPolicy = new AppautoscalingPolicy(this, `autoscale-down-policy-${environment}`, {
  dependsOn: [autoscalingTarget],
  name: `autoscale-down-policy-${environment}`,
  serviceNamespace: 'ecs',
  resourceId: `service/${ecsCluster.awsEcsClusterClusterNameOutput}/${ecsService.awsEcsServiceServiceNameOutput}`,
  scalableDimension: 'ecs:service:DesiredCount',
  stepScalingPolicyConfiguration: {
    adjustmentType: 'ChangeInCapacity',
    cooldown: 30,
    metricAggregationType: 'Maximum',
    stepAdjustment: [{
      metricIntervalUpperBound: '0',
      scalingAdjustment: -1,
    }],
  },
})
const alarmPeriod = 120 as const
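// Scale-up alarm: fires when the summed COUNT metric rises above 0.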
new CloudwatchMetricAlarm(this, `autoscale-up-alarm-${environment}`, {
  alarmName: `fargate-cluster-scale-up-alarm-${environment}`,
  alarmDescription: `Scales up the Fargate cluster based on the ${options.customCloudWatchMetricNamespace}.${options.customCloudWatchMetricName} metric`,
  comparisonOperator: 'GreaterThanThreshold',
  threshold: 0,
  evaluationPeriods: 1,
  // When metricQuery is used, metricName/namespace/period belong inside each
  // query; the equivalent top-level attributes conflict with metricQuery.
  metricQuery: [{
    id: 'm1',
    metric: {
      metricName: options.customCloudWatchMetricName,
      namespace: options.customCloudWatchMetricNamespace,
      period: alarmPeriod,
      stat: 'Sum',
      unit: 'Count',
      dimensions: {
        // Note: this is the only dimension I can know in advance
        EVENT_TYPE: 'workflow_job',
      },
    },
  }, {
    id: 'm2',
    metric: {
      metricName: options.customCloudWatchMetricName,
      namespace: options.customCloudWatchMetricNamespace,
      period: alarmPeriod,
      stat: 'Sum',
      unit: 'Count',
      dimensions: {
        // Note: this is the only dimension I can know in advance
        EVENT_TYPE: 'workflow_job',
      },
    },
  }, {
    id: 'e1',
    expression: 'SUM(METRICS())',
    label: 'Sum of Actions Runner Requests',
    returnData: true,
  }],
  alarmActions: [
    scaleUpPolicy.arn,
  ],
  actionsEnabled: true,
})
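// Scale-down alarm: fires when the summed COUNT metric falls below 1.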
new CloudwatchMetricAlarm(this, `autoscale-down-alarm-${environment}`, {
  alarmName: `fargate-cluster-scale-down-alarm-${environment}`,
  alarmDescription: `Scales down the Fargate cluster based on the ${options.customCloudWatchMetricNamespace}.${options.customCloudWatchMetricName} metric`,
  comparisonOperator: 'LessThanThreshold',
  threshold: 1,
  evaluationPeriods: 1,
  metricQuery: [{
    id: 'm1',
    metric: {
      metricName: options.customCloudWatchMetricName,
      namespace: options.customCloudWatchMetricNamespace,
      period: alarmPeriod,
      stat: 'Sum',
      unit: 'Count',
      dimensions: {
        // Note: this is the only dimension I can know in advance
        EVENT_TYPE: 'workflow_job',
      },
    },
  }, {
    id: 'm2',
    metric: {
      metricName: options.customCloudWatchMetricName,
      namespace: options.customCloudWatchMetricNamespace,
      period: alarmPeriod,
      stat: 'Sum',
      unit: 'Count',
      dimensions: {
        // Note: this is the only dimension I can know in advance
        EVENT_TYPE: 'workflow_job',
      },
    },
  }, {
    id: 'e1',
    expression: 'SUM(METRICS())',
    label: 'Sum of Actions Runner Requests',
    returnData: true,
  }],
  alarmActions: [
    scaleDownPolicy.arn,
  ],
  actionsEnabled: true,
})
I don't see the metrics showing any data, nor the alarms changing state, until I add all 4 dimensions. Adding only 1 dimension (EVENT_TYPE, which is the only static dimension) gives me no data, but adding all 4 does.
How do I model my metrics so I can continue adding more dynamic metadata as dimensions but still set up working alarms based on the well-known static dimensions?
Upvotes: 0
Views: 637
Reputation: 3645
I was able to solve this by removing all dimensions from the CloudWatch metrics. The underlying issue is that CloudWatch treats each unique combination of dimensions as a separate metric and does not aggregate across dimensions for custom metrics: an alarm that specifies only EVENT_TYPE never matches a metric published with all four dimensions. Since three of my four dimensions are dynamic, an alarm on the static dimension alone could never see data. Publishing the metric with no dimensions gives the alarms a single, well-known metric to watch.
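A minimal sketch of the publishing side after the change, under the same illustrative assumptions as the handler sketch in the question (the namespace and function name are mine, not a prescribed API):

// With no Dimensions array, the metric is identified by namespace + name alone,
// so an alarm whose metric query also omits dimensions matches it directly.
import { CloudWatchClient, PutMetricDataCommand, StandardUnit } from '@aws-sdk/client-cloudwatch'

const cloudwatch = new CloudWatchClient({})

async function publishRunnerMetric(delta: 1 | -1): Promise<void> {
  await cloudwatch.send(new PutMetricDataCommand({
    Namespace: 'GitHubRunners', // illustrative namespace
    MetricData: [{
      MetricName: 'COUNT',
      Value: delta, // +1 for a queued job, -1 for completed/cancelled
      Unit: StandardUnit.Count,
      // No Dimensions: dynamic metadata (repo, run ID, ...) can go into
      // structured logs instead of metric dimensions.
    }],
  }))
}

On the alarm side, the dimensions object in each metricQuery entry is simply dropped.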
Upvotes: 0