Reputation: 5
I have an ECS cluster configured with a target tracking policy on the service and a capacity provider that manages ASG autoscaling.
In my cluster, the minimum and maximum task counts of the service and the minimum and maximum capacity of the ASG are the same.
When a scale-in action is performed, the tasks decrease to the minimum count, but the ASG still keeps one or more unused EC2 instances (instances with no tasks placed on them).
How can I configure my cluster and capacity provider so that scale-in brings the ASG down to its minimum capacity?
# CLUSTER
resource "aws_ecs_cluster" "default" {
  name               = local.name
  capacity_providers = [aws_ecs_capacity_provider.asg.name]
  tags               = local.tags

  default_capacity_provider_strategy {
    base              = 0
    capacity_provider = aws_ecs_capacity_provider.asg.name
    weight            = 1
  }
}
# SERVICE
resource "aws_ecs_service" "ecs_service" {
  name                               = "${local.name}-service"
  cluster                            = aws_ecs_cluster.default.id
  task_definition                    = aws_ecs_task_definition.ecs_task.arn
  health_check_grace_period_seconds  = 60
  deployment_maximum_percent         = 50
  deployment_minimum_healthy_percent = 100

  load_balancer {
    target_group_arn = element(module.aws-alb-common-module.target_group_arns, 1)
    container_name   = local.name
    container_port   = 8080
  }

  lifecycle {
    ignore_changes = [desired_count, task_definition]
  }
}
# CAPACITY PROVIDER
resource "aws_ecs_capacity_provider" "asg" {
  name = aws_autoscaling_group.ecs_nodes.name

  auto_scaling_group_provider {
    auto_scaling_group_arn         = aws_autoscaling_group.ecs_nodes.arn
    managed_termination_protection = "DISABLED"

    managed_scaling {
      maximum_scaling_step_size = 10
      minimum_scaling_step_size = 1
      status                    = "ENABLED"
      target_capacity           = 100
    }
  }
}
# SERVICE AUTOSCALING POLICY
resource "aws_appautoscaling_target" "ecs_target" {
  max_capacity       = 20
  min_capacity       = 2
  resource_id        = "service/${local.name}/${aws_ecs_service.ecs_service.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "ecs_policy" {
  name               = "${local.name}-scale-policy"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs_target.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 2
  }
}
# ASG
resource "aws_autoscaling_group" "ecs_nodes" {
  name_prefix           = "${local.name}-node"
  max_size              = 20
  min_size              = 2
  vpc_zone_identifier   = local.subnets_ids
  protect_from_scale_in = false

  mixed_instances_policy {
    instances_distribution {
      on_demand_percentage_above_base_capacity = local.spot
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.node.id
        version            = "$Latest"
      }

      dynamic "override" {
        for_each = local.instance_types
        content {
          instance_type     = override.key
          weighted_capacity = override.value
        }
      }
    }
  }

  lifecycle {
    create_before_destroy = true
  }

  tag {
    key                 = "AmazonECSManaged"
    propagate_at_launch = true
    value               = ""
  }
}
Upvotes: 0
Views: 2617
Reputation: 859
The cause is likely that the target_value = 2 alongside the predefined_metric_specification block is the CPU usage trigger level (a percentage), not a minimum capacity. The instances are probably being kept alive by background processes using small amounts of CPU.
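For comparison, a more conventional target-tracking configuration would look roughly like the sketch below, where 75 is only an illustrative CPU percentage (an assumption, not a tuned value for your workload):

# Sketch only: target_value is interpreted as average service CPU percent
# for ECSServiceAverageCPUUtilization; 75 is an assumed example value.
target_tracking_scaling_policy_configuration {
  predefined_metric_specification {
    predefined_metric_type = "ECSServiceAverageCPUUtilization"
  }
  target_value = 75
}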
By the way, the managed_termination_protection setting is probably worth re-enabling; see the sketch below.
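If you do re-enable it, note that ECS requires instance scale-in protection on the ASG as well, so the change would look something like this (a sketch reusing your existing resource names, not a tested config):

# Sketch: managed termination protection on the capacity provider
# only works when the ASG has scale-in protection enabled.
resource "aws_autoscaling_group" "ecs_nodes" {
  # ... existing arguments ...
  protect_from_scale_in = true
}

resource "aws_ecs_capacity_provider" "asg" {
  name = aws_autoscaling_group.ecs_nodes.name

  auto_scaling_group_provider {
    auto_scaling_group_arn         = aws_autoscaling_group.ecs_nodes.arn
    managed_termination_protection = "ENABLED"
    # ... existing managed_scaling block ...
  }
}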
Update in response to comments on 25/09:
Ok, it's entirely possible I'm wrong here (especially as I haven't used this feature myself yet), and if so I'm very happy to learn from it.
But this is how I read the mentioned documentation in relation to your config. The key phrase is: "The target capacity value is used as the target value for the CloudWatch metric used in the Amazon ECS-managed target tracking scaling policy." The CloudWatch metric you have selected is ECSServiceAverageCPUUtilization, which is discussed at "How is ECSServiceAverageCPUUtilization metric calculated?". So the target_value = 2 you have configured means 2% average CPU utilisation.
I admit I mistakenly assumed the CPU metric was an EC2-instance-level average. But in either case, having your trigger value set to 2% CPU is likely to cause or maintain scale-out when none is needed.
It's also possible you've found the simple explanation for the behaviour you're seeing, i.e. the "but this behavior is not guaranteed at all times" statement. However, I suspect that statement applies more to the extreme example of a 100% target, where one can expect to see anomalies, just as they can be expected at the similarly extreme 2%.
Upvotes: 0