alt-f4


How to trigger auto-scaling for EKS pods?

Context

I am running an application (Apache Airflow) on EKS that spins up new workers to fulfill new tasks. Every worker requires its own pod. I am afraid of running out of memory and/or CPU when several workers are spawned at once. My objective is to trigger auto-scaling.

What I have tried

I am using Terraform for provisioning (I am also happy to get answers that are not in Terraform, which I can conceptually translate into Terraform code).

I have set up a Fargate profile like this:

#  Create EKS Fargate profile
resource "aws_eks_fargate_profile" "airflow" {
  cluster_name           = module.eks_cluster.cluster_id
  fargate_profile_name   = "${var.project_name}-fargate-${var.env_name}"
  pod_execution_role_arn = aws_iam_role.fargate_iam_role.arn
  subnet_ids             = var.private_subnet_ids

  selector {
    namespace = "fargate"
  }

  tags = {
    Terraform   = "true"
    Project     = var.project_name
    Environment = var.env_name
  }
}
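With this profile, only pods created in the `fargate` namespace are scheduled onto Fargate, where each pod gets its own right-sized capacity and no node group or cluster-autoscaler is involved. Pods in any other namespace land on the EC2 worker group defined further down. A minimal sketch of creating that namespace with the Terraform Kubernetes provider, assuming the provider is already configured against this cluster (Airflow would also have to launch its worker pods in this namespace for them to run on Fargate):

```hcl
# Sketch: the namespace matched by the Fargate profile selector above.
# Assumes the hashicorp/kubernetes provider is configured for this EKS cluster.
resource "kubernetes_namespace" "fargate" {
  metadata {
    # Must equal the `namespace` in the aws_eks_fargate_profile selector
    name = "fargate"
  }
}
```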

My IAM policy for autoscaling the nodes:

# Create IAM Policy for node autoscaling
resource "aws_iam_policy" "node_autoscaling_pol" {
  name   = "${var.project_name}-node-autoscaling-${var.env_name}"
  policy = data.aws_iam_policy_document.node_autoscaling_pol_doc.json
}

# Policy document with the Auto Scaling permissions for node autoscaling
data "aws_iam_policy_document" "node_autoscaling_pol_doc" {
  statement {
    actions   = [
      "autoscaling:DescribeAutoScalingGroups",
      "autoscaling:DescribeAutoScalingInstances",
      "autoscaling:DescribeLaunchConfigurations",
      "autoscaling:DescribeTags",
      "autoscaling:SetDesiredCapacity",
      "autoscaling:TerminateInstanceInAutoScalingGroup",
      "ec2:DescribeLaunchTemplateVersions"
    ]
    effect    = "Allow"
    resources = ["*"]
  }
}
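A policy on its own grants nothing until it is attached to a role that the autoscaling component actually runs under. A minimal sketch that attaches it to the worker node role, assuming the v17-era `terraform-aws-modules/eks` module, which exposes a `worker_iam_role_name` output (check the outputs of your module version):

```hcl
# Sketch: attach the node autoscaling policy to the EC2 worker node role so a
# pod running on those nodes can call the Auto Scaling APIs listed above.
# Assumes module.eks_cluster exposes `worker_iam_role_name` (v17-era module).
resource "aws_iam_role_policy_attachment" "node_autoscaling" {
  role       = module.eks_cluster.worker_iam_role_name
  policy_arn = aws_iam_policy.node_autoscaling_pol.arn
}
```

Attaching to the node role is the simplest option; an IAM role bound to the autoscaler's Kubernetes service account (IRSA) is the narrower-scoped alternative.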

And finally, the EKS cluster module (just a snippet for brevity):

# Create EKS Cluster
module "eks_cluster" {
  cluster_name  = "${var.project_name}-${var.env_name}"
  # Assigning worker groups
  worker_groups = [
    {
      instance_type = var.nodes_instance_type_1
      asg_max_size  = 1
      name          = "${var.project_name}-${var.env_name}"
    }
  ]
}
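With `asg_max_size = 1` the Auto Scaling group can never hold more than one node, so nothing can scale out regardless of load. Raising the maximum only allows growth; something still has to change the desired capacity. A sketch of the same worker group with headroom and the two tags that cluster-autoscaler's auto-discovery mode looks for, assuming the v17-era `worker_groups` schema (other module arguments such as `source` are omitted as in the snippet above, and the sizes are placeholders):

```hcl
# Sketch: worker group that cluster-autoscaler can discover and resize.
# Assumes the v17-era terraform-aws-modules/eks `worker_groups` schema.
module "eks_cluster" {
  cluster_name = "${var.project_name}-${var.env_name}"

  worker_groups = [
    {
      name                 = "${var.project_name}-${var.env_name}"
      instance_type        = var.nodes_instance_type_1
      asg_min_size         = 1
      asg_desired_capacity = 1
      asg_max_size         = 5 # upper bound the autoscaler may scale out to

      # Tags used by cluster-autoscaler auto-discovery
      tags = [
        {
          key                 = "k8s.io/cluster-autoscaler/enabled"
          value               = "true"
          propagate_at_launch = false
        },
        {
          key                 = "k8s.io/cluster-autoscaler/${var.project_name}-${var.env_name}"
          value               = "owned"
          propagate_at_launch = false
        }
      ]
    }
  ]
}
```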

Question

Is increasing asg_max_size sufficient for auto-scaling? I have a feeling that I need to set something along the lines of "When memory exceeds X, do Y", but I am not sure.

I don't have much experience with advanced monitoring/metrics tools, so a fairly simple solution that does basic auto-scaling would best fit my needs = )



Answers (1)

coderanger


This is handled by a tool called cluster-autoscaler. You can find the EKS guide for it at https://docs.aws.amazon.com/eks/latest/userguide/cluster-autoscaler.html or the project itself at https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
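Note that cluster-autoscaler does not use a "when memory exceeds X" threshold. It adds nodes when pods stay Pending because their CPU/memory requests do not fit on any existing node, and removes nodes that stay underused, always within the group's min/max sizes. The Airflow worker pods therefore need resource requests set for this to work. A sketch of installing it from the project's Helm chart with the Terraform helm provider (2.x `set` block syntax); `var.aws_region` is a hypothetical variable, and the chart version should be pinned to one matching the cluster's Kubernetes version:

```hcl
# Sketch: deploy cluster-autoscaler via its official Helm chart.
# Assumes the hashicorp/helm provider (2.x) is configured for this cluster.
resource "helm_release" "cluster_autoscaler" {
  name       = "cluster-autoscaler"
  repository = "https://kubernetes.github.io/autoscaler"
  chart      = "cluster-autoscaler"
  namespace  = "kube-system" # not the "fargate" namespace, so it runs on EC2 nodes

  # Discover ASGs tagged with k8s.io/cluster-autoscaler/<cluster-name>
  set {
    name  = "autoDiscovery.clusterName"
    value = module.eks_cluster.cluster_id
  }

  # Region of the Auto Scaling groups (hypothetical variable)
  set {
    name  = "awsRegion"
    value = var.aws_region
  }
}
```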

