Reputation: 2326
I am running an application (Apache Airflow) on EKS that spins up new workers to fulfill new tasks. Every worker runs in its own new pod. I am worried about running out of memory and/or CPU when several workers are spawned at once. My objective is to trigger auto-scaling.
I am using Terraform for provisioning (I am also happy with answers that are not in Terraform, which I can translate into Terraform code myself).
I have set up a Fargate profile like this:
# Create EKS Fargate profile
resource "aws_eks_fargate_profile" "airflow" {
  cluster_name           = module.eks_cluster.cluster_id
  fargate_profile_name   = "${var.project_name}-fargate-${var.env_name}"
  pod_execution_role_arn = aws_iam_role.fargate_iam_role.arn
  subnet_ids             = var.private_subnet_ids

  selector {
    namespace = "fargate"
  }

  tags = {
    Terraform   = "true"
    Project     = var.project_name
    Environment = var.env_name
  }
}
My policy for auto scaling the nodes:
# Create IAM Policy for node autoscaling
resource "aws_iam_policy" "node_autoscaling_pol" {
  name   = "${var.project_name}-node-autoscaling-${var.env_name}"
  policy = data.aws_iam_policy_document.node_autoscaling_pol_doc.json
}

# Create autoscaling policy
data "aws_iam_policy_document" "node_autoscaling_pol_doc" {
  statement {
    actions = [
      "autoscaling:DescribeAutoScalingGroups",
      "autoscaling:DescribeAutoScalingInstances",
      "autoscaling:DescribeLaunchConfigurations",
      "autoscaling:DescribeTags",
      "autoscaling:SetDesiredCapacity",
      "autoscaling:TerminateInstanceInAutoScalingGroup",
      "ec2:DescribeLaunchTemplateVersions"
    ]
    effect    = "Allow"
    resources = ["*"]
  }
}
And finally the EKS cluster module (just a snippet for brevity):
# Create EKS Cluster
module "eks_cluster" {
  cluster_name = "${var.project_name}-${var.env_name}"

  # Assigning worker groups
  worker_groups = [
    {
      instance_type = var.nodes_instance_type_1
      asg_max_size  = 1
      name          = "${var.project_name}-${var.env_name}"
    }
  ]
}
Is increasing the asg_max_size sufficient for auto-scaling? I have a feeling that I need to set something somewhere along the lines of "when memory exceeds X, do Y", but I am not sure.
I don't have much experience with advanced monitoring/metrics tools, so a fairly simple solution that does basic auto-scaling would be the best fit for my needs = )
Upvotes: 0
Views: 1074
Reputation: 54267
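This is handled by a tool called cluster-autoscaler. Increasing asg_max_size on its own only raises the ceiling of the Auto Scaling group; cluster-autoscaler is what actually changes the desired node count. It does not work from a "memory exceeds X" rule: it adds nodes when pods cannot be scheduled because no node has enough free CPU or memory for their requests, and it removes nodes that stay underutilized. You can find the EKS guide for it at https://docs.aws.amazon.com/eks/latest/userguide/cluster-autoscaler.html or the project itself at https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler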
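On the Terraform side, a rough sketch of what cluster-autoscaler usually needs might look like the following. It assumes the module in the question is terraform-aws-modules/eks in a version that still uses worker_groups and exposes the worker_iam_role_name output; the size values (1 and 5) and the resource name node_autoscaling_attach are illustrative placeholders, not something taken from the question. The two tags are the ones cluster-autoscaler's auto-discovery mode looks for on an Auto Scaling group.

```hcl
# Sketch only: other module arguments are omitted, as in the question's snippet.
module "eks_cluster" {
  cluster_name = "${var.project_name}-${var.env_name}"

  worker_groups = [
    {
      name          = "${var.project_name}-${var.env_name}"
      instance_type = var.nodes_instance_type_1

      # Assumed bounds: cluster-autoscaler moves the desired capacity
      # between these two values, so max must be greater than min.
      asg_min_size = 1
      asg_max_size = 5

      # Tags used by cluster-autoscaler auto-discovery to find this ASG.
      tags = [
        {
          key                 = "k8s.io/cluster-autoscaler/enabled"
          value               = "true"
          propagate_at_launch = "false"
        },
        {
          key                 = "k8s.io/cluster-autoscaler/${var.project_name}-${var.env_name}"
          value               = "true"
          propagate_at_launch = "false"
        }
      ]
    }
  ]
}

# Attach the autoscaling policy from the question to the worker node role,
# so that a cluster-autoscaler pod running on those nodes can call the ASG API.
# Assumes the module exposes a worker_iam_role_name output.
resource "aws_iam_role_policy_attachment" "node_autoscaling_attach" {
  role       = module.eks_cluster.worker_iam_role_name
  policy_arn = aws_iam_policy.node_autoscaling_pol.arn
}
```

This only prepares the AWS side. cluster-autoscaler itself still has to be deployed into the cluster (the EKS guide linked above walks through that), and the worker pods need CPU and memory requests set, since those requests are what the scheduler and the autoscaler use to decide that a pod does not fit on the existing nodes.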
Upvotes: 2