Reputation: 21
Hi i have been trying to do cpu pinning in my eks cluster. i have used amazon linux latest release, and my eks version is 1.22 . i have created a launch template where i have used this user data mentioned below.
Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0
--//
#!/bin/bash
set -o xtrace
/etc/eks/bootstrap.sh $CLUSTER_NAME
sleep 2m
yum update -y
sudo rm /var/lib/kubelet/cpu_manager_state
sudo chmod 777 kubelet.service
sudo cat > /etc/systemd/system/kubelet.service <<EOF
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=docker.service iptables-restore.service
Requires=docker.service
[Service]
ExecStartPre=/sbin/iptables -P FORWARD ACCEPT -w 5
ExecStart=/usr/bin/kubelet --cloud-provider aws \
--image-credential-provider-config /etc/eks/ecr-credential-provider/ecr-
credential-provider-config \
--image-credential-provider-bin-dir /etc/eks/ecr-credential-provider \
--cpu-manager-policy=static \
--kube-reserved=cpu=0.5,memory=1Gi,ephemeral-storage=0.5Gi \
--system-reserved=cpu=0.5,memory=1Gi,ephemeral-storage=0.5Gi \
--config /etc/kubernetes/kubelet/kubelet-config.json \
--kubeconfig /var/lib/kubelet/kubeconfig \
--container-runtime docker \
--network-plugin cni $KUBELET_ARGS $KUBELET_EXTRA_ARGS
Restart=always
RestartSec=5
KillMode=process
[Install]
WantedBy=multi-user.target
EOF
sudo chmod 644 kubelet.service
sudo systemctl daemon-reload
sudo systemctl stop kubelet
sudo systemctl start kubelet
--//
after creating the template i have used it on the eks nodegroup creation. after waititng a while i am getting this error on the eks dashboard.
Health issues (1) NodeCreationFailure Instances failed to join the kubernetes cluster .
and i have get into that ec2 instance and used the following command to view kubectl logs
$journalctl -f -u kubelet
the output is
[[email protected] kubelet]$ journalctl -f -u kubelet
-- Logs begin at Thu 2022-04-21 07:27:50 UTC. --
Apr 21 07:31:21 ip-10.100.11.111.us-west-2.compute.internal kubelet[12225]: I0421
07:31:21.199868 12225 state_mem.go:80] "Updated desired CPUSet" podUID="3b513cfa-
441d-4e25-9441-093b4c2ed548" containerName="efs-plugin" cpuSet="0-7"
Apr 21 07:31:21 ip-10.100.11.111.us-west-2.compute.internal kubelet[12225]: I0421
07:31:21.244811 12225 state_mem.go:80] "Updated desired CPUSet" podUID="3b513cfa-
441d-4e25-9441-093b4c2ed548" containerName="csi-provisioner" cpuSet="0-7"
Apr 21 07:31:21 ip-10.100.11.111.us-west-2.compute.internal kubelet[12225]: I0421
07:31:21.305206 12225 state_mem.go:80] "Updated desired CPUSet" podUID="3b513cfa-
441d-4e25-9441-093b4c2ed548" containerName="liveness-probe" cpuSet="0-7"
Apr 21 07:31:21 ip-10.100.11.111.us-west-2.compute.internal kubelet[12225]: I0421
07:31:21.335744 12225 state_mem.go:80] "Updated desired CPUSet" podUID="de537700-
f5ac-4039-a151-110ddf27d140" containerName="efs-plugin" cpuSet="0-7"
Apr 21 07:31:21 ip-10.100.11.111.us-west-2.compute.internal kubelet[12225]: I0421
07:31:21.388843 12225 state_mem.go:80] "Updated desired CPUSet" podUID="de537700-
f5ac-4039-a151-110ddf27d140" containerName="csi-driver-registrar" cpuSet="0-7"
Apr 21 07:31:21 ip-10.100.11.111.us-west-2.compute.internal kubelet[12225]: I0421
07:31:21.464789 12225 state_mem.go:80] "Updated desired CPUSet" podUID="de537700-
f5ac-4039-a151-110ddf27d140" containerName="liveness-probe" cpuSet="0-7"
Apr 21 07:31:21 ip-10.100.11.111.us-west-2.compute.internal kubelet[12225]: I0421
07:31:21.545206 12225 state_mem.go:80] "Updated desired CPUSet" podUID="a2f09d0d-
69f5-4bb7-82bb-edfa86cb87e2" containerName="kube-controller" cpuSet="0-7"
Apr 21 07:31:21 ip-10.100.11.111.us-west-2.compute.internal kubelet[12225]: I0421
07:31:21.633078 12225 state_mem.go:80] "Updated desired CPUSet" podUID="3ec70fe1-
3680-4e3c-bcfa-81f80ebe20b0" containerName="kube-proxy" cpuSet="0-7"
Apr 21 07:31:21 ip-10.100.11.111.us-west-2.compute.internal kubelet[12225]: I0421
07:31:21.696852 12225 state_mem.go:80] "Updated desired CPUSet" podUID="adbd9bef-
c4e0-4bd1-a6a6-52530ad4bea3" containerName="aws-node" cpuSet="0-7"
Apr 21 07:46:12 ip-10.100.11.111.us-west-2.compute.internal kubelet[12225]: E0421
07:46:12.424801 12225 certificate_manager.go:488] kubernetes.io/kubelet-serving:
certificate request was not signed: timed out waiting for the condition
Apr 21 08:01:16 ip-10.100.11.111.us-west-2.compute.internal kubelet[12225]: E0421
08:01:16.810385 12225 certificate_manager.go:488] kubernetes.io/kubelet-serving:
certificate request was not signed: timed out waiting for the condition
this was the output..
But before using this method i have also tried another method, where i have created a node group and then i have created an ami from one of the nodes in that nodegroup.. then modified the kubelet.service file and removed the old cpu_manager_state file.. then the i have used this image to create the nodegroup. Then it worked fine But the problem was i am unable to get into the pods running in those nodes and also i am unable to get the logs of the pods running there. and strangely if i use $kubectl get nodes -o wide in the output i was not getting the internal and external both ip addresses. so i moved on to using the userdata instead of this method.
kindly give me instructions to create a managed nodegroup with cpu_manager_state as static policy for eks cluster .
Upvotes: 2
Views: 1847
Reputation: 321
CPU manager policy is only supported in EKS since K8s version 1.23
. As you mentioned you're using EKS 1.22
I suppose, you can't set the CPU manager policy to static
unless you upgrade to at least 1.23
as this kubelet config option is (most probably) not supported on your EKS nodes.
As documented in the K8s Feature Gates table, CPUManagerPolicyOptions
Feature Gate entered Beta
in 1.23
only and turned stable/GA since K8s 1.26
.
Upvotes: 1
Reputation: 779
I had the same question. I added the following userdata
script to my launch template
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="
--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash
yum install -y jq
set -o xtrace
cp /etc/kubernetes/kubelet/kubelet-config.json /etc/kubernetes/kubelet/kubelet-config.json.back
jq '. += { "cpuManagerPolicy":"static"}' /etc/kubernetes/kubelet/kubelet-config.json.back > /etc/kubernetes/kubelet/kubelet-config.json
--==MYBOUNDARY==--
You can verify the change took effect using kubectl
:
# start a k8s API proxy
$ kubectl proxy
# get the node name
$ kubectl get nodes
# get kubelet config
$ curl -sSL "http://localhost:8001/api/v1/nodes/<<node_name>>/proxy/configz"
I got the solution from this guide: https://aws.amazon.com/premiumsupport/knowledge-center/eks-worker-nodes-image-cache/. However, I could not make the sed
command properly work so I used jq
instead.
If you can ssh
into the node, you can check the userdata logs in /var/log/cloud-init-output.log
- See https://stackoverflow.com/a/32460849/4400704
I have a pod with a status QoS Guarantee
(CPU limit and requested = 2) and I can verify it has two CPU reserved
$ cat /sys/fs/cgroup/cpuset/cpuset.cpus
2,10
Upvotes: 4