We have a small GKE cluster with 3 nodes (two n1-s1 nodes and one n1-s2), let's call them A, B and C, running version "v1.14.10-gke.27". Yesterday, after a performance problem with a MySQL pod, we started digging into the cause of the problem and discovered a high load average on the virtual machine nodes (A) and (B); (C) was created afterwards in order to move the DB pod onto it.
In our checks (kubectl top nodes and kubectl -n MYNAMESPACE top pods) we saw that resource usage on the nodes was moderate: about 60% CPU and 70% memory.
OK, so we ran this test: we drained node A and restarted the virtual machine, by doing:
kubectl drain gke-n1-s1-A --ignore-daemonsets
gcloud compute ssh A
sudo reboot
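(To double-check that only DaemonSet pods were left on the drained node, a query along these lines should work, using the full node name as shown by kubectl get nodes:)
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=gke-n1-s1-A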
After rebooting the virtual machine node (A) and waiting about 15 minutes, we connected again and saw this:
gcloud compute ssh A
top
top showed a load average of about 1.0 (0.9 - 1.2) ... but this machine (1 core and 3.5 GB RAM) had no pods running on it. I watched the machine for about 30 minutes, and the base Linux system for GKE stayed at a load average near 1.0 the whole time.
Why?
Then I did another check. On node (B) there was only an SFTP server (CPU usage about 3 millicores). I did the same test:
gcloud compute ssh B
top
And this is what it showed:
top - 19:02:48 up 45 days, 4:40, 1 user, load average: 1.00, 1.04, 1.09
Tasks: 130 total, 1 running, 129 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.4 us, 1.3 sy, 0.0 ni, 95.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3697.9 total, 1383.6 free, 626.3 used, 1688.1 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 2840.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1065 root 20 0 924936 117608 66164 S 1.7 3.1 1356:05 kubelet
1932 root 20 0 768776 82748 11676 S 1.0 2.2 382:32.65 ruby
1008 root 20 0 806080 90408 26644 S 0.7 2.4 818:40.25 dockerd
183 root 20 0 0 0 0 S 0.3 0.0 0:26.09 jbd2/sda1-8
1 root 20 0 164932 7212 4904 S 0.0 0.2 17:47.38 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.09 kthreadd
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq
But:
kubectl -n MYNAMESPACE top pods | grep sftp
sftp-7d7f58cd96-fw6tm 1m 11Mi
CPU usage is only 1m, and RAM 11Mi.
Why is the load average so high?
I'm worried about this, because this load average could hurt the performance of the pods on the cluster nodes.
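Since the CPU looks mostly idle, one thing worth checking is whether some task is stuck in uninterruptible sleep (state D): the Linux load average counts those tasks as well as runnable ones, so a single stuck task would keep the load near 1.0 on its own. A quick check would be something like:
# list tasks currently running (R) or in uninterruptible sleep (D)
ps -eo state,pid,comm | awk '$1 ~ /^[RD]/'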
On the other hand, I set up a self-managed test Kubernetes cluster at the office with Debian VM nodes; one node (2 cores, 4 GB RAM), while running pods for Zammad and Jira, shows this load average: OFFICE KUBERNETES CLOUD
ssh user@node02
top
top - 21:11:29 up 17 days, 6:04, 1 user, load average: 0,21, 0,37, 0,21
Tasks: 161 total, 2 running, 159 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2,4 us, 1,0 sy, 0,0 ni, 96,3 id, 0,3 wa, 0,0 hi, 0,0 si, 0,0 st
MiB Mem : 3946,8 total, 213,4 free, 3249,4 used, 483,9 buff/cache
MiB Swap: 0,0 total, 0,0 free, 0,0 used. 418,9 avail Mem
On the office node, the load average while running pods is about 0.21-0.4 ... This is more realistic and closer to what one would expect.
Another problem is that when I connect by SSH to a GKE node (A, B or C), there are no tools for monitoring the disk / storage such as iostat and similar, so I can't see why the base GKE nodes show such a high load average with no pods scheduled.
Today, at peak hour, this is the GKE cluster status:
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
gke-n1-s1-A 241m 25% 1149Mi 43%
gke-n1-s1-B 81m 8% 1261Mi 47%
gke-n1-s2-C 411m 21% 1609Mi 28%
but top on node B shows:
top - 11:20:46 up 45 days, 20:58, 1 user, load average: 1.66, 1.25, 1.13
Tasks: 128 total, 1 running, 127 sleeping, 0 stopped, 0 zombie
%Cpu(s): 6.0 us, 2.3 sy, 0.0 ni, 91.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3697.9 total, 1367.8 free, 629.6 used, 1700.6 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 2837.7 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1065 root 20 0 924936 117608 66164 S 3.3 3.1 1376:27 kubelet
1008 root 20 0 806080 90228 26644 S 1.3 2.4 829:21.65 dockerd
2590758 root 20 0 136340 29056 20908 S 0.7 0.8 18:38.56 kube-dns
443 root 20 0 36200 19736 5808 S 0.3 0.5 3:51.49 google_accounts
1932 root 20 0 764164 82748 11676 S 0.3 2.2 387:52.03 ruby
1 root 20 0 164932 7212 4904 S 0.0 0.2 18:03.44 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.09 kthreadd
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq
7 root 20 0 0 0 0 S 0.0 0.0 14:55.03 ksoftirqd/0
EDIT 1: FINAL TEST:
1.- Create a pool with 1 node
gcloud container node-pools create testpool --cluster MYCLUSTER --num-nodes=1 --machine-type=n1-standard-1
NAME MACHINE_TYPE DISK_SIZE_GB NODE_VERSION
testpool n1-standard-1 100 1.14.10-gke.36
2.- Drain the node and check node status
kubectl drain --ignore-daemonsets gke-MYCLUSTER-testpool-a84f3036-16lr
kubectl get nodes
gke-MYCLUSTER-testpool-a84f3036-16lr Ready,SchedulingDisabled <none> 2m3s v1.14.10-gke.36
3.- Restart machine, wait and top
gcloud compute ssh gke-MYCLUSTER-testpool-a84f3036-16lr
sudo reboot
gcloud compute ssh gke-MYCLUSTER-testpool-a84f3036-16lr
top
top - 11:46:34 up 3 min, 1 user, load average: 1.24, 0.98, 0.44
Tasks: 104 total, 1 running, 103 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.1 us, 1.0 sy, 0.0 ni, 95.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3697.9 total, 2071.3 free, 492.8 used, 1133.9 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 2964.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1066 root 20 0 895804 99900 65136 S 2.1 2.6 0:04.28 kubelet
1786 root 20 0 417288 74176 11660 S 2.1 2.0 0:03.13 ruby
1009 root 20 0 812868 97168 26456 S 1.0 2.6 0:09.17 dockerd
1 root 20 0 99184 6960 4920 S 0.0 0.2 0:02.25 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H
5 root 20 0 0 0 0 I 0.0 0.0 0:00.43 kworker/u2:0
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq
7 root 20 0 0 0 0 S 0.0 0.0 0:00.08 ksoftirqd/0
8 root 20 0 0 0 0 I 0.0 0.0 0:00.20 rcu_sched
9 root 20 0 0 0 0 I 0.0 0.0 0:00.00 rcu_bh
10 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
11 root rt 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
12 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/0
13 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kdevtmpfs
14 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 netns
15 root 20 0 0 0 0 S 0.0 0.0 0:00.00 khungtaskd
16 root 20 0 0 0 0 S 0.0 0.0 0:00.00 oom_reaper
17 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 writeback
A load average of 1.24 without any custom pods?
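(Side note: once the test is over, the temporary pool can be removed with something like the command below; an explicit --zone flag may be needed depending on the gcloud configuration:)
gcloud container node-pools delete testpool --cluster MYCLUSTER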
EDIT 2: Thanks @willrof. I tried using "toolbox" and ran the "atop" and "iotop" commands. I see nothing abnormal, but the load average is still about 1 - 1.2. As you can see, the CPU is doing "nothing" and the I/O operations are near zero. Here are the results (how the tools were launched is sketched after the output):
iotop
Total DISK READ : 0.00 B/s | Total DISK WRITE : 0.00 B/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % systemd noresume noswap cros_efi
2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd]
2591747 be/4 nobody 0.00 B/s 0.00 B/s 0.00 % 0.00 % monitor --source=kube-proxy:http://local~ng.googleapis.com/ --export-interval=120s
4 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/0:0H]
3399685 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % sudo systemd-nspawn --directory=/var/lib~/resolv.conf:/etc/resolv.conf --user=root
6 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [mm_percpu_wq]
7 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/0]
8 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_sched]
9 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [rcu_bh]
10 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/0]
11 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdog/0]
12 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [cpuhp/0]
13 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kdevtmpfs]
14 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [netns]
15 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [khungtaskd]
16 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [oom_reaper]
17 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [writeback]
18 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kcompactd0]
19 be/7 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [khugepaged]
20 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [crypto]
21 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kintegrityd]
22 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kblockd]
23 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ata_sff]
24 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [watchdogd]
2590745 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % containerd-shim -namespace moby -workdir~runtime-root /var/run/docker/runtime-runc
atop
PRC | sys 14h12m | user 41h11m | #proc 140 | #trun 1 | #tslpi 544 | #tslpu 1 | #zombie 0 | clones 118e5 | #exit 0 |
CPU | sys 2% | user 5% | irq 0% | idle 93% | wait 0% | steal 0% | guest 0% | curf 2.30GHz | curscal ?% |
CPL | avg1 1.17 | avg5 1.17 | avg15 1.17 | | csw 669768e4 | | intr 26835e5 | | numcpu 1 |
MEM | tot 3.6G | free 221.1M | cache 2.1G | buff 285.2M | slab 313.3M | shmem 2.2M | vmbal 0.0M | hptot 0.0M | hpuse 0.0M |
SWP | tot 0.0M | free 0.0M | | | | | | vmcom 6.4G | vmlim 1.8G |
PAG | scan 54250 | steal 37777 | stall 0 | | | | | swin 0 | swout 0 |
LVM | dm-0 | busy 0% | read 6747 | write 0 | KiB/r 36 | KiB/w 0 | MBr/s 0.0 | MBw/s 0.0 | avio 2.00 ms |
DSK | sda | busy 0% | read 19322 | write 5095e3 | KiB/r 37 | KiB/w 8 | MBr/s 0.0 | MBw/s 0.0 | avio 0.75 ms |
DSK | sdc | busy 0% | read 225 | write 325 | KiB/r 24 | KiB/w 13315 | MBr/s 0.0 | MBw/s 0.0 | avio 1.75 ms |
DSK | sdb | busy 0% | read 206 | write 514 | KiB/r 26 | KiB/w 10 | MBr/s 0.0 | MBw/s 0.0 | avio 0.93 ms |
NET | transport | tcpi 69466e3 | tcpo 68262e3 | udpi 135509 | udpo 135593 | tcpao 4116e3 | tcppo 2797e3 | tcprs 738077 | udpie 0 |
NET | network | ipi 222967e3 | ipo 216603e3 | ipfrw 1533e5 | deliv 6968e4 | | | icmpi 74445 | icmpo 6254 |
NET | vethf6a 0% | pcki 40168e3 | pcko 39391e3 | sp 10 Gbps | si 15 Kbps | so 43 Kbps | erri 0 | erro 0 | drpo 0 |
NET | veth046 0% | pcki 8800433 | pcko 9133058 | sp 10 Gbps | si 2 Kbps | so 4 Kbps | erri 0 | erro 0 | drpo 0 |
NET | vethe89 0% | pcki 10923 | pcko 23560 | sp 10 Gbps | si 0 Kbps | so 0 Kbps | erri 0 | erro 0 | drpo 0 |
NET | veth647 0% | pcki 2583709 | pcko 2845889 | sp 10 Gbps | si 0 Kbps | so 0 Kbps | erri 0 | erro 0 | drpo 0 |
NET | veth6be 0% | pcki 374054 | pcko 448480 | sp 10 Gbps | si 0 Kbps | so 0 Kbps | erri 0 | erro 0 | drpo 0 |
NET | eth0 ---- | pcki 12094e4 | pcko 11533e4 | sp 0 Mbps | si 103 Kbps | so 56 Kbps | erri 0 | erro 0 | drpo 0 |
NET | cbr0 ---- | pcki 98061e3 | pcko 92356e3 | sp 0 Mbps | si 36 Kbps | so 71 Kbps | erri 0 | erro 0 | drpo 0 |
NET | lo ---- | pcki 9076898 | pcko 9076898 | sp 0 Mbps | si 5 Kbps | so 5 Kbps | erri 0 | erro 0 | drpo 0 |
*** system and process activity since boot ***
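For reference, this is roughly how the tools were run on the COS node: toolbox starts a Debian-based debug container on the node, so the packages can be installed with apt (the package names here are assumptions):
gcloud compute ssh B
toolbox
# inside the toolbox container:
apt-get update && apt-get install -y atop iotop
iotop
atop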
Can anyone help me?
What can I do?
Is this behaviour normal on GKE nodes without pods?
Should I switch to another Kubernetes provider?
Thanks in advance.
After exchanging messages with Google support, this seems to be a problem with the stable release version of the GKE node VMs.
The last official stable version is v1.14.10-gke.36.
We have observed the bad load behaviour since v1.14.10-gke.27 (we did not test earlier versions).
We are waiting for a response from Google product engineers about this. We tried the latest version available today, "1.16.9-gke.2", and the load average at idle is normal, about 0.15 and lower, but this is not a "stable" release.
If you create a cluster with the gcloud command, it gives you the latest "stable" version, which today is "v1.14.10-gke.36", so everybody using "v1.14.10-gke.X" should be hitting this problem.
The solution is ...
a) Wait for the official response from Google product engineers.
b) Move / upgrade the cluster and nodes to another version (perhaps not a stable one); a rough sketch of the gcloud commands is shown below.
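For option b), something along these lines should work (the zone, the target version and the node pool name "default-pool" are placeholders, not a recommendation):
# list the versions GKE currently offers for masters and nodes
gcloud container get-server-config --zone <zone>
# upgrade the control plane first, then each node pool, to the chosen version
gcloud container clusters upgrade MYCLUSTER --master --cluster-version <target-version> --zone <zone>
gcloud container clusters upgrade MYCLUSTER --node-pool default-pool --cluster-version <target-version> --zone <zone>
# (a new cluster can also be pinned to a non-default version at creation time with --cluster-version)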
EDIT. 2020/06/24. Google's Response
1- I have passed your feedback to our GKE product engineering team and they are able to reproduce the issue in GKE version 1.14.10-gke-36 with the cos and cos_containerd images, while the ubuntu and ubuntu_containerd images show a lower load average. So our GKE product engineer suggested, as a quick workaround, upgrading the cluster and node pool to 1.15. Our GKE product team is working on a permanent fix, but I do not have any ETA to share as of now.
2- How to upgrade the cluster: as a best practice I found a document[1]; this way you can upgrade your cluster with zero downtime. Please note that while the master (cluster) is upgraded the workload is not impacted, but we will not be able to reach the API server, so we cannot deploy new workloads, make any changes, or monitor the status during the upgrade. However, we can make the cluster a regional cluster, which has multiple master nodes. The document also suggests two ways to upgrade a node pool: rolling update and migration with node pools. Regarding the PV and PVC: I tested in my project and found that during the node pool upgrade the PVC is not deleted, and therefore the PV is not deleted either (even though the reclaim policy is defined as Delete). Still, I would suggest taking a backup of the disk (associated with the PV) and recreating the PV following the document[2].
3- Lastly, why is 1.14.10-gke.36 the default version? The default version is set and updated gradually; per document[3], it was last set to 1.14.10-gke-36 on May 13, and it can change in any future update. But we can specify the GKE cluster version manually.
Please let me know if you have any query or feel that I have missed something here. For the 1.14.10-gke-36 issue you can expect an update from me on Friday (June 26, 2020) at 16:00 EDT.
[2]- https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/preexisting-pd
[3]- https://cloud.google.com/kubernetes-engine/docs/release-notes#new_default_version
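Regarding point 2 above (backing up the disk behind a PV before touching the node pools), something along these lines should work if the PV was created by the in-tree gce-pd provisioner; the PV, disk, zone and snapshot names are placeholders:
# find the GCE disk backing the PersistentVolume
kubectl get pv <pv-name> -o jsonpath='{.spec.gcePersistentDisk.pdName}'
# snapshot that disk before the upgrade
gcloud compute disks snapshot <disk-name> --zone <zone> --snapshot-names <backup-name>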