Juan Miguel

Reputation: 31

Why is the load average on a GKE node so high at startup, with no pods?

We have a small GKE cluster with 3 nodes (two n1-s1 and one n1-s2), let's call them A, B and C, running version "v1.14.10-gke.27". Yesterday, after a performance problem with a MySQL pod, we started digging into the cause of the problem and discovered a high load average on the virtual machine nodes A and B ... C was created afterwards in order to move the DB pod onto it.

Well, in our checks (kubectl top nodes and kubectl -n MYNAMESPACE top pods), we saw that CPU/memory usage on the nodes was moderate: about 60% CPU and 70% memory.

OK, so we did this test: we drained node A and restarted its virtual machine, by doing:

kubectl drain A --ignore-daemonsets
gcloud compute ssh A
sudo reboot
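
For reference, once the test was finished the node had to be made schedulable again, since drain leaves it cordoned. Something like this should do it (assuming A stands for the full node name reported by kubectl get nodes):

kubectl uncordon A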

After rebooting the virtual machine of node A and waiting about 15 minutes, we connected again and saw this:

gcloud compute ssh A
top

showed a load average of about 1.0 (0.9 - 1.2) ... but this machine (1 core and 3.5 GB RAM) has no pods on it. I watched the machine for about 30 minutes, and the base Linux system for GKE always had a load average near 1.0.

Why ?
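
One rough check I can do directly on the node (not GKE-specific) is to look at which tasks actually count towards the load average, i.e. the ones in running (R) or uninterruptible sleep (D) state; this assumes ps and grep are available on the node image:

ps -eo state,pid,comm | grep -E '^(R|D)'   # tasks currently counted in the load
cat /proc/loadavg                          # 4th field = runnable/total scheduling entities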

Then I did another check. On node B there was only an SFTP server (CPU usage about 3 millicores). I did the same test:

gcloud compute ssh B
top

And this is what it showed:

top - 19:02:48 up 45 days,  4:40,  1 user,  load average: 1.00, 1.04, 1.09

Tasks: 130 total,   1 running, 129 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.4 us,  1.3 sy,  0.0 ni, 95.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   3697.9 total,   1383.6 free,    626.3 used,   1688.1 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   2840.3 avail Mem
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1065 root      20   0  924936 117608  66164 S   1.7   3.1   1356:05 kubelet
   1932 root      20   0  768776  82748  11676 S   1.0   2.2 382:32.65 ruby
   1008 root      20   0  806080  90408  26644 S   0.7   2.4 818:40.25 dockerd
    183 root      20   0       0      0      0 S   0.3   0.0   0:26.09 jbd2/sda1-8
      1 root      20   0  164932   7212   4904 S   0.0   0.2  17:47.38 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.09 kthreadd
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H
      6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq

But:

kubectl -n MYNAMESPACE top pods | grep sftp

sftp-7d7f58cd96-fw6tm   1m           11Mi

CPU usage is only 1m, and RAM 11 MiB.

Why is the load average so high?

I'm worried about this, since such a load average could hurt the performance of the pods on the cluster nodes.
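
To rule out hidden workloads: kubectl -n MYNAMESPACE top pods only shows my namespace, and every GKE node also runs kube-system daemonsets (kube-proxy, the logging agent, which is probably the ruby process in top, monitoring agents, etc.). This is how I can list everything actually scheduled on B and what it has reserved (node name as reported by kubectl get nodes):

kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=gke-n1-s1-B
kubectl describe node gke-n1-s1-B    # see the "Allocated resources" section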

On the other hand, I set up a self-managed test Kubernetes cluster at the office with Debian VM nodes, and one node there (2 cores, 4 GB RAM), even while running pods for Zammad and Jira, shows this load average: OFFICE KUBERNETES CLUSTER

ssh user@node02
top

top - 21:11:29 up 17 days,  6:04,  1 user,  load average: 0,21, 0,37, 0,21
Tasks: 161 total,   2 running, 159 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2,4 us,  1,0 sy,  0,0 ni, 96,3 id,  0,3 wa,  0,0 hi,  0,0 si,  0,0 st
MiB Mem :   3946,8 total,    213,4 free,   3249,4 used,    483,9 buff/cache
MiB Swap:      0,0 total,      0,0 free,      0,0 used.    418,9 avail Mem

On the office node the load average, while running pods, is about 0.21 - 0.4 ... This is more realistic and closer to what one would expect.

Another problem is that when I connect by SSH to a GKE node (A, B or C), there are no tools for monitoring the disk/storage, such as iostat and the like, so I can't tell why the base GKE nodes have such a high load average with no pods scheduled.
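
Even without iostat, the raw kernel counters are still exposed under /proc, so a crude check for disk activity can be done directly on the node, with no extra tools assumed:

cat /proc/diskstats              # per-device I/O counters; read twice a few seconds apart and compare
grep procs_blocked /proc/stat    # number of tasks currently blocked on I/O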

Today, at peak hour, this is the GKE cluster status:

kubectl top nodes
NAME         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
gke-n1-s1-A   241m         25%    1149Mi          43%
gke-n1-s1-B   81m          8%     1261Mi          47%
gke-n1-s2-C   411m         21%    1609Mi          28%

but top on node B shows:

top - 11:20:46 up 45 days, 20:58,  1 user,  load average: 1.66, 1.25, 1.13
Tasks: 128 total,   1 running, 127 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6.0 us,  2.3 sy,  0.0 ni, 91.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   3697.9 total,   1367.8 free,    629.6 used,   1700.6 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   2837.7 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1065 root      20   0  924936 117608  66164 S   3.3   3.1   1376:27 kubelet
   1008 root      20   0  806080  90228  26644 S   1.3   2.4 829:21.65 dockerd
2590758 root      20   0  136340  29056  20908 S   0.7   0.8  18:38.56 kube-dns
    443 root      20   0   36200  19736   5808 S   0.3   0.5   3:51.49 google_accounts
   1932 root      20   0  764164  82748  11676 S   0.3   2.2 387:52.03 ruby
      1 root      20   0  164932   7212   4904 S   0.0   0.2  18:03.44 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.09 kthreadd
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H
      6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq
      7 root      20   0       0      0      0 S   0.0   0.0  14:55.03 ksoftirqd/0

EDIT 1: FINALLY, ONE LAST TEST:

1.- Create a pool with 1 node

gcloud container node-pools create testpool --cluster MYCLUSTER --num-nodes=1 --machine-type=n1-standard-1
NAME      MACHINE_TYPE   DISK_SIZE_GB  NODE_VERSION
testpool  n1-standard-1  100           1.14.10-gke.36

2.- Drain the node and check node status

kubectl drain --ignore-daemonsets gke-MYCLUSTER-testpool-a84f3036-16lr

kubectl get nodes
gke-MYCLUSTER-testpool-a84f3036-16lr     Ready,SchedulingDisabled   <none>   2m3s   v1.14.10-gke.36

3.- Restart the machine, wait, and run top

gcloud compute ssh gke-MYCLUSTER-testpool-a84f3036-16lr
sudo reboot

gcloud compute ssh gke-MYCLUSTER-testpool-a84f3036-16lr
top

top - 11:46:34 up 3 min,  1 user,  load average: 1.24, 0.98, 0.44
Tasks: 104 total,   1 running, 103 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.1 us,  1.0 sy,  0.0 ni, 95.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   3697.9 total,   2071.3 free,    492.8 used,   1133.9 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   2964.2 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1066 root      20   0  895804  99900  65136 S   2.1   2.6   0:04.28 kubelet
   1786 root      20   0  417288  74176  11660 S   2.1   2.0   0:03.13 ruby
   1009 root      20   0  812868  97168  26456 S   1.0   2.6   0:09.17 dockerd
      1 root      20   0   99184   6960   4920 S   0.0   0.2   0:02.25 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
      3 root      20   0       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H
      5 root      20   0       0      0      0 I   0.0   0.0   0:00.43 kworker/u2:0
      6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq
      7 root      20   0       0      0      0 S   0.0   0.0   0:00.08 ksoftirqd/0
      8 root      20   0       0      0      0 I   0.0   0.0   0:00.20 rcu_sched
      9 root      20   0       0      0      0 I   0.0   0.0   0:00.00 rcu_bh
     10 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/0
     11 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 watchdog/0
     12 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/0
     13 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kdevtmpfs
     14 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 netns
     15 root      20   0       0      0      0 S   0.0   0.0   0:00.00 khungtaskd
     16 root      20   0       0      0      0 S   0.0   0.0   0:00.00 oom_reaper
     17 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 writeback

A load average of 1.24 without any custom pods?
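
Since the usual suspects on an otherwise idle node are kubelet and dockerd, another thing I can look at is their logs for constant housekeeping or errors; this assumes they run as systemd units on this node image (which the process list suggests):

sudo journalctl -u kubelet --since "15 minutes ago" --no-pager | tail -n 50
sudo journalctl -u docker --since "15 minutes ago" --no-pager | tail -n 50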

EDIT 2: Thanks @willrof. I tried using "toolbox" and ran the "atop" and "iotop" commands. I see nothing abnormal, but the load average is about 1 - 1.2. As you can see, the CPU is doing "nothing" and the I/O operations are near zero. Here are the results:

iotop
Total DISK READ :       0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
      1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % systemd noresume noswap cros_efi
      2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
2591747 be/4 nobody      0.00 B/s    0.00 B/s  0.00 %  0.00 % monitor --source=kube-proxy:http://local~ng.googleapis.com/ --export-interval=120s
      4 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/0:0H]
3399685 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % sudo systemd-nspawn --directory=/var/lib~/resolv.conf:/etc/resolv.conf --user=root
      6 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [mm_percpu_wq]
      7 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
      8 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_sched]
      9 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_bh]
     10 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/0]
     11 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/0]
     12 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [cpuhp/0]
     13 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kdevtmpfs]
     14 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [netns]
     15 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [khungtaskd]
     16 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [oom_reaper]
     17 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [writeback]
     18 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kcompactd0]
     19 be/7 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [khugepaged]
     20 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [crypto]
     21 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kintegrityd]
     22 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kblockd]
     23 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ata_sff]
     24 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdogd]
2590745 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % containerd-shim -namespace moby -workdir~runtime-root /var/run/docker/runtime-runc


atop

PRC | sys   14h12m |  user  41h11m | #proc    140 |  #trun      1 | #tslpi   544 | #tslpu     1  | #zombie    0 | clones 118e5  | #exit      0 |
CPU | sys       2% |  user      5% | irq       0% |  idle     93% | wait      0% | steal     0%  | guest     0% | curf 2.30GHz  | curscal   ?% |
CPL | avg1    1.17 |  avg5    1.17 | avg15   1.17 |               | csw 669768e4 |               | intr 26835e5 |               | numcpu     1 |
MEM | tot     3.6G |  free  221.1M | cache   2.1G |  buff  285.2M | slab  313.3M | shmem   2.2M  | vmbal   0.0M | hptot   0.0M  | hpuse   0.0M |
SWP | tot     0.0M |  free    0.0M |              |               |              |               |              | vmcom   6.4G  | vmlim   1.8G |
PAG | scan   54250 |  steal  37777 | stall      0 |               |              |               |              | swin       0  | swout      0 |
LVM |         dm-0 |  busy      0% | read    6747 |  write      0 | KiB/r     36 | KiB/w      0  | MBr/s    0.0 | MBw/s    0.0  | avio 2.00 ms |
DSK |          sda |  busy      0% | read   19322 |  write 5095e3 | KiB/r     37 | KiB/w      8  | MBr/s    0.0 | MBw/s    0.0  | avio 0.75 ms |
DSK |          sdc |  busy      0% | read     225 |  write    325 | KiB/r     24 | KiB/w  13315  | MBr/s    0.0 | MBw/s    0.0  | avio 1.75 ms |
DSK |          sdb |  busy      0% | read     206 |  write    514 | KiB/r     26 | KiB/w     10  | MBr/s    0.0 | MBw/s    0.0  | avio 0.93 ms |
NET | transport    |  tcpi 69466e3 | tcpo 68262e3 |  udpi  135509 | udpo  135593 | tcpao 4116e3  | tcppo 2797e3 | tcprs 738077  | udpie      0 |
NET | network      |  ipi 222967e3 | ipo 216603e3 |  ipfrw 1533e5 | deliv 6968e4 |               |              | icmpi  74445  | icmpo   6254 |
NET | vethf6a   0% |  pcki 40168e3 | pcko 39391e3 |  sp   10 Gbps | si   15 Kbps | so   43 Kbps  | erri       0 | erro       0  | drpo       0 |
NET | veth046   0% |  pcki 8800433 | pcko 9133058 |  sp   10 Gbps | si    2 Kbps | so    4 Kbps  | erri       0 | erro       0  | drpo       0 |
NET | vethe89   0% |  pcki   10923 | pcko   23560 |  sp   10 Gbps | si    0 Kbps | so    0 Kbps  | erri       0 | erro       0  | drpo       0 |
NET | veth647   0% |  pcki 2583709 | pcko 2845889 |  sp   10 Gbps | si    0 Kbps | so    0 Kbps  | erri       0 | erro       0  | drpo       0 |
NET | veth6be   0% |  pcki  374054 | pcko  448480 |  sp   10 Gbps | si    0 Kbps | so    0 Kbps  | erri       0 | erro       0  | drpo       0 |
NET | eth0    ---- |  pcki 12094e4 | pcko 11533e4 |  sp    0 Mbps | si  103 Kbps | so   56 Kbps  | erri       0 | erro       0  | drpo       0 |
NET | cbr0    ---- |  pcki 98061e3 | pcko 92356e3 |  sp    0 Mbps | si   36 Kbps | so   71 Kbps  | erri       0 | erro       0  | drpo       0 |
NET | lo      ---- |  pcki 9076898 | pcko 9076898 |  sp    0 Mbps | si    5 Kbps | so    5 Kbps  | erri       0 | erro       0  | drpo       0 |
                                                 *** system and process activity since boot ***

Could anyone help me?

What can I do?

Is this behaviour normal for GKE nodes without pods?

Should I change to another Kubernetes provider?

Thanks in advance.

Upvotes: 1

Views: 1743

Answers (1)

Juan Miguel

Reputation: 31

After exchanging messages with Google support, this seems to be a problem with the stable release version of the Google node VM image.

The last official stable version is v1.14.10-gke.36.

We have observed this bad load behaviour since v1.14.10-gke.27 (we didn't test anything earlier).

We are awaiting a response from the Google product engineers about this. We checked out the latest version available today, "1.16.9-gke.2", and its load average at idle is normal, about 0.15 and lower, but it is not a "stable" release.

If you create a cluster with the gcloud command, it gives you the latest "stable" version, which today is "v1.14.10-gke.36", so everybody using "v1.14.10-gke.X" should be hitting this problem.
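
For what it's worth, the default version and the other versions offered in a zone can be listed like this (it prints defaultClusterVersion plus the valid master/node versions; the zone here is just an example, use your own):

gcloud container get-server-config --zone europe-west1-b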

The solution is ...

a) Wait for the official response from the Google product engineers.

b) Move / upgrade the cluster / nodes to another version (perhaps not stable); see the command sketch just below.
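
For b), a sketch of how I would do it with gcloud; the zone, version and pool name are placeholders (pick a version actually listed by get-server-config above), and note that upgrading a node pool recreates its nodes:

ZONE=europe-west1-b       # example zone
VERSION=1.15.12-gke.2     # example 1.15 version, use one listed by get-server-config
gcloud container clusters upgrade MYCLUSTER --master --cluster-version "$VERSION" --zone "$ZONE"
gcloud container clusters upgrade MYCLUSTER --node-pool default-pool --cluster-version "$VERSION" --zone "$ZONE"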

EDIT 2020/06/24: Google's response

1- I have passed your feedback to our GKE product engineering team and they are able to reproduce the issue on GKE version 1.14.10-gke.36 with the cos and cos_containerd images, while the ubuntu and ubuntu_containerd images show a lower load average. So our GKE product engineer suggested, as a quick workaround, upgrading the cluster and node pool to 1.15. Our GKE product team is working on a permanent fix, but I do not have any ETA to share as of now.

2- How to upgrade the cluster: as a best practice I found a document[1] describing how to upgrade your cluster with zero downtime. Please note that while the master (cluster) is being upgraded the workload is not impacted, but we will not be able to reach the API server, so we cannot deploy new workloads, make changes, or monitor the status during the upgrade. However, we can make the cluster a regional cluster, which has multiple master nodes. This document also suggests two ways to upgrade node pools: rolling update and migration with node pools. Regarding PVs and PVCs, I have tested this in my project and found that during the node pool upgrade the PVC is not deleted, and neither is the PV (even though the reclaim policy is defined as Delete). Still, I would suggest taking a backup of the disk (associated with the PV) and recreating the PV following the document[2].

3- Lastly, why is 1.14.10-gke.36 the default version? The default version is set and updated gradually; as per document[3], it was last set to 1.14.10-gke.36 on May 13, and this can change in any future update. But we can define the GKE cluster version manually.

Please let me know if you have any query or feel I have missed something here. As for the 1.14.10-gke.36 issue, you can expect an update from me on Friday (June 26, 2020) at 16:00 EDT.

[1]- https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-upgrading-your-clusters-with-zero-downtime

[2]- https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/preexisting-pd

[3]- https://cloud.google.com/kubernetes-engine/docs/release-notes#new_default_version
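
As a concrete sketch of the "migration with node pools" approach from [1] (pool names, node names and the version below are placeholders of mine, not from Google's message): create a new pool on the newer version, cordon and drain the old nodes so the workloads reschedule onto the new pool, and finally delete the old pool.

gcloud container node-pools create newpool --cluster MYCLUSTER --machine-type n1-standard-1 --num-nodes 3 --node-version 1.15.12-gke.2
kubectl cordon gke-MYCLUSTER-oldpool-XXXXXXXX-XXXX
kubectl drain gke-MYCLUSTER-oldpool-XXXXXXXX-XXXX --ignore-daemonsets --delete-local-data
gcloud container node-pools delete oldpool --cluster MYCLUSTER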

Upvotes: 1
