user630702

GKE - Metrics-Server - HTTP probe failed with statuscode: 500

It works for some time and then goes into CrashLoopBackOff. While it is working, I occasionally get the Unauthorized error below. After 5 to 10 minutes, it crashes.

Error from server (InternalError): an error on the server ("Internal Server Error: \"/apis/metrics.k8s.io/v1beta1/nodes\": Unauthorized") has prevented the request from succeeding (get nodes.metrics.k8s.io)

I'm using the latest version of metrics-server.

Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  27m                   default-scheduler  Successfully assigned kube-system/metrics-server-59ff97d56-xjbh4 to gke-test-test-node-pool-05539c92-26z1
  Normal   Created    20m (x3 over 27m)     kubelet            Created container metrics-server
  Normal   Started    20m (x3 over 27m)     kubelet            Started container metrics-server
  Warning  Unhealthy  20m (x7 over 21m)     kubelet            Liveness probe failed: HTTP probe failed with statuscode: 500
  Warning  Unhealthy  20m (x8 over 21m)     kubelet            Readiness probe failed: HTTP probe failed with statuscode: 500
  Normal   Killing    12m (x8 over 20m)     kubelet            Container metrics-server failed liveness probe, will be restarted
  Normal   Pulled     7m19s (x9 over 27m)   kubelet            Container image "k8s.gcr.io/metrics-server/metrics-server:v0.4.1" already present on machine
  Warning  BackOff    2m15s (x71 over 18m)  kubelet            Back-off restarting failed container

I tried changing the settings suggested in other answers (diff against the original manifest below), but none of them worked. Any other suggestions?

135a136,137
>         - --kubelet-insecure-tls
>         - --kubelet-preferred-address-types=InternalIP
151a154
>           initialDelaySeconds: 300
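In Deployment terms, the diff above roughly corresponds to the excerpt below. This is a sketch: only the added lines and their parent keys are shown, the existing args are elided, and placing initialDelaySeconds under livenessProbe is an assumption, since the diff does not show which probe line 151 belongs to.

```yaml
# Sketch only: parent keys plus the lines added in the diff.
spec:
  template:
    spec:
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server/metrics-server:v0.4.1
        args:
        # ...existing args from the original manifest stay here...
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        livenessProbe:
          # Assumption: the diff does not show which probe this was added to.
          initialDelaySeconds: 300
```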

Answers (1)

PjoterS

As the OP didn't provide further information, I ran multiple scenarios and was able to reproduce this behavior.

Background

The OP wants to use the newest metrics-server version, 0.4.1 (k8s.gcr.io/metrics-server/metrics-server:v0.4.1).

Please keep in mind that Google Kubernetes Engine is a special Google Cloud Platform product that is integrated with other GCP features. Apart from being an open source Kubernetes implementation, it also has specific configurations and dependencies (not available in open source Kubernetes) that make using those features easier (e.g. Stackdriver).

Unlike Kubernetes clusters you build yourself on top of Google Compute Engine, GKE is fully managed by Google.

Root cause

At the time of writing, all GKE versions (stable channel, rapid channel, static version) use metrics-server v0.3.6, as it is integrated with other GCP features.

If you deploy the newest metrics-server v0.4.1, you can see that it changes the default GKE configuration of the ServiceAccount, Roles, etc.:

clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server configured
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader configured
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator configured
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server configured
service/metrics-server configured
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io configured

As those resources were reconfigured, you might get Unauthorized errors.
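For context, the APIService is a single cluster-wide object: kube-apiserver forwards every request under /apis/metrics.k8s.io/v1beta1 (the path in the question's error) to the Service it references, so applying the upstream manifest overwrites the registration GKE created (hence "configured" in the output above). A rough sketch of such an object follows. The name and the Service reference come from the apply output; the remaining field values are assumptions based on the upstream manifest and may differ from the file you applied.

```yaml
# Sketch of the aggregated API registration; values other than the name
# and the Service reference are assumptions.
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  version: v1beta1
  # kube-apiserver proxies /apis/metrics.k8s.io/v1beta1/* to this Service.
  service:
    name: metrics-server
    namespace: kube-system
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
```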

Another issue is that the new version sets Readiness and Liveness probes, while the GKE-managed v0.3.6 pod has none (the last two commands below return nothing):

$ kubectl describe po metrics-server-59ff97d56-mp2v2 -n kube-system | grep Liveness:
    Liveness:       http-get https://:https/livez delay=0s timeout=1s period=10s #success=1 #failure=3
$ kubectl describe po metrics-server-59ff97d56-mp2v2 -n kube-system | grep Readiness:
    Readiness:      http-get https://:https/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
$ kubectl describe po metrics-server-v0.3.6-64655c969-jd5gj -n kube-system | grep Liveness:
$ kubectl describe po metrics-server-v0.3.6-64655c969-jd5gj -n kube-system | grep Readiness:
$
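Translated back into pod spec fields, the v0.4.1 probes shown by `kubectl describe` correspond to the fragment below. This is reconstructed from the describe output only (delay, timeout, period, #success and #failure map to the fields of the same meaning), not copied from the upstream manifest.

```yaml
# Probe settings reconstructed from the `kubectl describe` output above.
livenessProbe:
  httpGet:
    scheme: HTTPS
    port: https          # named container port
    path: /livez
  initialDelaySeconds: 0
  timeoutSeconds: 1
  periodSeconds: 10
  successThreshold: 1
  failureThreshold: 3    # container is restarted after 3 consecutive failures
readinessProbe:
  httpGet:
    scheme: HTTPS
    port: https
    path: /readyz
  initialDelaySeconds: 0
  timeoutSeconds: 1
  periodSeconds: 10
  successThreshold: 1
  failureThreshold: 3    # pod is marked NotReady after 3 consecutive failures
```

With these values the kubelet checks /livez every 10 seconds, so three consecutive HTTP 500 responses, as in the Unhealthy events in the question, are enough to produce the "failed liveness probe, will be restarted" event.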

Conclusion

If you remove the Readiness and Liveness probes from the metrics-server v0.4.1 Deployment YAML, it will deploy and the pod will be in Running state. However, THIS IS STRONGLY NOT RECOMMENDED: it might disrupt your cluster in the future or cause unexpected behavior.

If you want to use the newest metrics-server version, you should build your own cluster with kubeadm on Google Compute Engine.

Additionally, you can raise a Feature Request on the Public Issue Tracker to get the newest metrics-server v0.4.1 on GKE. You can do it here.
