workaround

Reputation: 528

Consul syncCatalog on k8s keeps falling into CrashLoopBackOff

I am deploying a Consul cluster on k8s version 1.9:

Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.6", GitCommit:"9f8ebd171479bec0ada837d7ee641dec2f8c6dd1", GitTreeState:"clean", BuildDate:"2018-03-21T15:21:50Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3+coreos.0", GitCommit:"f588569ed1bd4a6c986205dd0d7b04da4ab1a3b6", GitTreeState:"clean", BuildDate:"2018-02-10T01:42:55Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}

I am using hashicorp/consul-k8s:0.11.0 for syncCatalog.

Here is my SyncCatalog Deployment description:

Namespace:              consul-presentation
CreationTimestamp:      Sun, 29 Mar 2020 20:22:49 +0300
Labels:                 app=consul
                        chart=consul-helm
                        heritage=Tiller
                        release=consul-presentation
Annotations:            deployment.kubernetes.io/revision=1
Selector:               app=consul,chart=consul-helm,component=sync-catalog,release=consul-presentation
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=consul
                    chart=consul-helm
                    component=sync-catalog
                    release=consul-presentation
  Annotations:      consul.hashicorp.com/connect-inject=false
  Service Account:  consul-presentation-consul-sync-catalog
  Containers:
   consul-sync-catalog:
    Image:  hashicorp/consul-k8s:0.11.0
    Port:   <none>
    Command:
      /bin/sh
      -ec
      consul-k8s sync-catalog \
  -k8s-default-sync=true \
  -consul-domain=consul \
  -k8s-write-namespace=${NAMESPACE} \
  -node-port-sync-type=ExternalFirst \
  -log-level=debug \
  -add-k8s-namespace-suffix \

    Liveness:   http-get http://:8080/health/ready delay=30s timeout=5s period=5s #success=1 #failure=3
    Readiness:  http-get http://:8080/health/ready delay=10s timeout=5s period=5s #success=1 #failure=5
    Environment:
      HOST_IP:            (v1:status.hostIP)
      NAMESPACE:          (v1:metadata.namespace)
      CONSUL_HTTP_ADDR:  http://consul-presentation.test:8500
    Mounts:              <none>
  Volumes:               <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    True    ReplicaSetUpdated
OldReplicaSets:  <none>
NewReplicaSet:   consul-presentation-consul-sync-catalog-66b5756486 (1/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  1m    deployment-controller  Scaled up replica set consul-presentation-consul-sync-catalog-66b5756486 to 1

And here is the description of the unhealthy pod:

kubectl describe pod consul-presentation-consul-sync-catalog-66b5756486-2h2s6 -n consul-presentation                                                            
Name:           consul-presentation-consul-sync-catalog-66b5756486-2h2s6
Namespace:      consul-presentation
Node:           k8s-k4.test/10.99.1.10
Start Time:     Sun, 29 Mar 2020 20:22:49 +0300
Labels:         app=consul
                chart=consul-helm
                component=sync-catalog
                pod-template-hash=2261312042
                release=consul-presentation
Annotations:    consul.hashicorp.com/connect-inject=false
Status:         Running
IP:             10.195.5.53
Controlled By:  ReplicaSet/consul-presentation-consul-sync-catalog-66b5756486
Containers:
  consul-sync-catalog:
    Container ID:  docker://4f0c65a7be5f9b07cae51d798c532a066fb0784b28a7610dfe4f1a15a2fa5a7c
    Image:         hashicorp/consul-k8s:0.11.0
    Image ID:      docker-pullable://hashicorp/consul-k8s@sha256:8be1598ad3e71323509727162f20ed9c140c8cf09d5fa3dc351aad03ec2b0b70
    Port:          <none>
    Command:
      /bin/sh
      -ec
      consul-k8s sync-catalog \
  -k8s-default-sync=true \
  -consul-domain=consul \
  -k8s-write-namespace=${NAMESPACE} \
  -node-port-sync-type=ExternalFirst \
  -log-level=debug \
  -add-k8s-namespace-suffix \

    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Sun, 29 Mar 2020 20:28:19 +0300
      Finished:     Sun, 29 Mar 2020 20:28:56 +0300
    Ready:          False
    Restart Count:  6
    Liveness:       http-get http://:8080/health/ready delay=30s timeout=5s period=5s #success=1 #failure=3
    Readiness:      http-get http://:8080/health/ready delay=10s timeout=5s period=5s #success=1 #failure=5
    Environment:
      HOST_IP:            (v1:status.hostIP)
      NAMESPACE:         consul-presentation (v1:metadata.namespace)
      CONSUL_HTTP_ADDR:  http://consul-presentation.test:8500
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from consul-presentation-consul-sync-catalog-token-jxw26 (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  consul-presentation-consul-sync-catalog-token-jxw26:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  consul-presentation-consul-sync-catalog-token-jxw26
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age               From                      Message
  ----     ------                 ----              ----                      -------
  Normal   Scheduled              7m                default-scheduler         Successfully assigned consul-presentation-consul-sync-catalog-66b5756486-2h2s6 to k8s-k4.test
  Normal   SuccessfulMountVolume  7m                kubelet, k8s-k4.test  MountVolume.SetUp succeeded for volume "consul-presentation-consul-sync-catalog-token-jxw26"
  Normal   Pulled                 6m (x2 over 7m)   kubelet, k8s-k4.test  Container image "hashicorp/consul-k8s:0.11.0" already present on machine
  Normal   Created                6m (x2 over 7m)   kubelet, k8s-k4.test  Created container
  Normal   Started                6m (x2 over 7m)   kubelet, k8s-k4.test  Started container
  Normal   Killing                6m                kubelet, k8s-k4.test  Killing container with id docker://consul-sync-catalog:Container failed liveness probe.. Container will be killed and recreated.
  Warning  Unhealthy              6m (x4 over 6m)   kubelet, k8s-k4.test  Liveness probe failed: HTTP probe failed with statuscode: 500
  Warning  Unhealthy              6m (x13 over 7m)  kubelet, k8s-k4.test  Readiness probe failed: HTTP probe failed with statuscode: 500
  Warning  BackOff                2m (x6 over 3m)   kubelet, k8s-k4.test  Back-off restarting failed container

I have followed the default setup described in this Helm chart: https://github.com/hashicorp/consul-helm

The only difference is that I use ClusterIP services and ingresses, which should not have anything to do with the health of a pod.
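
For reference, this is roughly the values override I installed the chart with (a minimal sketch, not the actual file; the exact keys may differ between consul-helm versions):

# Hypothetical values.yaml excerpt, only to illustrate the setup described above:
# catalog sync enabled with default sync on, and a ClusterIP UI service fronted
# by an ingress.
syncCatalog:
  enabled: true
  default: true                # matches -k8s-default-sync=true
  addK8SNamespaceSuffix: true  # matches -add-k8s-namespace-suffix
ui:
  service:
    type: ClusterIP            # exposed through an ingress rather than a NodePort/LoadBalancer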

Any ideas?

Upvotes: 0

Views: 827

Answers (2)

Iryna Shustava

Reputation: 46

The failing liveness probe is telling you that the sync-catalog process cannot talk to Consul. Here is how the liveness/readiness probe is implemented in consul-k8s.

It looks like the Consul address you're providing to the sync-catalog process is http://consul-presentation.test:8500. Is this an external Consul server? Is it running and reachable from the pods on Kubernetes?

Also, are you deploying Consul clients on k8s? In the official Helm chart, sync-catalog talks to the Consul clients, which are deployed as a DaemonSet, via the host IP.
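
One quick way to check reachability from inside the cluster (the pod name and curl image below are just placeholders, not something the chart creates):

kubectl run consul-curl --rm -it --restart=Never \
  --image=curlimages/curl -n consul-presentation -- \
  curl -sv http://consul-presentation.test:8500/v1/status/leader

If that does not return the leader address, the sync-catalog pod will not be able to reach Consul either.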

Upvotes: 3

workaround

Reputation: 528

When using Kubernetes ingresses with ClusterIP services, the Consul address should be set to the ingress host, since that is what is actually exposed, without the port. The corresponding part of the Kubernetes deployment should then look like this:

Liveness:   http-get http://:8080/health/ready delay=30s timeout=5s period=5s #success=1 #failure=3
Readiness:  http-get http://:8080/health/ready delay=10s timeout=5s period=5s #success=1 #failure=5
Environment:
  HOST_IP:            (v1:status.hostIP)
  NAMESPACE:          (v1:metadata.namespace)
  CONSUL_HTTP_ADDR:  http://{INGRESS HOST}
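
For completeness, an ingress along these lines is what makes http://{INGRESS HOST} reach the Consul HTTP API on port 8500 (a rough sketch; the resource names, host and backend service are placeholders that depend on your cluster and chart release):

apiVersion: extensions/v1beta1   # Ingress API group available on k8s 1.9
kind: Ingress
metadata:
  name: consul-http
  namespace: consul-presentation
spec:
  rules:
    - host: consul-presentation.test        # i.e. {INGRESS HOST}
      http:
        paths:
          - path: /
            backend:
              serviceName: consul-presentation-consul-server  # the chart's server service (hypothetical name)
              servicePort: 8500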

Upvotes: 0
