Karan Alang

GCP: Kafka install on GKE - ZooKeeper not starting up

I've created a Kubernetes cluster (GKE) on GCP, and am trying to install Kafka on it (ref link - https://snourian.com/kafka-kubernetes-strimzi-part-1-creating-deploying-strimzi-kafka/).

ZooKeeper is not starting up when I deploy the Kafka cluster:

karan@cloudshell:~/strimzi-0.26.0 (versa-kafka-poc)$ kubectl get pv,pvc,pods -n kafka
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                               STORAGECLASS   REASON   AGE
persistentvolume/pvc-96957b25-f49b-4598-869c-a73b32325bc7   2Gi        RWO            Delete           Bound    kafka/data-my-cluster-zookeeper-0   standard                6m17s

NAME                                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/data-my-cluster-zookeeper-0   Bound    pvc-96957b25-f49b-4598-869c-a73b32325bc7   2Gi        RWO            standard       6m20s

NAME                                         READY   STATUS    RESTARTS   AGE
pod/my-cluster-zookeeper-0                   0/1     Pending   0          6m18s
pod/strimzi-cluster-operator-85bb4c6-cfl4p   1/1     Running   0          8m29s


karan@cloudshell:~/strimzi-0.26.0 (versa-kafka-poc)$ kc describe pod my-cluster-zookeeper-0 -n kafka
Name:           my-cluster-zookeeper-0
Namespace:      kafka
Priority:       0
Node:           <none>
Labels:         app.kubernetes.io/instance=my-cluster
                app.kubernetes.io/managed-by=strimzi-cluster-operator
                app.kubernetes.io/name=zookeeper
                app.kubernetes.io/part-of=strimzi-my-cluster
                controller-revision-hash=my-cluster-zookeeper-867c478fc4
                statefulset.kubernetes.io/pod-name=my-cluster-zookeeper-0
                strimzi.io/cluster=my-cluster
                strimzi.io/kind=Kafka
                strimzi.io/name=my-cluster-zookeeper
Annotations:    strimzi.io/cluster-ca-cert-generation: 0
                strimzi.io/generation: 0
                strimzi.io/logging-hash: 0f057cb0003c78f02978b83e4fabad5bd508680c
Status:         Pending
IP:
IPs:            <none>
Controlled By:  StatefulSet/my-cluster-zookeeper
Containers:
  zookeeper:
    Image:       quay.io/strimzi/kafka:0.26.0-kafka-3.0.0
    Ports:       2888/TCP, 3888/TCP, 2181/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Command:
      /opt/kafka/zookeeper_run.sh
    Limits:
      cpu:     1500m
      memory:  2Gi
    Requests:
      cpu:      1
      memory:   1Gi
    Liveness:   exec [/opt/kafka/zookeeper_healthcheck.sh] delay=15s timeout=5s period=10s #success=1 #failure=3
    Readiness:  exec [/opt/kafka/zookeeper_healthcheck.sh] delay=15s timeout=5s period=10s #success=1 #failure=3
    Environment:
      ZOOKEEPER_METRICS_ENABLED:         false
      ZOOKEEPER_SNAPSHOT_CHECK_ENABLED:  true
      STRIMZI_KAFKA_GC_LOG_ENABLED:      false
      DYNAMIC_HEAP_FRACTION:             0.75
      DYNAMIC_HEAP_MAX:                  2147483648
      ZOOKEEPER_CONFIGURATION:           tickTime=2000
                                         initLimit=5
                                         syncLimit=2
                                         autopurge.purgeInterval=1

    Mounts:
      /opt/kafka/cluster-ca-certs/ from cluster-ca-certs (rw)
      /opt/kafka/custom-config/ from zookeeper-metrics-and-logging (rw)
      /opt/kafka/zookeeper-node-certs/ from zookeeper-nodes (rw)
      /tmp from strimzi-tmp (rw)
      /var/lib/zookeeper from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cgm22 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-my-cluster-zookeeper-0
    ReadOnly:   false
  strimzi-tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  1Mi
  zookeeper-metrics-and-logging:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      my-cluster-zookeeper-config
    Optional:  false
  zookeeper-nodes:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  my-cluster-zookeeper-nodes
    Optional:    false
  cluster-ca-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  my-cluster-cluster-ca-cert
    Optional:    false
  kube-api-access-cgm22:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason             Age                 From                Message
  ----     ------             ----                ----                -------
  Warning  FailedScheduling   10m                 default-scheduler   0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling   40s (x10 over 10m)  default-scheduler   0/3 nodes are available: 3 Insufficient cpu.
  Normal   NotTriggerScaleUp  37s (x61 over 10m)  cluster-autoscaler  pod didn't trigger scale-up:

Here is the YAML file used to create the cluster:

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster #1
spec:
  kafka:
    version: 3.0.0
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      log.message.format.version: "3.0"
      inter.broker.protocol.version: "3.0"
    storage:
      type: jbod
      volumes:
      - id: 0
        type: persistent-claim
        size: 2Gi
        deleteClaim: false
    logging: #9
      type: inline
      loggers:
        kafka.root.logger.level: "INFO"
  zookeeper:
    replicas: 1
    storage:
      type: persistent-claim
      size: 2Gi
      deleteClaim: false
    resources:
      requests:
        memory: 1Gi
        cpu: "1"
      limits:
        memory: 2Gi
        cpu: "1.5"
    logging:
      type: inline
      loggers:
        zookeeper.root.logger: "INFO"
  entityOperator: #11
    topicOperator: {}
    userOperator: {}

The PersistentVolume shows as Bound to the PersistentVolumeClaim, but ZooKeeper is not starting up; the scheduler reports that the nodes have insufficient CPU.

Any pointers on what needs to be done?

On 2 of the 3 nodes, the allocated CPU limits are at 0%:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                   Requests     Limits
  --------                   --------     ------
  cpu                        483m (51%)   0 (0%)
  memory                     410Mi (14%)  890Mi (31%)
  ephemeral-storage          0 (0%)       0 (0%)
  hugepages-1Gi              0 (0%)       0 (0%)
  hugepages-2Mi              0 (0%)       0 (0%)


The 3rd node:

 Resource                   Requests         Limits
  --------                   --------         ------
  cpu                        511m (54%)       1143m (121%)
  memory                     868783744 (29%)  1419Mi (50%)
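
Doing the rough arithmetic on the listings above (approximate, and assuming the percentages are of each node's allocatable CPU): 483m requested at 51% implies about 940m allocatable, i.e. roughly 460m free on the first two nodes, and 511m requested at 54% implies roughly 435m free on the third. Every node therefore has well under the 1 CPU (1000m) that the zookeeper pod requests, which matches the "Insufficient cpu" scheduling error.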


Here is the latest kc describe pod my-cluster-zookeeper-0 -n kafka output. The pod spec is identical to the describe output shown earlier, so only the Events section (which has changed) is repeated here:

Events:
  Type     Reason             Age                     From                Message
  ----     ------             ----                    ----                -------
  Warning  FailedScheduling   5h27m                   default-scheduler   0/3 nodes are available: 3 Insufficient cpu.
  Normal   NotTriggerScaleUp  28m (x1771 over 5h26m)  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added):
  Normal   NotTriggerScaleUp  4m17s (x91 over 19m)    cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 max node group size reached
  Warning  FailedScheduling   80s (x19 over 20m)      default-scheduler   0/3 nodes are available: 3 Insufficient cpu.


Answers (1)

Khaja Shaik

A pod can't be scheduled when it requests more CPU than any node in your cluster has free. If your existing pods have already consumed the cluster's CPU, you can't schedule more pods until some of the existing pods are removed and their requests are released. A simple equation can be followed when using the Horizontal Pod Autoscaler (HPA): RESOURCE REQUEST CPU * HPA MAX PODS <= Total Kubernetes CPU.
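In this case the ZooKeeper container requests cpu: 1 (1000m), while the node listings in the question show only roughly 430-460m of allocatable CPU left per node, so no node can satisfy the request. One way out (a sketch only; the 250m value is an illustrative assumption, not a Strimzi recommendation) is to lower the ZooKeeper CPU request in the Kafka custom resource so it fits into the free capacity of a node:

# Sketch: reduced ZooKeeper resources in the Kafka CR (values are illustrative)
spec:
  zookeeper:
    replicas: 1
    resources:
      requests:
        cpu: 250m      # was "1"; must fit within the free CPU on at least one node
        memory: 1Gi
      limits:
        cpu: "1"       # was "1.5"
        memory: 2Gi

The alternative is to add capacity instead: use a node pool with larger machines, or raise the autoscaler's maximum node count (the events show "1 max node group size reached"), so that a node with a full free CPU becomes available.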

Use kubectl describe node xxxx to check each node. You'll probably find that the CPU already requested on each node is too high for the pod to fit; in your case roughly half of each node's allocatable CPU is already requested, leaving less than the full CPU that ZooKeeper asks for. You may need to delete some resources from the node (e.g. any unused pods that aren't required) in order to successfully schedule new pods onto it. Refer to the link for more information on Insufficient CPU.
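For example, something along these lines (node names are placeholders) compares each node's allocatable CPU with what is already requested on it:

# List the nodes in the cluster
kubectl get nodes

# Compare the "Allocatable" CPU with the "Allocated resources" section per node
kubectl describe node <node-name> | grep -A 6 "Allocatable"
kubectl describe node <node-name> | grep -A 10 "Allocated resources"

# Live usage, if metrics-server is available (it is enabled by default on GKE)
kubectl top nodes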

Refer to Fixing – pod has unbound immediate persistentvolumeclaims and the linked Stack Overflow post for information on the "pod has unbound immediate PersistentVolumeClaims" warning.
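In this question the claim already shows Bound, so that earlier warning has cleared, but a quick way to confirm the storage side is healthy is something like:

# Check that the claim is Bound and that a StorageClass exists to provision it
kubectl get pvc -n kafka
kubectl get storageclass

# Recent events in the namespace, useful for both PVC and scheduling problems
kubectl get events -n kafka --sort-by=.lastTimestamp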
