Reputation: 1063
I've created a Kubernetes cluster (GKE) on GCP and am trying to install Kafka on it (ref link - https://snourian.com/kafka-kubernetes-strimzi-part-1-creating-deploying-strimzi-kafka/).
ZooKeeper is not starting up when I deploy the Kafka cluster:
karan@cloudshell:~/strimzi-0.26.0 (versa-kafka-poc)$ kubectl get pv,pvc,pods -n kafka
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                               STORAGECLASS   REASON   AGE
persistentvolume/pvc-96957b25-f49b-4598-869c-a73b32325bc7   2Gi        RWO            Delete           Bound    kafka/data-my-cluster-zookeeper-0   standard                6m17s

NAME                                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/data-my-cluster-zookeeper-0   Bound    pvc-96957b25-f49b-4598-869c-a73b32325bc7   2Gi        RWO            standard       6m20s

NAME                                         READY   STATUS    RESTARTS   AGE
pod/my-cluster-zookeeper-0                   0/1     Pending   0          6m18s
pod/strimzi-cluster-operator-85bb4c6-cfl4p   1/1     Running   0          8m29s
karan@cloudshell:~/strimzi-0.26.0 (versa-kafka-poc)$ kc describe pod my-cluster-zookeeper-0 -n kafka
Name:           my-cluster-zookeeper-0
Namespace:      kafka
Priority:       0
Node:           <none>
Labels:         app.kubernetes.io/instance=my-cluster
                app.kubernetes.io/managed-by=strimzi-cluster-operator
                app.kubernetes.io/name=zookeeper
                app.kubernetes.io/part-of=strimzi-my-cluster
                controller-revision-hash=my-cluster-zookeeper-867c478fc4
                statefulset.kubernetes.io/pod-name=my-cluster-zookeeper-0
                strimzi.io/cluster=my-cluster
                strimzi.io/kind=Kafka
                strimzi.io/name=my-cluster-zookeeper
Annotations:    strimzi.io/cluster-ca-cert-generation: 0
                strimzi.io/generation: 0
                strimzi.io/logging-hash: 0f057cb0003c78f02978b83e4fabad5bd508680c
Status:         Pending
IP:
IPs:            <none>
Controlled By:  StatefulSet/my-cluster-zookeeper
Containers:
  zookeeper:
    Image:       quay.io/strimzi/kafka:0.26.0-kafka-3.0.0
    Ports:       2888/TCP, 3888/TCP, 2181/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Command:
      /opt/kafka/zookeeper_run.sh
    Limits:
      cpu:     1500m
      memory:  2Gi
    Requests:
      cpu:     1
      memory:  1Gi
    Liveness:   exec [/opt/kafka/zookeeper_healthcheck.sh] delay=15s timeout=5s period=10s #success=1 #failure=3
    Readiness:  exec [/opt/kafka/zookeeper_healthcheck.sh] delay=15s timeout=5s period=10s #success=1 #failure=3
    Environment:
      ZOOKEEPER_METRICS_ENABLED:         false
      ZOOKEEPER_SNAPSHOT_CHECK_ENABLED:  true
      STRIMZI_KAFKA_GC_LOG_ENABLED:      false
      DYNAMIC_HEAP_FRACTION:             0.75
      DYNAMIC_HEAP_MAX:                  2147483648
      ZOOKEEPER_CONFIGURATION:           tickTime=2000
                                         initLimit=5
                                         syncLimit=2
                                         autopurge.purgeInterval=1
    Mounts:
      /opt/kafka/cluster-ca-certs/ from cluster-ca-certs (rw)
      /opt/kafka/custom-config/ from zookeeper-metrics-and-logging (rw)
      /opt/kafka/zookeeper-node-certs/ from zookeeper-nodes (rw)
      /tmp from strimzi-tmp (rw)
      /var/lib/zookeeper from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cgm22 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-my-cluster-zookeeper-0
    ReadOnly:   false
  strimzi-tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  1Mi
  zookeeper-metrics-and-logging:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      my-cluster-zookeeper-config
    Optional:  false
  zookeeper-nodes:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  my-cluster-zookeeper-nodes
    Optional:    false
  cluster-ca-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  my-cluster-cluster-ca-cert
    Optional:    false
  kube-api-access-cgm22:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason             Age                 From                Message
  ----     ------             ----                ----                -------
  Warning  FailedScheduling   10m                 default-scheduler   0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling   40s (x10 over 10m)  default-scheduler   0/3 nodes are available: 3 Insufficient cpu.
  Normal   NotTriggerScaleUp  37s (x61 over 10m)  cluster-autoscaler  pod didn't trigger scale-up:
Here is the YAML file used to create the cluster:
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster #1
spec:
  kafka:
    version: 3.0.0
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      log.message.format.version: "3.0"
      inter.broker.protocol.version: "3.0"
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 2Gi
          deleteClaim: false
    logging: #9
      type: inline
      loggers:
        kafka.root.logger.level: "INFO"
  zookeeper:
    replicas: 1
    storage:
      type: persistent-claim
      size: 2Gi
      deleteClaim: false
    resources:
      requests:
        memory: 1Gi
        cpu: "1"
      limits:
        memory: 2Gi
        cpu: "1.5"
    logging:
      type: inline
      loggers:
        zookeeper.root.logger: "INFO"
  entityOperator: #11
    topicOperator: {}
    userOperator: {}
The PersistentVolume is showing as bound to the PersistentVolumeClaim, but ZooKeeper is still not starting up; the scheduler reports that the nodes have insufficient CPU.
Any pointers on what needs to be done?
CPU limits on 2 of the 3 nodes are at 0% (from kubectl describe node):
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                483m (51%)   0 (0%)
  memory             410Mi (14%)  890Mi (31%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-1Gi      0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
3rd node:
  Resource  Requests          Limits
  --------  --------          ------
  cpu       511m (54%)        1143m (121%)
  memory    868783744 (29%)   1419Mi (50%)
karan@cloudshell:~ (versa-kafka-poc)$ kc describe pod my-cluster-zookeeper-0 -n kafka
(same output as the earlier describe, except the Events section now shows:)
Events:
  Type     Reason             Age                     From                Message
  ----     ------             ----                    ----                -------
  Warning  FailedScheduling   5h27m                   default-scheduler   0/3 nodes are available: 3 Insufficient cpu.
  Normal   NotTriggerScaleUp  28m (x1771 over 5h26m)  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added):
  Normal   NotTriggerScaleUp  4m17s (x91 over 19m)    cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 max node group size reached
  Warning  FailedScheduling   80s (x19 over 20m)      default-scheduler   0/3 nodes are available: 3 Insufficient cpu.
Upvotes: 0
Views: 2108
Reputation: 136
A pod can't be scheduled when it requests more CPU than any single node in your cluster has free. If your existing pods have already consumed the allocatable CPU, you can't schedule more pods until some of the existing ones are removed. When sizing for the Horizontal Pod Autoscaler (HPA), a simple inequality to follow is: RESOURCE REQUEST CPU * HPA MAX PODS <= Total Kubernetes CPU.
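Plugging in the numbers from your node output (a rough estimate: 483m is shown as 51% of the node, which puts each node at about 940m allocatable CPU):

  940m allocatable per node - ~500m already requested ≈ 440m free per node
  ZooKeeper requests cpu: "1" (1000m), and 1000m > 440m on all three nodes => "0/3 nodes are available: 3 Insufficient cpu"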
Use kubectl describe node <node-name> to check each node. You'll find that the CPU already requested on each node is high (over 50% in your case), leaving less free than the 1 CPU your pod requests. You may need to delete some resources from the nodes (e.g. any unused pods that aren't required) in order to successfully schedule new pods. Refer to the link for information on Insufficient CPU.
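For example (the node name is a placeholder):

# Allocatable CPU per node
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU_ALLOCATABLE:.status.allocatable.cpu
# Requests and limits already placed on one node
kubectl describe node <node-name> | grep -A 8 "Allocated resources"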
Refer to Fixing – pod has unbound immediate persistentvolumeclaims and this stackpost for information on the "pod has unbound immediate PersistentVolumeClaims" warning.
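In your case each node has well under the 1 CPU (1000m) the ZooKeeper pod requests, and the NotTriggerScaleUp events show the autoscaler has already reached its max node group size. One option (a sketch against your YAML above; 250m is an illustrative value, not a tested one) is to lower the ZooKeeper CPU request so the pod fits on an existing node:

  zookeeper:
    replicas: 1
    resources:
      requests:
        cpu: 250m    # lowered from "1" (1000m); roughly 440m CPU is free per node
        memory: 1Gi
      limits:
        cpu: "1.5"
        memory: 2Gi

Alternatively, recreate the node pool with a larger machine type, or raise the autoscaler's maximum node count, so that a 1-CPU request can be satisfied.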
Upvotes: 0