Reputation: 479
I am trying to experiment with a 2-node MongoDB cluster (which I will scale up later, once it is stable) on EKS. The two nodes run in two different AWS availability zones. The descriptor is as follows:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: mongod
  labels:
    name: mongo-repl
spec:
  serviceName: mongodb-service
  replicas: 2
  selector:
    matchLabels:
      app: mongod
      role: mongo
      environment: test
  template:
    metadata:
      labels:
        app: mongod
        role: mongo
        environment: test
    spec:
      terminationGracePeriodSeconds: 15
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: failure-domain.beta.kubernetes.io/zone
                operator: In
                values:
                - ap-south-1a
                - ap-south-1b
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - mongod
              - key: role
                operator: In
                values:
                - mongo
              - key: environment
                operator: In
                values:
                - test
            topologyKey: kubernetes.io/hostname
      containers:
        .....
The objective here is to NOT schedule another pod on a node where a pod with the labels app=mongod,role=mongo,environment=test is already running.
When I deploy the spec, only one mongo pod gets created, on one node.
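(For reference, which node each replica landed on, and which replicas are stuck Pending, can be checked with something like the following; the label selector matches the labels in the spec above.)

kubectl get pods -l app=mongod,role=mongo,environment=test -o wide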
ubuntu@ip-192-170-0-18:~$ kubectl describe statefulset mongod
Name: mongod
Namespace: default
CreationTimestamp: Sun, 16 Feb 2020 16:44:16 +0000
Selector: app=mongod,environment=test,role=mongo
Labels: name=mongo-repl
Annotations: <none>
Replicas: 2 desired | 2 total
Update Strategy: OnDelete
Pods Status: 1 Running / 1 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=mongod
environment=test
role=mongo
Containers:
kubectl describe pod mongod-1
Node: <none>
Labels: app=mongod
controller-revision-hash=mongod-66f7c87bbb
environment=test
role=mongo
statefulset.kubernetes.io/pod-name=mongod-1
Annotations: kubernetes.io/psp: eks.privileged
Status: Pending
....
....
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 42s (x14 over 20m) default-scheduler 0/2 nodes are available: 1 Insufficient pods, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules.
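(Side note: the "1 Insufficient pods" part of this message usually means one node has hit its per-node pod limit; on EKS the VPC CNI caps pods per node based on the instance's ENI capacity, which is quite low for t3.small. One way to check, assuming the nodes shown later in this post:

# Allocatable pod count per node
kubectl get nodes -o custom-columns=NODE:.metadata.name,PODS:.status.allocatable.pods
)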
I am unable to figure out what is conflicting in the affinity specs. I would really appreciate some insight here!
Edit on Feb/21: added information on the new error below.
Based on the suggestions, I have now scaled up the worker nodes and am receiving a clearer error message --
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 51s (x554 over 13h) default-scheduler 0/2 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had volume node affinity conflict.
So the main issue now (after scaling up the worker nodes) turns out to be --
1 node(s) had volume node affinity conflict
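To see where the mismatch is, the bound PV's node affinity can be compared with the zone label on each node, e.g.:

# Which PV did the claim bind to, and which zone does that PV require?
kubectl get pvc mongo-pvc -o jsonpath='{.spec.volumeName}{"\n"}'
kubectl describe pv db-volume-0 | grep -A 6 'Node Affinity'

# Which zone is each worker node in?
kubectl get nodes -L failure-domain.beta.kubernetes.io/zone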
Posting my whole set of configuration artifacts again below:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: mongod
  labels:
    name: mongo-repl
spec:
  serviceName: mongodb-service
  replicas: 2
  selector:
    matchLabels:
      app: mongod
      role: mongo
      environment: test
  template:
    metadata:
      labels:
        app: mongod
        role: mongo
        environment: test
    spec:
      terminationGracePeriodSeconds: 15
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: failure-domain.beta.kubernetes.io/zone
                operator: In
                values:
                - ap-south-1a
                - ap-south-1b
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - mongod
              - key: role
                operator: In
                values:
                - mongo
              - key: environment
                operator: In
                values:
                - test
            topologyKey: kubernetes.io/hostname
      containers:
      - name: mongod-container
        .......
      volumes:
      - name: mongo-vol
        persistentVolumeClaim:
          claimName: mongo-pvc
PVC --
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-pvc
spec:
  storageClassName: gp2-multi-az
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
PV --
apiVersion: "v1"
kind: "PersistentVolume"
metadata:
  name: db-volume-0
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: gp2-multi-az
  awsElasticBlockStore:
    volumeID: vol-06f12b1d6c5c93903
    fsType: ext4
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/zone
          #- key: topology.kubernetes.io/zone
          operator: In
          values:
          - ap-south-1a
---
apiVersion: "v1"
kind: "PersistentVolume"
metadata:
  name: db-volume-1
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: gp2-multi-az
  awsElasticBlockStore:
    volumeID: vol-090ab264d4747f131
    fsType: ext4
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/zone
          #- key: topology.kubernetes.io/zone
          operator: In
          values:
          - ap-south-1b
Storage Class --
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2-multi-az
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: gp2
  fsType: ext4
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - ap-south-1a
    - ap-south-1b
I don't want to opt for dynamic PVC provisioning.
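(If it helps: with statically provisioned PVs, a claim can be pinned to a specific volume via spec.claimRef on the PV, so the binder cannot pick a PV in the wrong zone. A minimal sketch, assuming the names above:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: db-volume-0
spec:
  storageClassName: gp2-multi-az
  # Pre-bind this PV to the mongo-pvc claim in the default namespace
  claimRef:
    namespace: default
    name: mongo-pvc
  ...
)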
As per @rabello's suggestion, adding the outputs below --
kubectl get pods --show-labels
NAME READY STATUS RESTARTS AGE LABELS
mongod-0 1/1 Running 0 14h app=mongod,controller-revision-hash=mongod-5b4699fd85,environment=test,role=mongo,statefulset.kubernetes.io/pod-name=mongod-0
mongod-1 0/1 Pending 0 14h app=mongod,controller-revision-hash=mongod-5b4699fd85,environment=test,role=mongo,statefulset.kubernetes.io/pod-name=mongod-1
kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
ip-192-170-0-8.ap-south-1.compute.internal Ready <none> 14h v1.14.7-eks-1861c5 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=t3.small,beta.kubernetes.io/os=linux,eks.amazonaws.com/nodegroup-image=ami-07fd6cdebfd02ef6e,eks.amazonaws.com/nodegroup=trl_compact_prod_db_node_group,failure-domain.beta.kubernetes.io/region=ap-south-1,failure-domain.beta.kubernetes.io/zone=ap-south-1a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-192-170-0-8.ap-south-1.compute.internal,kubernetes.io/os=linux
ip-192-170-80-14.ap-south-1.compute.internal Ready <none> 14h v1.14.7-eks-1861c5 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=t3.small,beta.kubernetes.io/os=linux,eks.amazonaws.com/nodegroup-image=ami-07fd6cdebfd02ef6e,eks.amazonaws.com/nodegroup=trl_compact_prod_db_node_group,failure-domain.beta.kubernetes.io/region=ap-south-1,failure-domain.beta.kubernetes.io/zone=ap-south-1b,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-192-170-80-14.ap-south-1.compute.internal,kubernetes.io/os=linux
Upvotes: 7
Views: 14131
Reputation: 761
EBS volumes are zonal. They can only be accessed by pods located in the same AZ as the volume. Your StatefulSet allows pods to be scheduled in multiple zones (ap-south-1a and ap-south-1b), so given your other constraints the scheduler may be attempting to schedule a pod onto a node in a different AZ than its volume. I would try confining your StatefulSet to a single AZ, or use an operator to install Mongo. See the sketch below.
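For example (a sketch, not a drop-in fix), the StatefulSet's node affinity could be pinned to a single zone such as ap-south-1a:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/zone
          operator: In
          values:
          - ap-south-1a   # single AZ only

The PV nodeAffinity and the StorageClass allowedTopologies would then need to list only that zone as well; the trade-off is losing AZ-level fault tolerance for the database.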
Upvotes: 0