user63898

Reputation: 30925

HashiCorp Vault on k8s: getting error "1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity"

I'm deploying HA Vault on k8s (EKS) and getting this error on one of the Vault pods, which I think is also causing other pods to fail. This is the output of kubectl get events (search for "nodes are available: 1 Insufficient memory"):

26m         Normal    Created                        pod/vault-1                                 Created container vault
26m         Normal    Started                        pod/vault-1                                 Started container vault
26m         Normal    Pulled                         pod/vault-1                                 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
7m40s       Warning   BackOff                        pod/vault-1                                 Back-off restarting failed container
2m38s       Normal    Scheduled                      pod/vault-1                                 Successfully assigned vault-foo/vault-1 to ip-10-101-0-103.ec2.internal
2m35s       Normal    SuccessfulAttachVolume         pod/vault-1                                 AttachVolume.Attach succeeded for volume "pvc-acfc7e26-3616-4075-ab79-0c3f7b0f6470"
2m35s       Normal    SuccessfulAttachVolume         pod/vault-1                                 AttachVolume.Attach succeeded for volume "pvc-19d03d48-1de2-41f8-aadf-02d0a9f4bfbd"
48s         Normal    Pulled                         pod/vault-1                                 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
48s         Normal    Created                        pod/vault-1                                 Created container vault
99s         Normal    Started                        pod/vault-1                                 Started container vault
60s         Warning   BackOff                        pod/vault-1                                 Back-off restarting failed container
27m         Normal    TaintManagerEviction           pod/vault-2                                 Cancelling deletion of Pod vault-foo/vault-2
28m         Warning   FailedScheduling               pod/vault-2                                 0/4 nodes are available: 1 Insufficient memory, 4 Insufficient cpu.
28m         Warning   FailedScheduling               pod/vault-2                                 0/5 nodes are available: 1 Insufficient memory, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 4 Insufficient cpu.
27m         Normal    Scheduled                      pod/vault-2                                 Successfully assigned vault-foo/vault-2 to ip-10-101-0-103.ec2.internal
27m         Normal    SuccessfulAttachVolume         pod/vault-2                                 AttachVolume.Attach succeeded for volume "pvc-fb91141d-ebd9-4767-b122-da8c98349cba"
27m         Normal    SuccessfulAttachVolume         pod/vault-2                                 AttachVolume.Attach succeeded for volume "pvc-95effe76-6e01-49ad-9bec-14e091e1a334"
27m         Normal    Pulling                        pod/vault-2                                 Pulling image "hashicorp/vault-enterprise:1.5.0_ent"
27m         Normal    Pulled                         pod/vault-2                                 Successfully pulled image "hashicorp/vault-enterprise:1.5.0_ent"
26m         Normal    Created                        pod/vault-2                                 Created container vault
26m         Normal    Started                        pod/vault-2                                 Started container vault
26m         Normal    Pulled                         pod/vault-2                                 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
7m26s       Warning   BackOff                        pod/vault-2                                 Back-off restarting failed container
2m36s       Warning   FailedScheduling               pod/vault-2                                 0/7 nodes are available: 1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had volume node affinity conflict, 1 node(s) were unschedulable, 4 Insufficient cpu.
114s        Warning   FailedScheduling               pod/vault-2                                 0/8 nodes are available: 1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 1 node(s) had volume node affinity conflict, 1 node(s) were unschedulable, 4 Insufficient cpu.
104s        Warning   FailedScheduling               pod/vault-2                                 0/9 nodes are available: 1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had volume node affinity conflict, 1 node(s) were unschedulable, 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 4 Insufficient cpu.
93s         Normal    Scheduled                      pod/vault-2                                 Successfully assigned vault-foo/vault-2 to ip-10-101-0-82.ec2.internal
88s         Normal    SuccessfulAttachVolume         pod/vault-2                                 AttachVolume.Attach succeeded for volume "pvc-fb91141d-ebd9-4767-b122-da8c98349cba"
88s         Normal    SuccessfulAttachVolume         pod/vault-2                                 AttachVolume.Attach succeeded for volume "pvc-95effe76-6e01-49ad-9bec-14e091e1a334"
83s         Normal    Pulling                        pod/vault-2                                 Pulling image "hashicorp/vault-enterprise:1.5.0_ent"
81s         Normal    Pulled                         pod/vault-2                                 Successfully pulled image "hashicorp/vault-enterprise:1.5.0_ent"
38s         Normal    Created                        pod/vault-2                                 Created container vault
37s         Normal    Started                        pod/vault-2                                 Started container vault
38s         Normal    Pulled                         pod/vault-2                                 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
4s          Warning   BackOff                        pod/vault-2                                 Back-off restarting failed container
2m38s       Normal    Scheduled                      pod/vault-agent-injector-d54bdc675-qwsmz    Successfully assigned vault-foo/vault-agent-injector-d54bdc675-qwsmz to ip-10-101-2-91.ec2.internal
2m37s       Normal    Pulling                        pod/vault-agent-injector-d54bdc675-qwsmz    Pulling image "hashicorp/vault-k8s:latest"
2m36s       Normal    Pulled                         pod/vault-agent-injector-d54bdc675-qwsmz    Successfully pulled image "hashicorp/vault-k8s:latest"
2m36s       Normal    Created                        pod/vault-agent-injector-d54bdc675-qwsmz    Created container sidecar-injector
2m35s       Normal    Started                        pod/vault-agent-injector-d54bdc675-qwsmz    Started container sidecar-injector
28m         Normal    Scheduled                      pod/vault-agent-injector-d54bdc675-wz9ws    Successfully assigned vault-foo/vault-agent-injector-d54bdc675-wz9ws to ip-10-101-0-87.ec2.internal
28m         Normal    Pulled                         pod/vault-agent-injector-d54bdc675-wz9ws    Container image "hashicorp/vault-k8s:latest" already present on machine
28m         Normal    Created                        pod/vault-agent-injector-d54bdc675-wz9ws    Created container sidecar-injector
28m         Normal    Started                        pod/vault-agent-injector-d54bdc675-wz9ws    Started container sidecar-injector
3m22s       Normal    Killing                        pod/vault-agent-injector-d54bdc675-wz9ws    Stopping container sidecar-injector
3m22s       Warning   Unhealthy                      pod/vault-agent-injector-d54bdc675-wz9ws    Readiness probe failed: Get https://10.101.0.73:8080/health/ready: dial tcp 10.101.0.73:8080: connect: connection refused
3m18s       Warning   Unhealthy                      pod/vault-agent-injector-d54bdc675-wz9ws    Liveness probe failed: Get https://10.101.0.73:8080/health/ready: dial tcp 10.101.0.73:8080: connect: no route to host
28m         Normal    SuccessfulCreate               replicaset/vault-agent-injector-d54bdc675   Created pod: vault-agent-injector-d54bdc675-wz9ws
2m38s       Normal    SuccessfulCreate               replicaset/vault-agent-injector-d54bdc675   Created pod: vault-agent-injector-d54bdc675-qwsmz
28m         Normal    ScalingReplicaSet              deployment/vault-agent-injector             Scaled up replica set vault-agent-injector-d54bdc675 to 1
2m38s       Normal    ScalingReplicaSet              deployment/vault-agent-injector             Scaled up replica set vault-agent-injector-d54bdc675 to 1
28m         Normal    EnsuringLoadBalancer           service/vault-ui                            Ensuring load balancer
28m         Normal    EnsuredLoadBalancer            service/vault-ui                            Ensured load balancer
26m         Normal    UpdatedLoadBalancer            service/vault-ui                            Updated load balancer with new hosts
3m24s       Normal    DeletingLoadBalancer           service/vault-ui                            Deleting load balancer
3m23s       Warning   PortNotAllocated               service/vault-ui                            Port 32476 is not allocated; repairing
3m23s       Warning   ClusterIPNotAllocated          service/vault-ui                            Cluster IP 172.20.216.143 is not allocated; repairing
3m22s       Warning   FailedToUpdateEndpointSlices   service/vault-ui                            Error updating Endpoint Slices for Service vault-foo/vault-ui: failed to update vault-ui-crtg4 EndpointSlice for Service vault-foo/vault-ui: Operation cannot be fulfilled on endpointslices.discovery.k8s.io "vault-ui-crtg4": the object has been modified; please apply your changes to the latest version and try again
3m16s       Warning   FailedToUpdateEndpoint         endpoints/vault-ui                          Failed to update endpoint vault-foo/vault-ui: Operation cannot be fulfilled on endpoints "vault-ui": the object has been modified; please apply your changes to the latest version and try again
2m52s       Normal    DeletedLoadBalancer            service/vault-ui                            Deleted load balancer
2m39s       Normal    EnsuringLoadBalancer           service/vault-ui                            Ensuring load balancer
2m36s       Normal    EnsuredLoadBalancer            service/vault-ui                            Ensured load balancer
96s         Normal    UpdatedLoadBalancer            service/vault-ui                            Updated load balancer with new hosts
28m         Normal    NoPods                         poddisruptionbudget/vault                   No matching pods found
28m         Normal    SuccessfulCreate               statefulset/vault                           create Pod vault-0 in StatefulSet vault successful
28m         Normal    SuccessfulCreate               statefulset/vault                           create Pod vault-1 in StatefulSet vault successful
28m         Normal    SuccessfulCreate               statefulset/vault                           create Pod vault-2 in StatefulSet vault successful
2m40s       Normal    NoPods                         poddisruptionbudget/vault                   No matching pods found
2m38s       Normal    SuccessfulCreate               statefulset/vault                           create Pod vault-0 in StatefulSet vault successful
2m38s       Normal    SuccessfulCreate               statefulset/vault                           create Pod vault-1 in StatefulSet vault successful
2m38s       Normal    SuccessfulCreate               statefulset/vault                           create Pod vault-2 in StatefulSet vault successful

And this is my Helm values file:

# Vault Helm Chart Value Overrides
global:
  enabled: true
  tlsDisable: false

injector:
  enabled: true
  # Use the Vault K8s Image https://github.com/hashicorp/vault-k8s/
  image:
    repository: "hashicorp/vault-k8s"
    tag: "latest"

  resources:
      requests:
        memory: 256Mi
        cpu: 250m
      limits:
        memory: 256Mi
        cpu: 250m

server:
  # Use the Enterprise Image
  image:
    repository: "hashicorp/vault-enterprise"
    tag: "1.5.0_ent"

  # These Resource Limits are in line with node requirements in the
  # Vault Reference Architecture for a Small Cluster
  resources:
    requests:
      memory: 8Gi
      cpu: 2000m
    limits:
      memory: 16Gi
      cpu: 2000m

  # For HA configuration and because we need to manually init the vault,
  # we need to define custom readiness/liveness Probe settings
  readinessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204"
  livenessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true"
    initialDelaySeconds: 60

  # extraEnvironmentVars is a list of extra environment variables to set with the stateful set. These could be
  # used to include variables required for auto-unseal.
  extraEnvironmentVars:
    VAULT_CACERT: /vault/userconfig/vault-server-tls/vault.ca

  # extraVolumes is a list of extra volumes to mount. These will be exposed
  # to Vault in the path .
  #extraVolumes:
  #  - type: secret
  #    name: tls-server
  #  - type: secret
  #    name: tls-ca
  #  - type: secret
  #    name: kms-creds
  extraVolumes:
    - type: secret
      name: vault-server-tls   
  
  # This configures the Vault Statefulset to create a PVC for audit logs.
  # See https://www.vaultproject.io/docs/audit/index.html to know more
  auditStorage:
    enabled: true

  standalone:
    enabled: false

  # Run Vault in "HA" mode.
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      setNodeId: true

      config: |
        ui = true
        listener "tcp" {
          address = "[::]:8200"
          cluster_address = "[::]:8201"
          tls_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
          tls_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          tls_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
        }

        storage "raft" {
          path = "/vault/data"
          retry_join {
            leader_api_addr = "http://vault-0.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
            leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          }
          retry_join {
            leader_api_addr = "http://vault-1.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
            leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          }
          retry_join {
            leader_api_addr = "http://vault-2.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
            leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          }
        }

        service_registration "kubernetes" {}

# Vault UI
ui:
  enabled: true
  serviceType: "LoadBalancer"
  serviceNodePort: null
  externalPort: 8200

  # For Added Security, edit the below
  #loadBalancerSourceRanges:
  #   - < Your IP RANGE Ex. 10.0.0.0/16 >
  #   - < YOUR SINGLE IP Ex. 1.78.23.3/32 >


What did I not configure right?

Upvotes: 2

Views: 9064

Answers (1)

Wytrzymały Wiktor

Reputation: 13898

There are several issues here, and they are all represented by error messages like:

0/9 nodes are available: 1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had volume node affinity conflict, 1 node(s) were unschedulable, 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 4 Insufficient cpu.

You have 9 nodes, but none of them is available for scheduling due to a set of different conditions. Note that each node can be affected by several issues at once, so the counts in the message can add up to more than your total number of nodes.

Let's break them down one by one:

  • Insufficient memory: Execute kubectl describe node <node-name> to check how much allocatable memory is left on the node, and compare it against the requests and limits of your pods. Note that the scheduler reserves the full amount of memory a pod requests, regardless of how much the pod actually uses (see the command sketch after this list).

  • Insufficient cpu: Analogous to the above: compare the node's allocatable CPU against your pods' CPU requests.

  • node(s) didn't match pod affinity/anti-affinity: Check the affinity/anti-affinity rules of the pod you are trying to schedule (see the affinity sketch after this list).

  • node(s) didn't satisfy existing pods anti-affinity rules: Same as above, except the conflicting anti-affinity rules belong to pods already running on the node.

  • node(s) had volume node affinity conflict: Happens when a pod cannot be scheduled because its persistent volume sits in a different Availability Zone than the candidate node (EBS volumes are zonal). You can fix this by creating a StorageClass restricted to a single zone and then using that StorageClass in your PVC (see the StorageClass sketch after this list).

  • node(s) were unschedulable: The node is cordoned, i.e. explicitly marked as unschedulable, which leads us to the next issue below:

  • node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate: This corresponds to the node condition Ready = False. You can use kubectl describe node to check taints and kubectl taint nodes <node-name> <taint-name>- (note the trailing dash) to remove them, as shown in the sketch below. See the Taints and Tolerations documentation for more details.

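Here is a minimal sketch of the kubectl invocations for the resource and taint checks above; the node and pod names are taken from your events output and are only examples:

# Allocatable capacity, summed requests/limits of running pods, and taints for one node:
kubectl describe node ip-10-101-0-103.ec2.internal

# Taints across all nodes at a glance:
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

# The affinity rules the scheduler actually sees for a pending pod:
kubectl get pod vault-2 -n vault-foo -o jsonpath='{.spec.affinity}'

# Remove a taint by key and effect (note the trailing "-"). The node controller
# re-adds node.kubernetes.io/not-ready while the node is genuinely NotReady,
# so fix the underlying node problem rather than fighting the taint:
kubectl taint nodes ip-10-101-0-103.ec2.internal node.kubernetes.io/not-ready:NoSchedule-
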
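Note also that your values file requests 8Gi of memory and 2000m of CPU per server pod, and the Vault Helm chart's default podAntiAffinity (check the values.yaml of your chart version) requires server pods to land on distinct nodes via topologyKey kubernetes.io/hostname, so replicas: 3 needs three nodes that each have that much headroom free. If your cluster cannot provide that, one option is to relax the rule to a preference through the chart's server.affinity value; a sketch, with the label selector matching the chart's usual pod labels:

server:
  affinity: |
    podAntiAffinity:
      # "preferred" spreads pods across nodes when possible instead of
      # refusing to schedule them when it is not:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/name: vault
                component: server
            topologyKey: kubernetes.io/hostname

Keep in mind that co-locating Vault servers on one node trades away the fault tolerance the HA setup is meant to provide.
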
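For the volume node affinity conflict, a sketch of a single-zone StorageClass, assuming the in-tree EBS provisioner on EKS; the class name and zone are placeholders:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-single-az                 # placeholder name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
# Delay volume creation until a pod is scheduled, so the volume is
# provisioned in the zone the scheduler actually picked:
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
  - matchLabelExpressions:
      - key: failure-domain.beta.kubernetes.io/zone   # topology.kubernetes.io/zone on newer clusters
        values:
          - us-east-1a                                # placeholder zone

The chart exposes storageClass settings under server.dataStorage and server.auditStorage; pointing them at such a class keeps the PVCs and pods in the same zone.
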
There is also a GitHub thread describing a similar issue that you may find useful.

Try checking and eliminating those issues one by one (starting from the first listed above), as they can cause a "chain reaction" in some scenarios.

Upvotes: 7
