Michal

Reputation: 162

Elasticsearch pod readiness probe fails with "message": "readiness probe failed", "curl_rc": "7"

I'm trying to deploy Elasticsearch on an RKE cluster.

I'm following the instructions in this tutorial:

https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-quickstart.html

The Kubernetes deployment runs on VMs behind a proxy.

Due to the lack of a dynamic provisioner I provisioned the PV myself; this is not the problem.
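For reference, each PV I created by hand looked roughly like this (the PV name, size, and hostPath are placeholders for my actual values; the claimRef matches the PVC the quickstart StatefulSet creates):

# rough sketch of the manually provisioned PV for pod 0
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: quickstart-es-pv-0
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  claimRef:
    # pin this PV to the PVC created for quickstart-es-default-0
    namespace: default
    name: elasticsearch-data-quickstart-es-default-0
  hostPath:
    path: /data/elasticsearch-0
EOF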

The error I'm getting from the probe is as follows:

Readiness probe failed: {"timestamp": "2021-10-06T12:44:37+00:00", "message": "readiness probe failed", "curl_rc": "7"}

In addition, if I curl from the master node I get a different error:

curl https://127.0.0.0:9200
curl: (56) Received HTTP code 503 from proxy after CONNECT

Inside the container I get:

bash-4.4# curl https://127.0.0.0:9200
curl: (7) Couldn't connect to server

Also inside the container:

curl https://0.0.0.0:9200
curl: (56) Received HTTP code 503 from proxy after CONNECT
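Since both 503 responses come from the proxy, it looks like proxy variables are set inside the container and loopback traffic is being routed through the proxy. A quick sketch to rule that in or out (curl's --noproxy bypasses any HTTP(S)_PROXY variables; a 401 without credentials would still prove connectivity):

# are proxy variables set in the container environment?
kubectl exec quickstart-es-default-0 -c elasticsearch -- env | grep -i proxy

# retry the probe's own target (127.0.0.1, per the script below) with the proxy bypassed;
# -k skips verification of the self-signed certificate
kubectl exec quickstart-es-default-0 -c elasticsearch -- \
  curl -sk --noproxy '*' https://127.0.0.1:9200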

I established that the readiness probe fails when executing the curl command that is part of the script /mnt/elastic-internal/scripts/readiness-probe-script.sh.

I attach the script and the kubectl describe pod output:

script:

#!/usr/bin/env bash

# fail should be called as a last resort to help the user to understand why the probe failed
function fail {
  timestamp=$(date --iso-8601=seconds)
  echo "{\"timestamp\": \"${timestamp}\", \"message\": \"readiness probe failed\", "$1"}" | tee /proc/1/fd/2 2> /dev/null
  exit 1
}

labels="/mnt/elastic-internal/downward-api/labels"

version=""
if [[ -f "${labels}" ]]; then
  # get Elasticsearch version from the downward API
  version=$(grep "elasticsearch.k8s.elastic.co/version" ${labels} | cut -d '=' -f 2)
  # remove quotes
  version=$(echo "${version}" | tr -d '"')
fi

READINESS_PROBE_TIMEOUT=${READINESS_PROBE_TIMEOUT:=3}

# Check if PROBE_PASSWORD_PATH is set, otherwise fall back to its former name in 1.0.0.beta-1: PROBE_PASSWORD_FILE
if [[ -z "${PROBE_PASSWORD_PATH}" ]]; then
  probe_password_path="${PROBE_PASSWORD_FILE}"
else
  probe_password_path="${PROBE_PASSWORD_PATH}"
fi

# setup basic auth if credentials are available
if [ -n "${PROBE_USERNAME}" ] && [ -f "${probe_password_path}" ]; then
  PROBE_PASSWORD=$(<${probe_password_path})
  BASIC_AUTH="-u ${PROBE_USERNAME}:${PROBE_PASSWORD}"
else
  BASIC_AUTH=''
fi

# Check if we are using IPv6
if [[ $POD_IP =~ .*:.* ]]; then
  LOOPBACK="[::1]"
else
  LOOPBACK=127.0.0.1
fi

# request Elasticsearch on /
# we are turning globbing off to allow for unescaped [] in case of IPv6
ENDPOINT="${READINESS_PROBE_PROTOCOL:-https}://${LOOPBACK}:9200/"
status=$(curl -o /dev/null -w "%{http_code}" --max-time ${READINESS_PROBE_TIMEOUT} -XGET -g -s -k ${BASIC_AUTH} $ENDPOINT)
curl_rc=$?

if [[ ${curl_rc} -ne 0 ]]; then
  fail "\"curl_rc\": \"${curl_rc}\""
fi

# ready if status code 200, 503 is tolerable if ES version is 6.x
if [[ ${status} == "200" ]] || [[ ${status} == "503" && ${version:0:2} == "6." ]]; then
  exit 0
else
  fail " \"status\": \"${status}\", \"version\":\"${version}\" "
fi
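To reproduce the failure outside the kubelet, the script can also be run by hand inside the container; the PROBE_* variables from the pod spec are available in the exec session (a sketch using this pod's name):

# run the readiness probe exactly as the kubelet does and print its exit code
kubectl exec quickstart-es-default-0 -c elasticsearch -- \
  bash -c /mnt/elastic-internal/scripts/readiness-probe-script.sh
echo "exit code: $?"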

The following is the describe pod output:

Name:         quickstart-es-default-0
Namespace:    default
Priority:     0
Node:         rke-worker-1/10.21.242.216
Start Time:   Wed, 06 Oct 2021 14:43:11 +0200
Labels:       common.k8s.elastic.co/type=elasticsearch
              controller-revision-hash=quickstart-es-default-666db95c77
              elasticsearch.k8s.elastic.co/cluster-name=quickstart
              elasticsearch.k8s.elastic.co/config-hash=2374451611
              elasticsearch.k8s.elastic.co/http-scheme=https
              elasticsearch.k8s.elastic.co/node-data=true
              elasticsearch.k8s.elastic.co/node-data_cold=true
              elasticsearch.k8s.elastic.co/node-data_content=true
              elasticsearch.k8s.elastic.co/node-data_hot=true
              elasticsearch.k8s.elastic.co/node-data_warm=true
              elasticsearch.k8s.elastic.co/node-ingest=true
              elasticsearch.k8s.elastic.co/node-master=true
              elasticsearch.k8s.elastic.co/node-ml=true
              elasticsearch.k8s.elastic.co/node-remote_cluster_client=true
              elasticsearch.k8s.elastic.co/node-transform=true
              elasticsearch.k8s.elastic.co/node-voting_only=false
              elasticsearch.k8s.elastic.co/statefulset-name=quickstart-es-default
              elasticsearch.k8s.elastic.co/version=7.15.0
              statefulset.kubernetes.io/pod-name=quickstart-es-default-0
Annotations:  cni.projectcalico.org/containerID: 1e03a07fc3a1cb37902231b69a5f0fcaed2d450137cb675c5dfb393af185a258
              cni.projectcalico.org/podIP: 10.42.2.7/32
              cni.projectcalico.org/podIPs: 10.42.2.7/32
              co.elastic.logs/module: elasticsearch
              update.k8s.elastic.co/timestamp: 2021-10-06T12:43:23.93263325Z
Status:       Running
IP:           10.42.2.7
IPs:
  IP:           10.42.2.7
Controlled By:  StatefulSet/quickstart-es-default
Init Containers:
  elastic-internal-init-filesystem:
    Container ID:  docker://cc72c63cb1bb5406a2edbcc0488065c06a130f00a73d2e38544cd7e9754fbc57
    Image:         docker.elastic.co/elasticsearch/elasticsearch:7.15.0
    Image ID:      docker-pullable://docker.elastic.co/elasticsearch/elasticsearch@sha256:6ae227c688e05f7d487e0cfe08a5a3681f4d60d006ad9b5a1f72a741d6091df1
    Port:          <none>
    Host Port:     <none>
    Command:
      bash
      -c
      /mnt/elastic-internal/scripts/prepare-fs.sh
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 06 Oct 2021 14:43:20 +0200
      Finished:     Wed, 06 Oct 2021 14:43:42 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  50Mi
    Requests:
      cpu:     100m
      memory:  50Mi
    Environment:
      POD_IP:                  (v1:status.podIP)
      POD_NAME:               quickstart-es-default-0 (v1:metadata.name)
      NODE_NAME:               (v1:spec.nodeName)
      NAMESPACE:              default (v1:metadata.namespace)
      HEADLESS_SERVICE_NAME:  quickstart-es-default
    Mounts:
      /mnt/elastic-internal/downward-api from downward-api (ro)
      /mnt/elastic-internal/elasticsearch-bin-local from elastic-internal-elasticsearch-bin-local (rw)
      /mnt/elastic-internal/elasticsearch-config from elastic-internal-elasticsearch-config (ro)
      /mnt/elastic-internal/elasticsearch-config-local from elastic-internal-elasticsearch-config-local (rw)
      /mnt/elastic-internal/elasticsearch-plugins-local from elastic-internal-elasticsearch-plugins-local (rw)
      /mnt/elastic-internal/probe-user from elastic-internal-probe-user (ro)
      /mnt/elastic-internal/scripts from elastic-internal-scripts (ro)
      /mnt/elastic-internal/transport-certificates from elastic-internal-transport-certificates (ro)
      /mnt/elastic-internal/unicast-hosts from elastic-internal-unicast-hosts (ro)
      /mnt/elastic-internal/xpack-file-realm from elastic-internal-xpack-file-realm (ro)
      /usr/share/elasticsearch/config/http-certs from elastic-internal-http-certificates (ro)
      /usr/share/elasticsearch/config/transport-remote-certs/ from elastic-internal-remote-certificate-authorities (ro)
      /usr/share/elasticsearch/data from elasticsearch-data (rw)
      /usr/share/elasticsearch/logs from elasticsearch-logs (rw)
Containers:
  elasticsearch:
    Container ID:   docker://9fb879f9f0404a9997b5aa0ae915c788569c85abd008617447422ba5de559b54
    Image:          docker.elastic.co/elasticsearch/elasticsearch:7.15.0
    Image ID:       docker-pullable://docker.elastic.co/elasticsearch/elasticsearch@sha256:6ae227c688e05f7d487e0cfe08a5a3681f4d60d006ad9b5a1f72a741d6091df1
    Ports:          9200/TCP, 9300/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Wed, 06 Oct 2021 14:46:26 +0200
    Last State:     Terminated
      Reason:       Error
      Exit Code:    134
      Started:      Wed, 06 Oct 2021 14:43:46 +0200
      Finished:     Wed, 06 Oct 2021 14:46:22 +0200
    Ready:          False
    Restart Count:  1
    Limits:
      memory:  2Gi
    Requests:
      memory:   2Gi
    Readiness:  exec [bash -c /mnt/elastic-internal/scripts/readiness-probe-script.sh] delay=10s timeout=5s period=5s #success=1 #failure=3
    Environment:
      POD_IP:                     (v1:status.podIP)
      POD_NAME:                  quickstart-es-default-0 (v1:metadata.name)
      NODE_NAME:                  (v1:spec.nodeName)
      NAMESPACE:                 default (v1:metadata.namespace)
      PROBE_PASSWORD_PATH:       /mnt/elastic-internal/probe-user/elastic-internal-probe
      PROBE_USERNAME:            elastic-internal-probe
      READINESS_PROBE_PROTOCOL:  https
      HEADLESS_SERVICE_NAME:     quickstart-es-default
      NSS_SDB_USE_CACHE:         no
    Mounts:
      /mnt/elastic-internal/downward-api from downward-api (ro)
      /mnt/elastic-internal/elasticsearch-config from elastic-internal-elasticsearch-config (ro)
      /mnt/elastic-internal/probe-user from elastic-internal-probe-user (ro)
      /mnt/elastic-internal/scripts from elastic-internal-scripts (ro)
      /mnt/elastic-internal/unicast-hosts from elastic-internal-unicast-hosts (ro)
      /mnt/elastic-internal/xpack-file-realm from elastic-internal-xpack-file-realm (ro)
      /usr/share/elasticsearch/bin from elastic-internal-elasticsearch-bin-local (rw)
      /usr/share/elasticsearch/config from elastic-internal-elasticsearch-config-local (rw)
      /usr/share/elasticsearch/config/http-certs from elastic-internal-http-certificates (ro)
      /usr/share/elasticsearch/config/transport-certs from elastic-internal-transport-certificates (ro)
      /usr/share/elasticsearch/config/transport-remote-certs/ from elastic-internal-remote-certificate-authorities (ro)
      /usr/share/elasticsearch/data from elasticsearch-data (rw)
      /usr/share/elasticsearch/logs from elasticsearch-logs (rw)
      /usr/share/elasticsearch/plugins from elastic-internal-elasticsearch-plugins-local (rw)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  elasticsearch-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  elasticsearch-data-quickstart-es-default-0
    ReadOnly:   false
  downward-api:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.labels -> labels
  elastic-internal-elasticsearch-bin-local:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  elastic-internal-elasticsearch-config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  quickstart-es-default-es-config
    Optional:    false
  elastic-internal-elasticsearch-config-local:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  elastic-internal-elasticsearch-plugins-local:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  elastic-internal-http-certificates:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  quickstart-es-http-certs-internal
    Optional:    false
  elastic-internal-probe-user:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  quickstart-es-internal-users
    Optional:    false
  elastic-internal-remote-certificate-authorities:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  quickstart-es-remote-ca
    Optional:    false
  elastic-internal-scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      quickstart-es-scripts
    Optional:  false
  elastic-internal-transport-certificates:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  quickstart-es-default-es-transport-certs
    Optional:    false
  elastic-internal-unicast-hosts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      quickstart-es-unicast-hosts
    Optional:  false
  elastic-internal-xpack-file-realm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  quickstart-es-xpack-file-realm
    Optional:    false
  elasticsearch-logs:
    Type:        EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:   <unset>
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  22m                   default-scheduler  0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled         22m                   default-scheduler  Successfully assigned default/quickstart-es-default-0 to rke-worker-1
  Normal   Pulled            21m                   kubelet            Container image "docker.elastic.co/elasticsearch/elasticsearch:7.15.0" already present on machine
  Normal   Created           21m                   kubelet            Created container elastic-internal-init-filesystem
  Normal   Started           21m                   kubelet            Started container elastic-internal-init-filesystem
  Normal   Pulled            21m                   kubelet            Container image "docker.elastic.co/elasticsearch/elasticsearch:7.15.0" already present on machine
  Normal   Created           21m                   kubelet            Created container elasticsearch
  Normal   Started           21m                   kubelet            Started container elasticsearch
  Warning  Unhealthy         21m                   kubelet            Readiness probe failed: {"timestamp": "2021-10-06T12:43:57+00:00", "message": "readiness probe failed", "curl_rc": "7"}
  Warning  Unhealthy         21m                   kubelet            Readiness probe failed: {"timestamp": "2021-10-06T12:44:02+00:00", "message": "readiness probe failed", "curl_rc": "7"}
  Warning  Unhealthy         21m                   kubelet            Readiness probe failed: {"timestamp": "2021-10-06T12:44:07+00:00", "message": "readiness probe failed", "curl_rc": "7"}
  Warning  Unhealthy         21m                   kubelet            Readiness probe failed: {"timestamp": "2021-10-06T12:44:12+00:00", "message": "readiness probe failed", "curl_rc": "7"}
  Warning  Unhealthy         21m                   kubelet            Readiness probe failed: {"timestamp": "2021-10-06T12:44:17+00:00", "message": "readiness probe failed", "curl_rc": "7"}
  Warning  Unhealthy         20m                   kubelet            Readiness probe failed: {"timestamp": "2021-10-06T12:44:22+00:00", "message": "readiness probe failed", "curl_rc": "7"}
  Warning  Unhealthy         20m                   kubelet            Readiness probe failed: {"timestamp": "2021-10-06T12:44:27+00:00", "message": "readiness probe failed", "curl_rc": "7"}
  Warning  Unhealthy         20m                   kubelet            Readiness probe failed: {"timestamp": "2021-10-06T12:44:32+00:00", "message": "readiness probe failed", "curl_rc": "7"}
  Warning  Unhealthy         20m                   kubelet            Readiness probe failed: {"timestamp": "2021-10-06T12:44:37+00:00", "message": "readiness probe failed", "curl_rc": "7"}
  Warning  Unhealthy         115s (x223 over 20m)  kubelet            (combined from similar events): Readiness probe failed: {"timestamp": "2021-10-06T13:03:22+00:00", "message": "readiness probe failed", "curl_rc": "7"}

After the container restarted I got the following output in the logs:

{"type": "deprecation.elasticsearch", "timestamp": "2021-10-07T11:58:28,007Z", "level": "DEPRECATION", "component": "o.e.d.c.r.OperationRouting", "cluster.name": "quickstart", "node.name": "quickstart-es-default-0", "message": "searches will not be routed based on awareness attributes starting in version 8.0.0; to opt into this behaviour now please set the system property [es.search.ignore_awareness_attributes] to [true]", "key": "searches_not_routed_on_awareness_attributes" }
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fc63c3eb122, pid=7, tid=261
#
# JRE version: OpenJDK Runtime Environment Temurin-16.0.2+7 (16.0.2+7) (build 16.0.2+7)
# Java VM: OpenJDK 64-Bit Server VM Temurin-16.0.2+7 (16.0.2+7, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# J 711 c1 org.yaml.snakeyaml.scanner.Constant.has(I)Z (42 bytes) @ 0x00007fc63c3eb122 [0x00007fc63c3eb100+0x0000000000000022]
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /usr/share/elasticsearch/core.7)
#
# An error report file with more information is saved as:
# logs/hs_err_pid7.log
Compiled method (c1)  333657 4806       3       org.yaml.snakeyaml.scanner.Constant::hasNo (15 bytes)
 total in heap  [0x00007fc63c50c010,0x00007fc63c50c7d0] = 1984
 relocation     [0x00007fc63c50c170,0x00007fc63c50c1f8] = 136
 main code      [0x00007fc63c50c200,0x00007fc63c50c620] = 1056
 stub code      [0x00007fc63c50c620,0x00007fc63c50c680] = 96
 oops           [0x00007fc63c50c680,0x00007fc63c50c688] = 8
 metadata       [0x00007fc63c50c688,0x00007fc63c50c6a8] = 32
 scopes data    [0x00007fc63c50c6a8,0x00007fc63c50c718] = 112
 scopes pcs     [0x00007fc63c50c718,0x00007fc63c50c7b8] = 160
 dependencies   [0x00007fc63c50c7b8,0x00007fc63c50c7c0] = 8
 nul chk table  [0x00007fc63c50c7c0,0x00007fc63c50c7d0] = 16
Compiled method (c1)  333676 4806       3       org.yaml.snakeyaml.scanner.Constant::hasNo (15 bytes)
 total in heap  [0x00007fc63c50c010,0x00007fc63c50c7d0] = 1984
 relocation     [0x00007fc63c50c170,0x00007fc63c50c1f8] = 136
 main code      [0x00007fc63c50c200,0x00007fc63c50c620] = 1056
 stub code      [0x00007fc63c50c620,0x00007fc63c50c680] = 96
 oops           [0x00007fc63c50c680,0x00007fc63c50c688] = 8
 metadata       [0x00007fc63c50c688,0x00007fc63c50c6a8] = 32
 scopes data    [0x00007fc63c50c6a8,0x00007fc63c50c718] = 112
 scopes pcs     [0x00007fc63c50c718,0x00007fc63c50c7b8] = 160
 dependencies   [0x00007fc63c50c7b8,0x00007fc63c50c7c0] = 8
 nul chk table  [0x00007fc63c50c7c0,0x00007fc63c50c7d0] = 16
Compiled method (c1)  333678 4812       3       org.yaml.snakeyaml.scanner.ScannerImpl::scanLineBreak (99 bytes)
 total in heap  [0x00007fc63c583990,0x00007fc63c584808] = 3704
 relocation     [0x00007fc63c583af0,0x00007fc63c583bf8] = 264
 main code      [0x00007fc63c583c00,0x00007fc63c584420] = 2080
 stub code      [0x00007fc63c584420,0x00007fc63c5844c0] = 160
 oops           [0x00007fc63c5844c0,0x00007fc63c5844c8] = 8
 metadata       [0x00007fc63c5844c8,0x00007fc63c584500] = 56
 scopes data    [0x00007fc63c584500,0x00007fc63c5845f0] = 240
 scopes pcs     [0x00007fc63c5845f0,0x00007fc63c5847b0] = 448
 dependencies   [0x00007fc63c5847b0,0x00007fc63c5847b8] = 8
 nul chk table  [0x00007fc63c5847b8,0x00007fc63c584808] = 80
Compiled method (c1)  333679 4693       2       java.lang.String::indexOf (7 bytes)
 total in heap  [0x00007fc63c6e0190,0x00007fc63c6e0568] = 984
 relocation     [0x00007fc63c6e02f0,0x00007fc63c6e0338] = 72
 main code      [0x00007fc63c6e0340,0x00007fc63c6e0480] = 320
 stub code      [0x00007fc63c6e0480,0x00007fc63c6e04d0] = 80
 metadata       [0x00007fc63c6e04d0,0x00007fc63c6e04e0] = 16
 scopes data    [0x00007fc63c6e04e0,0x00007fc63c6e0510] = 48
 scopes pcs     [0x00007fc63c6e0510,0x00007fc63c6e0560] = 80
 dependencies   [0x00007fc63c6e0560,0x00007fc63c6e0568] = 8
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
#

Upvotes: 2

Views: 9750

Answers (1)

Michal

Reputation: 162

The solution to my problem was so simple that at first I could not believe it.
I narrowed the problem down to failing TLS handshakes.
The times on the nodes were different.
I synced the times and dates on all the nodes and all the problems vanished.
The proxy was blocking services like NTP, so the nodes could not sync their clocks.
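For anyone debugging the same thing, a rough sketch of the checks that exposed it (worker names other than rke-worker-1 are illustrative; a node clock outside the certificate's validity window makes every TLS handshake fail):

# compare wall-clock time across the nodes
for n in rke-worker-1 rke-worker-2 rke-worker-3; do
  ssh "$n" date -u
done

# check whether a node's clock is NTP-synced
timedatectl status | grep -i synchronized

# inside the Elasticsearch container: the cert is only valid between these dates
echo | openssl s_client -connect 127.0.0.1:9200 2>/dev/null | openssl x509 -noout -dates

Since the proxy blocked public NTP, pointing the nodes at an internal time source (or allowing NTP through the proxy/firewall) is what keeps the clocks aligned afterwards.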

kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
quickstart-es-default-0   1/1     Running   0          3m2s
quickstart-es-default-1   1/1     Running   0          3m2s
quickstart-es-default-2   1/1     Running   0          3m2s

kubectl get elasticsearch
NAME         HEALTH   NODES   VERSION   PHASE   AGE
quickstart   green    3       7.15.0    Ready   3m21s

Upvotes: 2
