Reputation: 162
I'm trying to deploy Elasticsearch on an RKE cluster, following the instructions in this tutorial:
https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-quickstart.html
The Kubernetes deployment runs on VMs behind a proxy.
Due to the lack of a dynamic provisioner I provisioned the PVs myself; that is not the problem.
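For context, the volumes were created by hand, roughly like this (a minimal sketch; the hostPath path and capacity are placeholders for my environment and have to match what the operator's PVC requests):

# One static PersistentVolume per Elasticsearch pod; path and capacity are
# placeholders - adjust them to the PVC created by the operator.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: elasticsearch-data-0
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /data/elasticsearch-0
EOF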
The error I'm getting from the readiness probe is as follows:
Readiness probe failed: {"timestamp": "2021-10-06T12:44:37+00:00", "message": "readiness probe failed", "curl_rc": "7"}
In addition, if I curl from the master node I get a different error:
curl https://127.0.0.0:9200 curl: (56) Received HTTP code 503 from proxy after CONNECT
Inside the container I get: bash-4.4# curl https://127.0.0.0:9200 curl: (7) Couldn't connect to server
Also inside the container: curl https://0.0.0.0:9200 curl: (56) Received HTTP code 503 from proxy after CONNECT
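The 503 responses come from the proxy itself, which suggests curl is picking up the http_proxy/https_proxy environment variables even for local addresses. A quick way to take the proxy out of the equation when testing by hand (a diagnostic sketch, not something the probe itself does):

# Bypass any http_proxy/https_proxy settings and talk to the local port directly,
# with the same options the readiness probe script uses (-s -k, status code only).
kubectl exec -it quickstart-es-default-0 -c elasticsearch -- \
  bash -c 'curl --noproxy "*" -sk -o /dev/null -w "%{http_code}\n" https://127.0.0.1:9200/'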
I established that the readiness probe fails on the curl command that is part of the script /mnt/elastic-internal/scripts/readiness-probe-script.sh (curl exit code 7 means it could not connect to the host).
I attach the script and the kubectl describe output for the pod:
script:
#!/usr/bin/env bash

# fail should be called as a last resort to help the user to understand why the probe failed
function fail {
  timestamp=$(date --iso-8601=seconds)
  echo "{\"timestamp\": \"${timestamp}\", \"message\": \"readiness probe failed\", "$1"}" | tee /proc/1/fd/2 2> /dev/null
  exit 1
}

labels="/mnt/elastic-internal/downward-api/labels"

version=""
if [[ -f "${labels}" ]]; then
  # get Elasticsearch version from the downward API
  version=$(grep "elasticsearch.k8s.elastic.co/version" ${labels} | cut -d '=' -f 2)
  # remove quotes
  version=$(echo "${version}" | tr -d '"')
fi

READINESS_PROBE_TIMEOUT=${READINESS_PROBE_TIMEOUT:=3}

# Check if PROBE_PASSWORD_PATH is set, otherwise fall back to its former name in 1.0.0.beta-1: PROBE_PASSWORD_FILE
if [[ -z "${PROBE_PASSWORD_PATH}" ]]; then
  probe_password_path="${PROBE_PASSWORD_FILE}"
else
  probe_password_path="${PROBE_PASSWORD_PATH}"
fi

# setup basic auth if credentials are available
if [ -n "${PROBE_USERNAME}" ] && [ -f "${probe_password_path}" ]; then
  PROBE_PASSWORD=$(<${probe_password_path})
  BASIC_AUTH="-u ${PROBE_USERNAME}:${PROBE_PASSWORD}"
else
  BASIC_AUTH=''
fi

# Check if we are using IPv6
if [[ $POD_IP =~ .*:.* ]]; then
  LOOPBACK="[::1]"
else
  LOOPBACK=127.0.0.1
fi

# request Elasticsearch on /
# we are turning globbing off to allow for unescaped [] in case of IPv6
ENDPOINT="${READINESS_PROBE_PROTOCOL:-https}://${LOOPBACK}:9200/"
status=$(curl -o /dev/null -w "%{http_code}" --max-time ${READINESS_PROBE_TIMEOUT} -XGET -g -s -k ${BASIC_AUTH} $ENDPOINT)
curl_rc=$?

if [[ ${curl_rc} -ne 0 ]]; then
  fail "\"curl_rc\": \"${curl_rc}\""
fi

# ready if status code 200, 503 is tolerable if ES version is 6.x
if [[ ${status} == "200" ]] || [[ ${status} == "503" && ${version:0:2} == "6." ]]; then
  exit 0
else
  fail " \"status\": \"${status}\", \"version\":\"${version}\" "
fi
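The same failure can be reproduced by running the probe by hand instead of waiting for the kubelet (a sketch; pod and container names are taken from the describe output below):

# Run exactly the command the kubelet uses for the readiness probe and print its exit code.
kubectl exec quickstart-es-default-0 -c elasticsearch -- \
  bash -c /mnt/elastic-internal/scripts/readiness-probe-script.sh; echo "probe exit: $?"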
The following is the describe pod output:
Name: quickstart-es-default-0
Namespace: default
Priority: 0
Node: rke-worker-1/10.21.242.216
Start Time: Wed, 06 Oct 2021 14:43:11 +0200
Labels: common.k8s.elastic.co/type=elasticsearch
controller-revision-hash=quickstart-es-default-666db95c77
elasticsearch.k8s.elastic.co/cluster-name=quickstart
elasticsearch.k8s.elastic.co/config-hash=2374451611
elasticsearch.k8s.elastic.co/http-scheme=https
elasticsearch.k8s.elastic.co/node-data=true
elasticsearch.k8s.elastic.co/node-data_cold=true
elasticsearch.k8s.elastic.co/node-data_content=true
elasticsearch.k8s.elastic.co/node-data_hot=true
elasticsearch.k8s.elastic.co/node-data_warm=true
elasticsearch.k8s.elastic.co/node-ingest=true
elasticsearch.k8s.elastic.co/node-master=true
elasticsearch.k8s.elastic.co/node-ml=true
elasticsearch.k8s.elastic.co/node-remote_cluster_client=true
elasticsearch.k8s.elastic.co/node-transform=true
elasticsearch.k8s.elastic.co/node-voting_only=false
elasticsearch.k8s.elastic.co/statefulset-name=quickstart-es-default
elasticsearch.k8s.elastic.co/version=7.15.0
statefulset.kubernetes.io/pod-name=quickstart-es-default-0
Annotations: cni.projectcalico.org/containerID: 1e03a07fc3a1cb37902231b69a5f0fcaed2d450137cb675c5dfb393af185a258
cni.projectcalico.org/podIP: 10.42.2.7/32
cni.projectcalico.org/podIPs: 10.42.2.7/32
co.elastic.logs/module: elasticsearch
update.k8s.elastic.co/timestamp: 2021-10-06T12:43:23.93263325Z
Status: Running
IP: 10.42.2.7
IPs:
IP: 10.42.2.7
Controlled By: StatefulSet/quickstart-es-default
Init Containers:
elastic-internal-init-filesystem:
Container ID: docker://cc72c63cb1bb5406a2edbcc0488065c06a130f00a73d2e38544cd7e9754fbc57
Image: docker.elastic.co/elasticsearch/elasticsearch:7.15.0
Image ID: docker-pullable://docker.elastic.co/elasticsearch/elasticsearch@sha256:6ae227c688e05f7d487e0cfe08a5a3681f4d60d006ad9b5a1f72a741d6091df1
Port: <none>
Host Port: <none>
Command:
bash
-c
/mnt/elastic-internal/scripts/prepare-fs.sh
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 06 Oct 2021 14:43:20 +0200
Finished: Wed, 06 Oct 2021 14:43:42 +0200
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 100m
memory: 50Mi
Environment:
POD_IP: (v1:status.podIP)
POD_NAME: quickstart-es-default-0 (v1:metadata.name)
NODE_NAME: (v1:spec.nodeName)
NAMESPACE: default (v1:metadata.namespace)
HEADLESS_SERVICE_NAME: quickstart-es-default
Mounts:
/mnt/elastic-internal/downward-api from downward-api (ro)
/mnt/elastic-internal/elasticsearch-bin-local from elastic-internal-elasticsearch-bin-local (rw)
/mnt/elastic-internal/elasticsearch-config from elastic-internal-elasticsearch-config (ro)
/mnt/elastic-internal/elasticsearch-config-local from elastic-internal-elasticsearch-config-local (rw)
/mnt/elastic-internal/elasticsearch-plugins-local from elastic-internal-elasticsearch-plugins-local (rw)
/mnt/elastic-internal/probe-user from elastic-internal-probe-user (ro)
/mnt/elastic-internal/scripts from elastic-internal-scripts (ro)
/mnt/elastic-internal/transport-certificates from elastic-internal-transport-certificates (ro)
/mnt/elastic-internal/unicast-hosts from elastic-internal-unicast-hosts (ro)
/mnt/elastic-internal/xpack-file-realm from elastic-internal-xpack-file-realm (ro)
/usr/share/elasticsearch/config/http-certs from elastic-internal-http-certificates (ro)
/usr/share/elasticsearch/config/transport-remote-certs/ from elastic-internal-remote-certificate-authorities (ro)
/usr/share/elasticsearch/data from elasticsearch-data (rw)
/usr/share/elasticsearch/logs from elasticsearch-logs (rw)
Containers:
elasticsearch:
Container ID: docker://9fb879f9f0404a9997b5aa0ae915c788569c85abd008617447422ba5de559b54
Image: docker.elastic.co/elasticsearch/elasticsearch:7.15.0
Image ID: docker-pullable://docker.elastic.co/elasticsearch/elasticsearch@sha256:6ae227c688e05f7d487e0cfe08a5a3681f4d60d006ad9b5a1f72a741d6091df1
Ports: 9200/TCP, 9300/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Wed, 06 Oct 2021 14:46:26 +0200
Last State: Terminated
Reason: Error
Exit Code: 134
Started: Wed, 06 Oct 2021 14:43:46 +0200
Finished: Wed, 06 Oct 2021 14:46:22 +0200
Ready: False
Restart Count: 1
Limits:
memory: 2Gi
Requests:
memory: 2Gi
Readiness: exec [bash -c /mnt/elastic-internal/scripts/readiness-probe-script.sh] delay=10s timeout=5s period=5s #success=1 #failure=3
Environment:
POD_IP: (v1:status.podIP)
POD_NAME: quickstart-es-default-0 (v1:metadata.name)
NODE_NAME: (v1:spec.nodeName)
NAMESPACE: default (v1:metadata.namespace)
PROBE_PASSWORD_PATH: /mnt/elastic-internal/probe-user/elastic-internal-probe
PROBE_USERNAME: elastic-internal-probe
READINESS_PROBE_PROTOCOL: https
HEADLESS_SERVICE_NAME: quickstart-es-default
NSS_SDB_USE_CACHE: no
Mounts:
/mnt/elastic-internal/downward-api from downward-api (ro)
/mnt/elastic-internal/elasticsearch-config from elastic-internal-elasticsearch-config (ro)
/mnt/elastic-internal/probe-user from elastic-internal-probe-user (ro)
/mnt/elastic-internal/scripts from elastic-internal-scripts (ro)
/mnt/elastic-internal/unicast-hosts from elastic-internal-unicast-hosts (ro)
/mnt/elastic-internal/xpack-file-realm from elastic-internal-xpack-file-realm (ro)
/usr/share/elasticsearch/bin from elastic-internal-elasticsearch-bin-local (rw)
/usr/share/elasticsearch/config from elastic-internal-elasticsearch-config-local (rw)
/usr/share/elasticsearch/config/http-certs from elastic-internal-http-certificates (ro)
/usr/share/elasticsearch/config/transport-certs from elastic-internal-transport-certificates (ro)
/usr/share/elasticsearch/config/transport-remote-certs/ from elastic-internal-remote-certificate-authorities (ro)
/usr/share/elasticsearch/data from elasticsearch-data (rw)
/usr/share/elasticsearch/logs from elasticsearch-logs (rw)
/usr/share/elasticsearch/plugins from elastic-internal-elasticsearch-plugins-local (rw)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
elasticsearch-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: elasticsearch-data-quickstart-es-default-0
ReadOnly: false
downward-api:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.labels -> labels
elastic-internal-elasticsearch-bin-local:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
elastic-internal-elasticsearch-config:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-es-default-es-config
Optional: false
elastic-internal-elasticsearch-config-local:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
elastic-internal-elasticsearch-plugins-local:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
elastic-internal-http-certificates:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-es-http-certs-internal
Optional: false
elastic-internal-probe-user:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-es-internal-users
Optional: false
elastic-internal-remote-certificate-authorities:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-es-remote-ca
Optional: false
elastic-internal-scripts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: quickstart-es-scripts
Optional: false
elastic-internal-transport-certificates:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-es-default-es-transport-certs
Optional: false
elastic-internal-unicast-hosts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: quickstart-es-unicast-hosts
Optional: false
elastic-internal-xpack-file-realm:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-es-xpack-file-realm
Optional: false
elasticsearch-logs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 22m default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled 22m default-scheduler Successfully assigned default/quickstart-es-default-0 to rke-worker-1
Normal Pulled 21m kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.15.0" already present on machine
Normal Created 21m kubelet Created container elastic-internal-init-filesystem
Normal Started 21m kubelet Started container elastic-internal-init-filesystem
Normal Pulled 21m kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.15.0" already present on machine
Normal Created 21m kubelet Created container elasticsearch
Normal Started 21m kubelet Started container elasticsearch
Warning Unhealthy 21m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:43:57+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 21m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:02+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 21m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:07+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 21m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:12+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 21m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:17+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 20m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:22+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 20m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:27+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 20m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:32+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 20m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:37+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 115s (x223 over 20m) kubelet (combined from similar events): Readiness probe failed: {"timestamp": "2021-10-06T13:03:22+00:00", "message": "readiness probe failed", "curl_rc": "7"}
After the container restart I get the following output:
{"type": "deprecation.elasticsearch", "timestamp": "2021-10-07T11:58:28,007Z", "level": "DEPRECATION", "component": "o.e.d.c.r.OperationRouting", "cluster.name": "quickstart", "node.name": "quickstart-es-default-0", "message": "searches will not be routed based on awareness attributes starting in version 8.0.0; to opt into this behaviour now please set the system property [es.search.ignore_awareness_attributes] to [true]", "key": "searches_not_routed_on_awareness_attributes" }
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fc63c3eb122, pid=7, tid=261
#
# JRE version: OpenJDK Runtime Environment Temurin-16.0.2+7 (16.0.2+7) (build 16.0.2+7)
# Java VM: OpenJDK 64-Bit Server VM Temurin-16.0.2+7 (16.0.2+7, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# J 711 c1 org.yaml.snakeyaml.scanner.Constant.has(I)Z (42 bytes) @ 0x00007fc63c3eb122 [0x00007fc63c3eb100+0x0000000000000022]
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /usr/share/elasticsearch/core.7)
#
# An error report file with more information is saved as:
# logs/hs_err_pid7.log
Compiled method (c1) 333657 4806 3 org.yaml.snakeyaml.scanner.Constant::hasNo (15 bytes)
total in heap [0x00007fc63c50c010,0x00007fc63c50c7d0] = 1984
relocation [0x00007fc63c50c170,0x00007fc63c50c1f8] = 136
main code [0x00007fc63c50c200,0x00007fc63c50c620] = 1056
stub code [0x00007fc63c50c620,0x00007fc63c50c680] = 96
oops [0x00007fc63c50c680,0x00007fc63c50c688] = 8
metadata [0x00007fc63c50c688,0x00007fc63c50c6a8] = 32
scopes data [0x00007fc63c50c6a8,0x00007fc63c50c718] = 112
scopes pcs [0x00007fc63c50c718,0x00007fc63c50c7b8] = 160
dependencies [0x00007fc63c50c7b8,0x00007fc63c50c7c0] = 8
nul chk table [0x00007fc63c50c7c0,0x00007fc63c50c7d0] = 16
Compiled method (c1) 333676 4806 3 org.yaml.snakeyaml.scanner.Constant::hasNo (15 bytes)
total in heap [0x00007fc63c50c010,0x00007fc63c50c7d0] = 1984
relocation [0x00007fc63c50c170,0x00007fc63c50c1f8] = 136
main code [0x00007fc63c50c200,0x00007fc63c50c620] = 1056
stub code [0x00007fc63c50c620,0x00007fc63c50c680] = 96
oops [0x00007fc63c50c680,0x00007fc63c50c688] = 8
metadata [0x00007fc63c50c688,0x00007fc63c50c6a8] = 32
scopes data [0x00007fc63c50c6a8,0x00007fc63c50c718] = 112
scopes pcs [0x00007fc63c50c718,0x00007fc63c50c7b8] = 160
dependencies [0x00007fc63c50c7b8,0x00007fc63c50c7c0] = 8
nul chk table [0x00007fc63c50c7c0,0x00007fc63c50c7d0] = 16
Compiled method (c1) 333678 4812 3 org.yaml.snakeyaml.scanner.ScannerImpl::scanLineBreak (99 bytes)
total in heap [0x00007fc63c583990,0x00007fc63c584808] = 3704
relocation [0x00007fc63c583af0,0x00007fc63c583bf8] = 264
main code [0x00007fc63c583c00,0x00007fc63c584420] = 2080
stub code [0x00007fc63c584420,0x00007fc63c5844c0] = 160
oops [0x00007fc63c5844c0,0x00007fc63c5844c8] = 8
metadata [0x00007fc63c5844c8,0x00007fc63c584500] = 56
scopes data [0x00007fc63c584500,0x00007fc63c5845f0] = 240
scopes pcs [0x00007fc63c5845f0,0x00007fc63c5847b0] = 448
dependencies [0x00007fc63c5847b0,0x00007fc63c5847b8] = 8
nul chk table [0x00007fc63c5847b8,0x00007fc63c584808] = 80
Compiled method (c1) 333679 4693 2 java.lang.String::indexOf (7 bytes)
total in heap [0x00007fc63c6e0190,0x00007fc63c6e0568] = 984
relocation [0x00007fc63c6e02f0,0x00007fc63c6e0338] = 72
main code [0x00007fc63c6e0340,0x00007fc63c6e0480] = 320
stub code [0x00007fc63c6e0480,0x00007fc63c6e04d0] = 80
metadata [0x00007fc63c6e04d0,0x00007fc63c6e04e0] = 16
scopes data [0x00007fc63c6e04e0,0x00007fc63c6e0510] = 48
scopes pcs [0x00007fc63c6e0510,0x00007fc63c6e0560] = 80
dependencies [0x00007fc63c6e0560,0x00007fc63c6e0568] = 8
#
# If you would like to submit a bug report, please visit:
# https://github.com/adoptium/adoptium-support/issues
#
Upvotes: 2
Views: 9750
Reputation: 162
The solution to my problem was so simple that I could hardly believe it.
I narrowed the problem down to failing TLS handshakes: the times on the nodes had drifted apart, because the proxy was blocking services like NTP from syncing the clocks.
I synced the times and dates on all the nodes and all the problems vanished.
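For completeness, the sync itself was nothing special - roughly the following on every node (a sketch; internal-ntp.example.com is a placeholder for a time source the proxy does not block, and the service/config paths may differ per distro, e.g. chrony and /etc/chrony/chrony.conf on Debian/Ubuntu):

# Check whether the node clock is synchronized and which sources chrony sees.
timedatectl status
chronyc sources -v

# Point chrony at an NTP server reachable from behind the proxy,
# then restart it and step the clock immediately.
echo 'server internal-ntp.example.com iburst' | sudo tee -a /etc/chrony.conf
sudo systemctl restart chronyd
sudo chronyc makestep

With the clocks in sync, kubectl get pods showed: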
NAME READY STATUS RESTARTS AGE
quickstart-es-default-0 1/1 Running 0 3m2s
quickstart-es-default-1 1/1 Running 0 3m2s
quickstart-es-default-2 1/1 Running 0 3m2s
kubectl get elasticsearch
NAME HEALTH NODES VERSION PHASE AGE
quickstart green 3 7.15.0 Ready 3m21s
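The usual quickstart check against the HTTP service also works now (secret and service names as created by the ECK quickstart; port-forward just keeps the proxy out of the path between my shell and the cluster):

# Get the password generated for the built-in elastic user and query the cluster.
PASSWORD=$(kubectl get secret quickstart-es-elastic-user \
  -o go-template='{{.data.elastic | base64decode}}')
kubectl port-forward service/quickstart-es-http 9200 &
curl -u "elastic:$PASSWORD" -k "https://localhost:9200"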
Upvotes: 2