Reputation: 1607
An Envoy Docker sidecar is deployed alongside each of our workload services (Nomad cluster, Docker deployments, a private network of our own between our servers) and is supposed to connect to a Consul service mesh.
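For context, the sidecar is declared through Nomad's Consul Connect integration. A minimal sketch of the kind of group/service stanza involved (names and ports are illustrative, not our actual job file):

group "my-service" {
  network {
    mode = "bridge"
    port "http" {
      to = 28947
    }
  }

  service {
    name = "my-service"
    port = "http"

    connect {
      # Nomad launches an Envoy Docker sidecar for this service and
      # registers it in Consul as "my-service-sidecar-proxy".
      sidecar_service {}
    }
  }

  task "my-service" {
    driver = "docker"
    # image, args, resources omitted
  }
}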
Our service is correctly deployed and accessible on the corresponding (Nomad) network, but Consul fails to validate the Envoy sidecar health check:
Connect Sidecar Listening
ServiceName my-service-sidecar-proxy
CheckID service:_nomad-task-xxxxxx-xxxx-xxxxxx-xxxxx-xxxxxxxxxx-group-my-service-my-service-interface-sidecar-proxy:2
Type tcp
Output dial tcp 10.0.0.72:30241: connect: connection refused
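This "Connect Sidecar Listening" check is the TCP check Consul generates automatically when a Connect sidecar is registered. A simplified, Consul-native illustration of the equivalent registration (hypothetical values, only to show where the check comes from):

# Simplified Consul service definition (illustration only).
service {
  name = "my-service"
  port = 28947

  connect {
    sidecar_service {
      # Registering a sidecar makes Consul add a TCP check named
      # "Connect Sidecar Listening" against the proxy's own port
      # (30241 in our case) - the check that is failing above.
    }
  }
}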
From our servers, the workload service is reachable via our private network / the Nomad network, but the Envoy port is not:
# Workload service
server-node-0:~$ wget 10.0.0.63:28947/health
--2022-09-07 15:38:28-- http://10.0.0.63:28947/health
Connecting to 10.0.0.63:28947... connected.
HTTP request sent, awaiting response... 200 OK
# Envoy
server-node-0:~$ wget 10.0.0.63:30241
--2022-09-07 15:38:57-- http://10.0.0.63:30241/
Connecting to 10.0.0.63:30241... failed: Connection refused.
From the server where the services are deployed, or from within their Docker containers, netstat shows that only the workload service port is open, not the Envoy one.
client-node-1:~$ netstat -a | grep 28947
tcp 0 0 client-node-1:49236 client-node-1:28947 TIME_WAIT
client-node-1:~$ netstat -a | grep 30241
client-node-1:~$
Yet the Envoy process seems to be running properly in its sidecar container: its admin port (19001) is open and responds correctly:
root@3d7c60d8b94a:/# ps auxww
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
envoy 1 0.1 1.3 2256608 53300 ? Ssl 15:53 0:05 envoy -c /secrets/envoy_bootstrap.json -l debug --concurrency 1 --disable-hot-restart
root@3d7c60d8b94a:/# curl -X POST 127.0.0.2:19001/ready
LIVE
What could prevent Envoy from listening on its port, so that it cannot act as a proxy?
With debug logging enabled, we noticed the following messages, without knowing whether they are related:
[2022-09-07 17:03:13.173][1][debug][config] [./source/common/config/grpc_stream.h:61] Establishing new gRPC bidi stream for rpc DeltaAggregatedResources(stream .envoy.service.discovery.v3.DeltaDiscoveryRequest) returns (stream .envoy.service.discovery.v3.DeltaDiscoveryResponse);
[2022-09-07 17:03:13.173][1][debug][router] [source/common/router/router.cc:471] [C0][S3688902394289443788] cluster 'local_agent' match for URL '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
[2022-09-07 17:03:13.173][1][debug][router] [source/common/router/router.cc:675] [C0][S3688902394289443788] router decoding headers:
':method', 'POST'
':path', '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
':authority', 'local_agent'
':scheme', 'http'
'te', 'trailers'
'content-type', 'application/grpc'
'x-consul-token', 'bf88ad2f-7d22-716c-6e5f-8c404f866d6e'
'x-envoy-internal', 'true'
'x-forwarded-for', '172.26.64.4'
[2022-09-07 17:03:13.173][1][debug][pool] [source/common/http/conn_pool_base.cc:78] queueing stream due to no available connections (ready=0 busy=0 connecting=0)
[2022-09-07 17:03:13.173][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:290] trying to create new connection
[2022-09-07 17:03:13.173][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:145] creating a new connection (connecting=0)
[2022-09-07 17:03:13.173][1][debug][http2] [source/common/http/http2/codec_impl.cc:1788] [C294] updating connection-level initial window size to 268435456
[2022-09-07 17:03:13.173][1][debug][connection] [./source/common/network/connection_impl.h:89] [C294] current connecting state: true
[2022-09-07 17:03:13.173][1][debug][client] [source/common/http/codec_client.cc:57] [C294] connecting
[2022-09-07 17:03:13.173][1][debug][connection] [source/common/network/connection_impl.cc:924] [C294] connecting to alloc/tmp/consul_grpc.sock
[2022-09-07 17:03:13.174][1][debug][connection] [source/common/network/connection_impl.cc:683] [C294] connected
[2022-09-07 17:03:13.174][1][debug][client] [source/common/http/codec_client.cc:89] [C294] connected
[2022-09-07 17:03:13.174][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:327] [C294] attaching to next stream
[2022-09-07 17:03:13.174][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:181] [C294] creating stream
[2022-09-07 17:03:13.174][1][debug][router] [source/common/router/upstream_request.cc:424] [C0][S3688902394289443788] pool ready
[2022-09-07 17:03:13.179][1][debug][connection] [source/common/network/connection_impl.cc:651] [C294] remote close
[2022-09-07 17:03:13.179][1][debug][connection] [source/common/network/connection_impl.cc:250] [C294] closing socket: 0
[2022-09-07 17:03:13.179][1][debug][client] [source/common/http/codec_client.cc:108] [C294] disconnect. resetting 1 pending requests
[2022-09-07 17:03:13.179][1][debug][client] [source/common/http/codec_client.cc:140] [C294] request reset
[2022-09-07 17:03:13.179][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:214] [C294] destroying stream: 0 remaining
[2022-09-07 17:03:13.179][1][debug][router] [source/common/router/router.cc:1198] [C0][S3688902394289443788] upstream reset: reset reason: connection termination, transport failure reason:
[2022-09-07 17:03:13.179][1][debug][http] [source/common/http/async_client_impl.cc:101] async http request response headers (end_stream=true):
':status', '200'
'content-type', 'application/grpc'
'grpc-status', '14'
'grpc-message', 'upstream connect error or disconnect/reset before headers. reset reason: connection termination'
[2022-09-07 17:03:13.179][1][warning][config] [./source/common/config/grpc_stream.h:196] DeltaAggregatedResources gRPC config stream closed since 4161s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection termination
[2022-09-07 17:03:13.179][1][debug][config] [source/common/config/grpc_subscription_impl.cc:113] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed
[2022-09-07 17:03:13.179][1][debug][config] [source/common/config/grpc_subscription_impl.cc:113] gRPC update for type.googleapis.com/envoy.config.listener.v3.Listener failed
[2022-09-07 17:03:13.179][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:483] [C294] client disconnected, failure reason:
[2022-09-07 17:03:13.179][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:453] invoking idle callbacks - is_draining_for_deletion_=false
Envoy version: 1.23.0
Consul version: 1.13.1
Upvotes: 1
Views: 1412
Reputation: 1607
After discovering this message within Consul logs:
consul grpc: Server.Serve failed to complete security handshake from "127.0.0.1:50356": tls: first record does not look like a TLS handshake
the root cause was identified: Consul 1.13.1 ships with an incompatibility with Envoy when certain parameters are used (namely auto-encrypt or auto-config). Because the xDS gRPC stream to the local Consul agent cannot complete the TLS handshake, Envoy never receives its listener configuration and therefore never opens its proxy port.
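For reference, the affected setups are those where client agents obtain their TLS material from the servers via auto_encrypt (or their whole configuration via auto_config). A sketch of the kind of client agent configuration concerned, with paths and values that are illustrative rather than our actual config; the comments reflect our understanding of the incompatibility:

# Illustrative Consul client agent config (not our actual file).
# With auto_encrypt (or auto_config) enabled, Consul 1.13.0/1.13.1
# also expects TLS on the agent's gRPC (xDS) port, while the Envoy
# bootstrap still connects in plaintext - hence the
# "first record does not look like a TLS handshake" error above.
auto_encrypt {
  tls = true
}

verify_outgoing = true
ca_file         = "/etc/consul.d/consul-agent-ca.pem"

ports {
  grpc = 8502
}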
Our solution was therefore to roll back to Consul 1.12 (now Envoy fails to communicate with its attached service, but that is surely another problem).
Upvotes: 1