Bob

Reputation: 1607

Envoy: connection refused (via Consul)

An Envoy Docker sidecar is deployed alongside each of our workload services (Nomad cluster, Docker deployments, our own private network between our servers) and is supposed to connect to a Consul service mesh.

Our service is correctly deployed and accessible on the corresponding (Nomad) network, but Consul fails to validate the Envoy sidecar health check:

Connect Sidecar Listening
ServiceName my-service-sidecar-proxy
CheckID service:_nomad-task-xxxxxx-xxxx-xxxxxx-xxxxx-xxxxxxxxxx-group-my-service-my-service-interface-sidecar-proxy:2
Type tcp
Output dial tcp 10.0.0.72:30241: connect: connection refused
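
If needed, the same failing check can also be inspected through the Consul agent HTTP API. A minimal sketch, assuming the default local agent HTTP address 127.0.0.1:8500 and that jq is available:

# List the checks registered on the local Consul agent, with their status and output
server-node-0:~$ curl -s http://127.0.0.1:8500/v1/agent/checks | jq '.[] | {CheckID, Status, Output}'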

From our servers, the workload service is accessible via our private network or the Nomad network, but Envoy is not:

# Workload service
server-node-0:~$ wget 10.0.0.63:28947/health
--2022-09-07 15:38:28--  http://10.0.0.63:28947/health
Connecting to 10.0.0.63:28947... connected.
HTTP request sent, awaiting response... 200 OK
# Envoy
server-node-0:~$ wget 10.0.0.63:30241
--2022-09-07 15:38:57--  http://10.0.0.63:30241/
Connecting to 10.0.0.63:30241... failed: Connection refused.

On the server where the services are deployed, as well as inside their Docker container, netstat shows that only the workload service port is open, not the Envoy one:

client-node-1:~$ netstat -a | grep 28947
tcp        0      0 client-node-1:49236 client-node-1:28947 TIME_WAIT
client-node-1:~$ netstat -a | grep 30241
client-node-1:~$
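
The same can be verified from inside the sidecar container, as a sketch assuming ss (iproute2) is available in the Envoy image:

# List listening TCP sockets inside the sidecar container;
# the admin port 19001 should be listed, while 30241 is absent if no public listener was set up
root@3d7c60d8b94a:/# ss -lntp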

Yet the Envoy process seems to run properly in its sidecar container. Its admin port (19001) is open and responds correctly:

root@3d7c60d8b94a:/# ps auxww
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
envoy        1  0.1  1.3 2256608 53300 ?       Ssl  15:53   0:05 envoy -c /secrets/envoy_bootstrap.json -l debug --concurrency 1 --disable-hot-restart
root@3d7c60d8b94a:/# curl -X POST 127.0.0.2:19001/ready
LIVE
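
The admin interface can also show whether Envoy ever received a listener configuration over xDS (/listeners and /config_dump are standard Envoy admin endpoints; the address is the same one used above):

# Should list the public listener on port 30241 once Consul has delivered the xDS configuration
root@3d7c60d8b94a:/# curl -s 127.0.0.2:19001/listeners
# Full configuration dump, useful to check whether any dynamic listeners/clusters are present
root@3d7c60d8b94a:/# curl -s 127.0.0.2:19001/config_dump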

What could cause Envoy not to listen on its port, so that it can properly act as a proxy?

With debug logs enabled, we noticed the following messages, though we don't know whether they are related:

[2022-09-07 17:03:13.173][1][debug][config] [./source/common/config/grpc_stream.h:61] Establishing new gRPC bidi stream for rpc DeltaAggregatedResources(stream .envoy.service.discovery.v3.DeltaDiscoveryRequest) returns (stream .envoy.service.discovery.v3.DeltaDiscoveryResponse);

[2022-09-07 17:03:13.173][1][debug][router] [source/common/router/router.cc:471] [C0][S3688902394289443788] cluster 'local_agent' match for URL '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
[2022-09-07 17:03:13.173][1][debug][router] [source/common/router/router.cc:675] [C0][S3688902394289443788] router decoding headers:
':method', 'POST'
':path', '/envoy.service.discovery.v3.AggregatedDiscoveryService/DeltaAggregatedResources'
':authority', 'local_agent'
':scheme', 'http'
'te', 'trailers'
'content-type', 'application/grpc'
'x-consul-token', 'bf88ad2f-7d22-716c-6e5f-8c404f866d6e'
'x-envoy-internal', 'true'
'x-forwarded-for', '172.26.64.4'

[2022-09-07 17:03:13.173][1][debug][pool] [source/common/http/conn_pool_base.cc:78] queueing stream due to no available connections (ready=0 busy=0 connecting=0)
[2022-09-07 17:03:13.173][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:290] trying to create new connection
[2022-09-07 17:03:13.173][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:145] creating a new connection (connecting=0)
[2022-09-07 17:03:13.173][1][debug][http2] [source/common/http/http2/codec_impl.cc:1788] [C294] updating connection-level initial window size to 268435456
[2022-09-07 17:03:13.173][1][debug][connection] [./source/common/network/connection_impl.h:89] [C294] current connecting state: true
[2022-09-07 17:03:13.173][1][debug][client] [source/common/http/codec_client.cc:57] [C294] connecting
[2022-09-07 17:03:13.173][1][debug][connection] [source/common/network/connection_impl.cc:924] [C294] connecting to alloc/tmp/consul_grpc.sock
[2022-09-07 17:03:13.174][1][debug][connection] [source/common/network/connection_impl.cc:683] [C294] connected
[2022-09-07 17:03:13.174][1][debug][client] [source/common/http/codec_client.cc:89] [C294] connected
[2022-09-07 17:03:13.174][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:327] [C294] attaching to next stream
[2022-09-07 17:03:13.174][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:181] [C294] creating stream
[2022-09-07 17:03:13.174][1][debug][router] [source/common/router/upstream_request.cc:424] [C0][S3688902394289443788] pool ready
[2022-09-07 17:03:13.179][1][debug][connection] [source/common/network/connection_impl.cc:651] [C294] remote close
[2022-09-07 17:03:13.179][1][debug][connection] [source/common/network/connection_impl.cc:250] [C294] closing socket: 0
[2022-09-07 17:03:13.179][1][debug][client] [source/common/http/codec_client.cc:108] [C294] disconnect. resetting 1 pending requests
[2022-09-07 17:03:13.179][1][debug][client] [source/common/http/codec_client.cc:140] [C294] request reset
[2022-09-07 17:03:13.179][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:214] [C294] destroying stream: 0 remaining
[2022-09-07 17:03:13.179][1][debug][router] [source/common/router/router.cc:1198] [C0][S3688902394289443788] upstream reset: reset reason: connection termination, transport failure reason:
[2022-09-07 17:03:13.179][1][debug][http] [source/common/http/async_client_impl.cc:101] async http request response headers (end_stream=true):
':status', '200'
'content-type', 'application/grpc'
'grpc-status', '14'
'grpc-message', 'upstream connect error or disconnect/reset before headers. reset reason: connection termination'

[2022-09-07 17:03:13.179][1][warning][config] [./source/common/config/grpc_stream.h:196] DeltaAggregatedResources gRPC config stream closed since 4161s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection termination
[2022-09-07 17:03:13.179][1][debug][config] [source/common/config/grpc_subscription_impl.cc:113] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed
[2022-09-07 17:03:13.179][1][debug][config] [source/common/config/grpc_subscription_impl.cc:113] gRPC update for type.googleapis.com/envoy.config.listener.v3.Listener failed
[2022-09-07 17:03:13.179][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:483] [C294] client disconnected, failure reason:
[2022-09-07 17:03:13.179][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:453] invoking idle callbacks - is_draining_for_deletion_=false

Envoy version: 1.23.0
Consul version: 1.13.1

Upvotes: 1

Views: 1412

Answers (1)

Bob

Reputation: 1607

After discovering this message in the Consul logs:

consul grpc: Server.Serve failed to complete security handshake from "127.0.0.1:50356": tls: first record does not look like a TLS handshake

the root cause was identified: Consul 1.13.1 ships with an incompatibility with Envoy when certain parameters are used (namely auto-encrypt or auto-config).
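
To check whether a given agent is affected, the handshake error can be searched for in the Consul logs and the agent configuration can be checked for these options. A sketch, assuming the agent runs as a systemd unit named consul and keeps its configuration under /etc/consul.d:

# Look for the failed TLS handshake in the agent logs
server-node-0:~$ journalctl -u consul | grep -i 'security handshake'
# Check whether the agent configuration enables the affected features
# (the configuration keys are auto_encrypt and auto_config)
server-node-0:~$ grep -rE 'auto_encrypt|auto_config' /etc/consul.d/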

The solution is then either:

  • to roll back to Consul 1.12, or
  • to wait for Consul 1.13.2, which will ship a fix for this problem (even then, some parameters will have to be applied, see this comment)

We rolled back to Consul 1.12 (and now Envoy fails to communicate with its attached service, but that is surely another problem).

Upvotes: 1
