Reputation: 177
I'm running a Logstash instance connected to an ES cluster behind a load balancer. The load balancer has an idle timeout of 5 minutes. Logstash is configured with the ES URL pointing at the load balancer IP.
Normally everything works fine, but after a period with no requests, the next request processed by LS fails with the following:
[2018-10-30T08:15:00,757][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://10.100.24.254:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://10.100.24.254:9200/, :error_message=>"Elasticsearch Unreachable: [http://10.100.24.254:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-10-30T08:15:00,759][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://10.100.24.254:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
[2018-10-30T08:15:02,760][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2018-10-30T08:15:02,760][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}
[2018-10-30T08:15:05,651][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://10.100.24.254:9200/, :path=>"/"}
LS eventually recovers, but it takes more than 1 minute, which is not acceptable for our SLA.
I suspect this is due to the load balancer closing connections after 5 minutes of inactivity.
I've tried setting:
timeout => 3
which makes things better: the request is retried after 3 seconds, but this is still not good enough. What's the best set of configuration options to make sure connections are healthy and working before requests are attempted, so that I experience no delay at all?
Upvotes: 1
Views: 11749
Reputation: 3018
Try the validate_after_inactivity setting of the elasticsearch output, as described in the plugin documentation.
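For illustration, here is a minimal sketch of how that setting might sit alongside the timeout already tried in the question. The host is taken from the log lines above. The specific values (1000 ms, 1 second) are assumptions chosen only as an example, not tested recommendations, and whether the stale check catches a connection silently dropped by the load balancer depends on how the load balancer closes it.

```
output {
  elasticsearch {
    # Load balancer address, as it appears in the question's logs
    hosts => ["http://10.100.24.254:9200"]

    # Milliseconds a pooled connection may sit idle before it is
    # checked for staleness ahead of the next request
    # (illustrative value; the plugin default is higher)
    validate_after_inactivity => 1000

    # Request timeout in seconds, as already tried in the question
    timeout => 3

    # Seconds between health checks of URLs marked as dead
    # (illustrative value)
    resurrect_delay => 1
  }
}
```

Lowering resurrect_delay does not prevent the first failure, but it may shorten the window in which the pool reports "No Available connections" after the URL has been marked dead.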
Or you can try enabling keep-alive on your Logstash server, so that Logstash knows the connection has been severed when the LB hits its idle timeout and opens a new connection instead of sending requests on the old stale one.
Upvotes: 1