evdh
evdh

Reputation: 239

Elasticsearch client on AWS / Lambda / Java - 2.5 seconds client startup time

We're using AWS Lambda (Java) and the elasticsearch client to connect to a hosted elasticsearch instance on AWS. I encounter a long wait on the first request of about 2.5 seconds (on top of a cold start). After that it is very quick. I can't really figure out where this delay is coming from and I'm trying to optimize it.

private void testPerformanceElasticSearch() throws Exception {
        log.info("1. Before testing elasticsearch client");
        AWS4Signer signer = new AWS4Signer();
        signer.setServiceName("es");
        signer.setRegionName("eu-west-1");
        HttpRequestInterceptor interceptor = new AWSRequestSigningApacheInterceptor("es", signer, new DefaultAWSCredentialsProviderChain());
        String endpoint = "https://" + Utils.getEnvironmentVariable("ELASTIC_SEARCH_ENDPOINT");
        RestHighLevelClient restHighLevelClient = new RestHighLevelClient(RestClient.builder(HttpHost.create(endpoint)).setHttpClientConfigCallback(hacb -> hacb.addInterceptorLast(interceptor)));
        log.info("2. After getting elasticsearch client");
        log.info("3. Before doing a elasticsearch query");
        log.info("4");
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        log.info("5");
        TermsQueryBuilder termsQueryBuilder = QueryBuilders.termsQuery("userId", "abc");
        log.info("6");
        boolQueryBuilder.must(termsQueryBuilder);
        log.info("7");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        log.info("8");
        searchSourceBuilder.query(boolQueryBuilder);
        log.info("9");
        SearchRequest searchRequest = new SearchRequest("users");
        log.info("10");
        searchRequest.source(searchSourceBuilder);
        log.info("11");
        restHighLevelClient.search(searchRequest);
        log.info("12");
        log.info("13. After testing elasticsearch");
}

And then I get logging like this; you can see between '5' and '6' there is more than a 2 second delay which I can't really place:

17:16:06.871INFO[PlacesPerformance] 1. Before testing elasticsearch client
17:16:06.932INFO[PlacesPerformance] 2. After getting elasticsearch client
17:16:06.933INFO[PlacesPerformance] 3. Before doing a elasticsearch query
17:16:06.935INFO[PlacesPerformance] 4
17:16:06.942INFO[PlacesPerformance] 5
17:16:09.179INFO[PlacesPerformance] 6
17:16:09.179INFO[PlacesPerformance] 7
17:16:09.181INFO[PlacesPerformance] 8
17:16:09.181INFO[PlacesPerformance] 9
17:16:09.183INFO[PlacesPerformance] 10
17:16:09.183INFO[PlacesPerformance] 11
17:16:09.362INFO[PlacesPerformance] 12
17:16:09.362INFO[PlacesPerformance] 13. After testing elasticsearch

Any suggestions on how to improve this?

UPDATE:

Strange. Whenever I run the code in a lambda, I experience the 2.5 second delay when constructing the request (not even executing it). Locally, it works fine though. I tried the following:

1.  Local against local elasticsearch. No delay.
2.  Local against AWS elasticsearch. No delay.
3.  Lambda with signing request. DELAY.
4.  Lambda without signing request. DELAY.
5.  Lambda with a 'match all' query. DELAY
6.  Lambda with a http address. DELAY.
7.  Lambda with a custom runtime. DELAY.
8.  Lambda with a custom runtime. DELAY.
9.  Lambda with standard Java 8 runtime. DELAY.

Upvotes: 7

Views: 1831

Answers (2)

serkan_ozal
serkan_ozal

Reputation: 194

The problem is that at the first request (real request, not warmup request as warmup requests don't go through your application code, it doesn't trigger loading classes which are used in actual request path) JVM loads (read, parse, verify, etc ...) related classes, initializes security components (ciphers, etc ...) and TLS handshake is done (requires multiple RTT, with Java 9 and TLS 1.3 this should be reduced).

The similar long duration behaviour is also seen for first AWS service calls (DynamoDB, SQS, etc ...)

As I am the author Thundra warmup plugin, I am thinking of introducing hook points for warmup messages as custom action will be able to executed like initializing security components, loading classes, etc ...

Upvotes: 6

Thales Minussi
Thales Minussi

Reputation: 7235

Lambda functions inside VPCs have a great impact on the startup time. You said your ES is a hosted instance, so I assume it's backed by a VPC.

Even if it's not in a VPC, Java cold starts are usually, by nature, longer than runtimes like Node or Python, because the JVM needs to be started up first. This is mainly where your 2.5 seconds come from.

OK. How to fix the issue?

It depends on how many concurrent connections you need to ElasticSearch. If one function is able to handle all the incoming requests, you can then limit the concurrent execution of your Lambda function to 1, so you make sure you are always hitting the same container (as long as these requests are made in a ±5 min time frame).

Now, if you don't know upfront how many concurrent Lambda functions will execute, you kind of have no way out. You could try warming up your Lambda functions beforehand, but then you'd need to fire like 100 requests at the same time to warm up 100 different containers.

Please check this answer as I go through the concurrent model of Lambda functions and how the cold/warm starts work.

I am happy to edit my answer if you have more info to share or if I wasn't clear enough.

Upvotes: 0

Related Questions