Reputation: 511
I have an Elasticsearch instance on AWS (I have a similar one that is working just fine).
I have a Lambda function that ships logs to Elasticsearch. It stopped working after a while, and now I can't see any new logs. I looked into the logs and found this:
org.elasticsearch.transport.RemoteTransportException: [tqStC42][10.0.1.90:9300][indices:data/read/search[phase/query]]
Caused by: java.lang.IllegalArgumentException: size must be positive, got 0
at org.elasticsearch.search.aggregations.bucket.BucketUtils.suggestShardSideQueueSize(BucketUtils.java:40) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory.doCreateInternal(TermsAggregatorFactory.java:100) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory.createInternal(ValuesSourceAggregatorFactory.java:55) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:225) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.search.aggregations.AggregatorFactories.createTopLevelAggregators(AggregatorFactories.java:102) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.search.aggregations.AggregationPhase.preProcess(AggregationPhase.java:61) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:104) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.indices.IndicesService.lambda$loadIntoContext$18(IndicesService.java:1159) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.indices.IndicesService.lambda$cacheShardLevelResult$20(IndicesService.java:1229) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:150) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:133) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:398) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.indices.IndicesRequestCache.getOrCompute(IndicesRequestCache.java:116) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.indices.IndicesService.cacheShardLevelResult(IndicesService.java:1235) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.indices.IndicesService.loadIntoContext(IndicesService.java:1158) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:257) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:273) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:300) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:297) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.transport.TransportService$6.doRun(TransportService.java:577) [elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:527) [elasticsearch-5.1.1.jar:5.1.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.1.1.jar:5.1.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
I ran curl against the cluster health endpoint (the status is yellow because there is only one node):
{
"cluster_name":"Pixel",
"status":"yellow",
"timed_out":false,
"number_of_nodes":1,
"number_of_data_nodes":1,
"active_primary_shards":11,
"active_shards":11,
"relocating_shards":0,
"initializing_shards":0,
"unassigned_shards":11,
"delayed_unassigned_shards":0,
"number_of_pending_tasks":0,
"number_of_in_flight_fetch":0,
"task_max_waiting_in_queue_millis":0,
"active_shards_percent_as_number":50.0
}
A bit more information on the instance: it runs on an m4.4xlarge with 100 GB of EBS storage and 32 GB of memory. So far it has 600k records in one index.
What else should I look for?
I also listed the shards with _cat/shards:
pixel 1 p STARTED 130527 74.7mb 10.0.1.90 tqStC42
pixel 1 r UNASSIGNED
pixel 3 p STARTED 129687 74.4mb 10.0.1.90 tqStC42
pixel 3 r UNASSIGNED
pixel 2 p STARTED 130561 74mb 10.0.1.90 tqStC42
pixel 2 r UNASSIGNED
pixel 4 p STARTED 129870 74.6mb 10.0.1.90 tqStC42
pixel 4 r UNASSIGNED
pixel 0 p STARTED 129981 74.4mb 10.0.1.90 tqStC42
pixel 0 r UNASSIGNED
Upvotes: 2
Views: 15439
Reputation: 217304
The error you're getting is because you have an aggregation query, probably a terms, significant_terms or geohash_grid one, in which you have specified "size": 0 in order to get all the possible terms.
This was possible in ES 2.x and earlier releases, but not anymore since ES 5.x because that can greatly harm performance. Now if you want to get all the terms of a given field, you have to explicitly set a high number like "size": 1000 or whatever makes sense to you; it just needs to be a number bigger than the cardinality of your field.
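If the failing query is built in code (for example by a dashboard or a script that was written against ES 2.x), the change can be applied mechanically. The following is only a rough sketch of that idea in Python: the helper name fix_zero_sizes, the field names user_id and country, and the replacement value 1000 are all made up for illustration, and you would pick a size that exceeds the cardinality of your own field.

```python
import copy

# Aggregation types mentioned above that no longer accept "size": 0 in ES 5.x.
BUCKET_AGGS = ("terms", "significant_terms", "geohash_grid")


def fix_zero_sizes(aggs, replacement=1000):
    """Return a copy of an aggregations dict with every "size": 0 replaced.

    Hypothetical helper, not part of any Elasticsearch client library.
    """
    fixed = copy.deepcopy(aggs)
    for agg in fixed.values():
        if not isinstance(agg, dict):
            continue
        for agg_type in BUCKET_AGGS:
            params = agg.get(agg_type)
            if isinstance(params, dict) and params.get("size") == 0:
                params["size"] = replacement
        # Sub-aggregations may be keyed "aggs" or "aggregations".
        for key in ("aggs", "aggregations"):
            if isinstance(agg.get(key), dict):
                agg[key] = fix_zero_sizes(agg[key], replacement)
    return fixed


# Hypothetical ES 2.x-style request body; the field names are invented.
old_query = {
    "size": 0,  # top-level size 0 (return no hits) is left untouched
    "aggs": {
        "by_user": {
            "terms": {"field": "user_id", "size": 0},
            "aggs": {"by_country": {"terms": {"field": "country", "size": 0}}},
        }
    },
}

new_query = dict(old_query, aggs=fix_zero_sizes(old_query["aggs"]))
print(new_query)
```

Note that only the size inside the aggregation is rewritten. The top-level "size": 0 of the search request, which just suppresses the hits, is a different setting and is not the one the exception complains about.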
Upvotes: 3