Reputation: 5737
Document count: 4 billion
Disk size: 2 TB
Primary shards: 5
Replicas: 2
Master nodes: 3
Data nodes: 4 (each 16 CPUs and 64 GB RAM)
Heap size: 30 GB
mlock enabled: true
Aggregation queries take up to 3 minutes to respond. Subsequent requests are served from the cache and are much faster. Is there a way to speed up the aggregation on the first query?
Example aggregation query:
{
  "query": {
    "bool": {
      "must": [],
      "must_not": [],
      "should": []
    }
  },
  "size": 0,
  "aggs": {
    "agg_;COUNT_ROWS;5d8b0621690e727ff775d4ed": {
      "terms": {
        "field": "feild1.keyword",
        "size": 10000,
        "shard_size": 100,
        "order": {
          "_term": "asc"
        }
      },
      "aggs": {
        "agg_;COUNT_ROWS;5d8b0621690e727ff775d4ec": {
          "terms": {
            "field": "feild2.keyword",
            "size": 30,
            "shard_size": 100,
            "order": {
              "_term": "asc"
            }
          },
          "aggs": {
            "agg_HouseHold;COUNT_DISTINCT": {
              "cardinality": {
                "field": "feild3.keyword",
                "precision_threshold": 40000
              }
            }
          }
        }
      }
    }
  }
}
Upvotes: 0
Views: 810
Reputation: 1855
If I understand correctly, you are running the query on a single instance with a total of 15 shards, 5 of which are primaries. The first terms aggregation has a size of 10,000, which is a high number that hurts performance. Consider moving to a composite aggregation so that you can paginate through the buckets instead of trying to squeeze everything into one huge response.
Also, the shard_size doesn't make much sense to me: you are querying only 5 shards and asking for 10,000 results, so bringing 100 results from each of 5 shards would yield at most 500 results, which is not enough. I would drop the shard_size param, or set it to a higher value so that it makes sense.
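To make the composite-aggregation suggestion concrete, here is a minimal sketch of how the paging could look. It is not a drop-in replacement for the original query: it assumes that listing every (feild1, feild2) pair in ascending order is acceptable, whereas the original asks for the first 30 feild2 terms per feild1 term. The aggregation name "pairs", the source names "f1" and "f2", the sub-aggregation name "households", and the page size of 1000 are all made up for illustration, and `search` stands for any callable that sends a request body to the cluster and returns the response as a dict.

```python
from typing import Any, Callable, Dict, Iterator, Optional

AGG_NAME = "pairs"  # hypothetical aggregation name, not from the original query


def build_composite_body(page_size: int = 1000,
                         after: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build one page of a composite aggregation over feild1 and feild2."""
    composite: Dict[str, Any] = {
        "size": page_size,  # buckets per page, instead of 10,000 at once
        "sources": [
            {"f1": {"terms": {"field": "feild1.keyword", "order": "asc"}}},
            {"f2": {"terms": {"field": "feild2.keyword", "order": "asc"}}},
        ],
    }
    if after is not None:
        # Resume right after the last bucket of the previous page.
        composite["after"] = after
    return {
        "size": 0,
        "aggs": {
            AGG_NAME: {
                "composite": composite,
                "aggs": {
                    "households": {
                        "cardinality": {
                            "field": "feild3.keyword",
                            "precision_threshold": 40000,
                        }
                    }
                },
            }
        },
    }


def iter_buckets(search: Callable[[Dict[str, Any]], Dict[str, Any]],
                 page_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield every composite bucket, fetching one page per request."""
    after: Optional[Dict[str, Any]] = None
    while True:
        resp = search(build_composite_body(page_size, after))
        agg = resp["aggregations"][AGG_NAME]
        buckets = agg.get("buckets", [])
        if not buckets:
            return  # an empty page means there is nothing left
        yield from buckets
        after = agg.get("after_key")
        if after is None:
            return
```

With the official Python client, `search` could be something like `lambda body: es.search(index="my-index", body=body)`, where the index name is a placeholder and the exact call signature depends on the client version. Each request then returns a bounded page, so no single response has to carry all 10,000 top-level buckets.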
Upvotes: 1