antoinestv

Reputation: 3306

Elasticsearch multiple sum aggregations

We have a lot of documents in each index (~10,000,000), but each document is very small and contains almost exclusively integer values.

We need to SUM all the numeric fields.

  1. First step - We retrieve all available fields with a mapping request.

Example:

GET INDEX/TYPE/_mapping
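
The response (the field names here are just placeholders) lists each field with its type, for example:

{
    "INDEX": {
        "mappings": {
            "TYPE": {
                "properties": {
                    "FIELD 1": { "type": "integer" },
                    "FIELD 2": { "type": "integer" },
                    // ...
                    "FIELD N": { "type": "integer" }
                }
            }
        }
    }
}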
  2. Second step - We build the request with the fields returned by the mapping.

Example:

GET INDEX/TYPE/_search
{
    // SOME FILTERS TO REDUCE THE NUMBER OF DOCUMENTS
    "size":0,
    "aggs":{  
        "FIELD 1":{  
            "sum":{  
                "field":"FIELD 1"
            }
        },
        "FIELD 2":{  
            "sum":{  
                "field":"FIELD 2"
            }
        },
        // ...
        "FIELD N":{  
            "sum":{  
                "field":"FIELD N"
            }
        }
    }
}
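
Each named aggregation then comes back as a single value in the response; a sketch with made-up numbers:

{
    // ...
    "aggregations": {
        "FIELD 1": { "value": 42 },
        "FIELD 2": { "value": 1337 },
        // ...
        "FIELD N": { "value": 7 }
    }
}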

Our problem is that the execution time of the second request grows linearly with the number of fields N.

That's not acceptable for simple sums, so we tried to build our own aggregation with a scripted metric (Groovy).

Example with only 2 fields:

// ...
"aggs": {
    "test": {
        "scripted_metric": {
            "init_script": "_agg['t'] = []",
            "map_script": "_agg.t.add(['FIELD 1': doc['FIELD 1'].value, 'FIELD 2': doc['FIELD 2'].value])",
            "combine_script": "res = [:]; res['FIELD 1'] = 0; res['FIELD 2'] = 0; for (t in _agg.t) { res['FIELD 1'] += t['FIELD 1']; res['FIELD 2'] += t['FIELD 2'] }; return res",
            "reduce_script": "res = [:]; res['FIELD 1'] = 0; res['FIELD 2'] = 0; for (t in _aggs) { res['FIELD 1'] += t['FIELD 1']; res['FIELD 2'] += t['FIELD 2'] }; return res"
        }
    }
}
// ...

But it appears that the more assignments we add to the script, the longer it takes to execute, so it doesn't solve our problem.

There are not many examples of this out there.

Do you have any ideas to improve this script's performance? Or any other approaches?

Upvotes: 4

Views: 2027

Answers (1)

NikoNyrh

Reputation: 4138

How could it calculate N sums in sub-linear time? Does any such system exist?

10 million documents isn't actually that many. How long are your queries taking, how many shards do you have, and is the CPU maxed at 100%? (I was going to ask these in a comment but don't have 50 reputation yet.)

If you are interested in the total sum over all fields, you could pre-calculate a document-level sum when indexing each document and then, at query time, just take the sum of these values.
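A sketch of the idea (the "total" field name and the values are hypothetical): each document carries its own pre-computed sum, so the query reduces to a single aggregation whose cost no longer grows with the number of fields:

PUT INDEX/TYPE/1
{
    "FIELD 1": 3,
    "FIELD 2": 7,
    "total": 10
}

GET INDEX/TYPE/_search
{
    "size": 0,
    "aggs": {
        "total": {
            "sum": {
                "field": "total"
            }
        }
    }
}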

You could also try storing fields as doc_values and see if it helps. You would have less memory pressure and garbage collection, although the docs mention a possible 10-25% performance hit.
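A minimal mapping sketch (assuming an integer field named "FIELD 1"); note that doc_values must be set when the field is first mapped, it can't be switched on for an already-indexed field:

PUT INDEX/_mapping/TYPE
{
    "properties": {
        "FIELD 1": {
            "type": "integer",
            "doc_values": true
        }
    }
}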

Upvotes: 1
