Mohammad Akbari
Mohammad Akbari

Reputation: 4766

Sum of max per group in Elasticsearch

I want to search sum of max per group in Elasticsearch data. For example:

Data is:

id  | gId | cost
----|-----|------
1   |  1  | 20 
2   |  1  | 15
3   |  2  | 30 
4   |  1  | 30   *
5   |  2  | 40   *
6   |  1  | 20
7   |  2  | 30
8   |  3  | 45   *
9   |  1  | 10

I use sum_bucket for doing summation on max per group. This is my query:

{
    "aggs": {
        "T1":{
            "terms": {
                "field": "gId",
                "size":3
            },
            "aggs":{
                "MAX_COST":{
                    "max": {
                        "field": "cost"
                    }
                }
            }
        },
        "T2":{
            "sum_bucket": {
                "buckets_path": "T1>MAX_COST"
            }
        }
    },
    "size": 0
}

Query response is

"T1": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [                     |
        {                            |
            "key": 1,                |
            "doc_count": 5,          |
            "MAX": {                 |
                "value": 30          |
            }                        |
        },                           |
        {                            | How can ignore this part to return
            "key": 2,                | from elasticsearch query response
            "doc_count": 3,          |
            "MAX": {                 |
                "value": 40          |
            }                        |
        },                           |
        {                            |
            "key": 3,                |
            "doc_count": 1,          |
            "MAX": {                 |
                "value": 45          |
            }                        |
        }                            |
    ]                                   
},
"T2": {
    "value": 115
}

T2.value is the desired result. But I want in the query result T1.buckets ignore for network performance issues because my data is very large. By setting T1.terms.size to a specific number only top number of result effect in T2.value result. How can ignore T1.buckets in result by the query that I write or sum better query for Sum of max per group problem?

Upvotes: 1

Views: 1643

Answers (2)

Russ Cam
Russ Cam

Reputation: 125488

You can use filter_path to return only a subset of the response

var searchResponse = client.Search<Document>(s => s
    .FilterPath(new[] { "T2.value" }) // paths to include in response
    .Aggregations(a => a
        // ... rest of aggs here
    )
);

Bear in mind that using filter_path with NEST can sometimes result in a response that cannot be deserialized by the internal serializer, because the structure is unexpected. In these kinds of cases, you can use the low level client exposed on the high level client to handle the response

var searchDescriptor = new SearchDescriptor<Document>()
    .Aggregations(a => a
        // ... rest of aggs here
    );

var searchResponse = client.LowLevel.Search<StringResponse>(
    "index", 
    "type",
    PostData.Serializable(searchDescriptor),
    new SearchRequestParameters
    {
        QueryString = new Dictionary<string, object>
        {
            ["filter_path"] = "T2.value"
        }       
    });

// do something with JSON string response
var json = searchResponse.Body;

Upvotes: 0

Tom Slabbaert
Tom Slabbaert

Reputation: 22276

I'm not familiar with an "official" way of doing this to fields added while aggregating however you can "hack" this using the Post filter option.

add a term that could tell apart T1 and T2 like {"exists": {"field": "value"}} assuming you can use "exists" as a term.

Upvotes: 1

Related Questions