cyberwombat
cyberwombat

Reputation: 40133

Elasticsearch sum based on condition

I have this mock data which I want grouped by name then have a total field that is the sum of the value fields only if the status is won.

[{
  name: 'Foo',
  value: 12,
  status: 'won'
},
{
  name: 'Foo',
  value: 2,
  status: 'lost'
},
{
  name: 'Foo',
  value: 10,
  status: 'won'
},
{
  name: 'Bar',
  value: 4,
  status: 'won'
}]

I am able to group by name and obtain the total sum of the value fields but have not figured out how to only sum the won cases.

aggs: {
  by_name: {
    terms: {
      field: 'name'
    },
    aggs: {
      total_value: {
        sum: {
          field: 'value' // What I want is value if status == 'won' 
        }
      }
    }
  }

My desired result should look like:

[{
  name: 'Foo',
  total_value: 22 // Currently 24
}, {
  name: 'Bar',
  total_value: 4
}]

This seems like a common used case but while I have found lots of info on filtering but not this particular case.

Upvotes: 2

Views: 5537

Answers (2)

取一个好的名字
取一个好的名字

Reputation: 1823

"total_value": {
      "sum": {
        "script":"params.status =='won'?params.value:0"
      }
    }

Upvotes: 0

cyberwombat
cyberwombat

Reputation: 40133

OK I found two ways to do this.

1. Using a script

ES supports various scripting languages but has built in support for Painless:

aggs: {
  by_name: {
    terms: {
      field: 'name'
    },
    aggs: {
      total_value: {
        sum: {
          script: {
           lang: 'painless',
           source:doc['status'].value == 'won' ? doc['value'] : 0"
          }
        }
      }
    }
  }

2. Using nested grouping/aggregation

In my use case I also need to total up all the won and lost as individual field to get a result set more like:

[{
  name: 'Foo',
  total_won_value: 22,
  total_won: 2
  total_lost_value: 2,
  total_lost: 1
}, {
  ...
}

While this can be done with a few scripts I suspect (This would have to be tested though) its more performant to achieve this with nested aggregation.

aggs: {
  by_name: {
    terms: {
      field: 'name'
    },
    aggs: {
      by_status: {
        terms: {
          field: 'status'
        },
        aggs: {
          total_value_by_status: {
            sum: {
              field: 'value'
            }
          }
        }
      }
    }
  }
}    

The drawback of the second method is that its a bit harder to parse the results especially in something like AppSync templates.

Upvotes: 6

Related Questions