user2091046

Reputation: 625

Python elasticsearch DSL aggregation/metric of nested values per document

I'm trying to find the minimum (smallest) value in a 2-level nesting (separate minimum value per document).

So far I've been able to build an aggregation that computes the min value across all the nested values in my search results, but without separating it per document.

My example schema:

from elasticsearch_dsl import DocType, Integer, Nested, Date, Float, String

class MyExample(DocType):
    myexample_id = Integer()
    nested1 = Nested(
        properties={
            'timestamp': Date(),
            'foo': Nested(
                properties={
                    'bar': Float(),
                }
            )
        }
    )
    nested2 = Nested(
        multi=False,
        properties={
            'x': String(),
            'y': String(),
        }
    )

And this is how I'm searching and aggregating:

from elasticsearch_dsl import Search, Q

search = Search().filter(
    'nested', path='nested1', inner_hits={},
    query=Q(
        'range', **{
            'nested1.timestamp': {
                'gte': exampleDate1,
                'lte': exampleDate2
            }
        }
    )
).filter(
    'nested', path='nested2', inner_hits={'name': 'x'},
    query=Q(
        'term', **{
            'nested2.x': x
        }
    )
).filter(
    'nested', path='nested2', inner_hits={'name': 'y'},
    query=Q(
        'term', **{
            'nested2.y': y
        }
    )
)

search.aggs.bucket(
    'nested1', 'nested', path='nested1'
).bucket(
    'nested_foo', 'nested', path='nested1.foo'
).metric(
    'min_bar', 'min', field='nested1.foo.bar'
)

Basically, what I need is the min value of all the nested nested1.foo.bar values for each unique MyExample document (each has a unique myexample_id field).

Upvotes: 3

Views: 3536

Answers (1)

Honza Král

Reputation: 3022

If you want the minimum value per document, put all the nested buckets inside a terms bucket aggregation over the myexample_id field:

search.aggs.bucket(
  'docs', 'terms', field='myexample_id'
).bucket(
  'nested1', 'nested', path='nested1'
).bucket(
  'nested_foo', 'nested', path='nested1.foo'
).metric(
  'min_bar', 'min', field='nested1.foo.bar'
)
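
To read the per-document minimum back out, the response can be walked bucket by bucket. The following is only a sketch: it assumes the raw aggregations part of the response as a plain dict (for example via search.execute().to_dict()['aggregations']), the helper name min_bar_per_document is made up for illustration, and the sample data is hypothetical. Keep in mind that a terms aggregation returns only the top 10 buckets unless a size is given.

```python
def min_bar_per_document(aggregations):
    """Map each myexample_id to its minimum nested1.foo.bar value."""
    result = {}
    for bucket in aggregations["docs"]["buckets"]:
        # Each terms bucket carries the nested sub-aggregations for one id.
        result[bucket["key"]] = bucket["nested1"]["nested_foo"]["min_bar"]["value"]
    return result


# Hypothetical response fragment, shaped like the "aggregations" section
# that Elasticsearch returns for the aggregation above.
sample_aggregations = {
    "docs": {
        "buckets": [
            {
                "key": 1,
                "doc_count": 1,
                "nested1": {
                    "doc_count": 2,
                    "nested_foo": {"doc_count": 3, "min_bar": {"value": 0.5}},
                },
            },
            {
                "key": 2,
                "doc_count": 1,
                "nested1": {
                    "doc_count": 1,
                    # A min over zero values comes back as null (None).
                    "nested_foo": {"doc_count": 0, "min_bar": {"value": None}},
                },
            },
        ]
    }
}

minimums = min_bar_per_document(sample_aggregations)
```

Here minimums maps myexample_id 1 to 0.5 and myexample_id 2 to None.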

Note that this aggregation might be extremely expensive to calculate, since it has to create a bucket for each document. For a use case like this, it might be easier to compute the minimum per document as a script field or in the application.
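
As a rough sketch of the "in the app" option: the function below computes the minimum from one document's _source. It assumes nested1 and foo are stored as lists of objects, as in the schema from the question; the name min_bar and the sample document are hypothetical.

```python
def min_bar(source):
    """Return the smallest nested1.foo.bar in one document's _source, or None."""
    values = [
        foo["bar"]
        for entry in source.get("nested1", [])   # first nesting level
        for foo in entry.get("foo", [])          # second nesting level
        if foo.get("bar") is not None            # skip missing/null values
    ]
    return min(values) if values else None


# Hypothetical _source of a single MyExample document.
sample_source = {
    "myexample_id": 1,
    "nested1": [
        {"timestamp": "2016-01-01", "foo": [{"bar": 2.5}, {"bar": 0.5}]},
        {"timestamp": "2016-01-02", "foo": [{"bar": 1.0}]},
    ],
}

smallest = min_bar(sample_source)
```

With elasticsearch_dsl this could be applied to each hit, for instance by calling min_bar(hit.to_dict()) while iterating over the search results, so no per-document buckets are needed.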

Upvotes: 2
