Reputation: 503
I have the following structure GET /index-*/_mapping
:
"top_field" : {
"properties" : {
"dict_key1" : {
"properties" : {
"field1" : {...},
"field2" : {...},
"field3" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"field4" : {...}
},
"dict_key2" : {
"properties" : {
"field1" : {...},
"field2" : {...},
"field3" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"field4" : {...}
},
"dict_key3": ...
}
In other words, top_field
stores a json.
I would like to aggregate over 'field3.keyword'
regardless of dict_key*
. Something like top_field.*.field3.keyword
.
However, I can't get it to work using terms aggregation, with or without nested. I also tried to just to bucket by the different dict_key*
, which would be almost as good, but I can't get this to work either.
How can I do this?
Upvotes: 0
Views: 433
Reputation: 16923
TL;DR I had the same problem some time ago (Terms aggregation with nested wildcard path) and it turns out it's not directly possible due to the way lookups and path accessors are performed.
There's a scripted workaround though:
{
"size": 0,
"aggs": {
"terms_emulator": {
"scripted_metric": {
"init_script": "state.keyword_counts = [:]",
"map_script": """
def source = params._source['top_field'];
for (def key : source.keySet()) {
if (!source[key].containsKey('field3')) continue;
def field3_kw = source[key]['field3'];
if (state.keyword_counts.containsKey(field3_kw)) {
state.keyword_counts[field3_kw] += 1;
} else {
state.keyword_counts[field3_kw] = 1;
}
}
""",
"combine_script": "state",
"reduce_script": "states[0]"
}
}
}
}
yielding something along the lines of
"aggregations" : {
"terms_emulator" : {
"value" : {
"keyword_counts" : {
"world" : 1,
"hello" : 2
}
}
}
}
While this works just fine, I'd disadvise using scripts in production. You could rather restructure your data such that straightforward lookups are possible. For instance:
{
"top_field": {
"entries": [
{
"group_name": "dict_key1",
"key_value_pairs": {
"field3": "hello"
}
},
{
"group_name": "dict_key2",
"key_value_pairs": {
"field3": "world"
}
}
]
}
}
and make entries
nested. Maybe even ditch top_field
since it seems redundant and start directly w/ entries
.
Upvotes: 1