Reputation: 4265
In short: with Elasticsearch, given a list of fields, how can I get the average number of missing fields per document as an aggregation?
With the missing aggregation type I can get the total number of documents where a given field is missing. So with the following data:
"hits": [{
"name": "A name",
"nickname": "A nickname",
"bestfriend": "A friend",
"hobby": "An hobby"
},{
"name": "A name",
"hobby": "An hobby"
},{
"name": "A name",
"nickname": "A nickname",
"hobby": "An hobby"
},{
"name": "A name",
"bestfriend": "A friend"
}]
I can run the following query:
{
"aggs": {
"name_missing": {
"missing": {"field": "name"}
},
"nickname_missing": {
"missing": {"field": "nickname"}
},
"hobby_missing": {
"missing": {"field": "hobby"}
},
"bestfriend_missing": {
"missing": {"field": "bestfriend"}
}
}
}
And I get the following aggregations:
...
"aggregations": {
"name_missing": {
"doc_count": 0
},
"nickname_missing": {
"doc_count": 2
},
"hobby_missing": {
"doc_count": 1
},
"bestfriend_missing": {
"doc_count": 1
}
}
...
What I need now is the average number of missing fields per document. I can just do the math in code on the results, using each missing aggregation's doc_count value. But how would you get the same result as an aggregation from Elasticsearch?
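For reference, the client-side math I mean could be sketched roughly like this in Python. The dictionary mirrors the aggregations response above; `total_docs` and the helper name `average_missing` are my own assumptions for illustration, with the total taken from the four sample documents.

```python
# Aggregations section of the response shown above.
aggregations = {
    "name_missing": {"doc_count": 0},
    "nickname_missing": {"doc_count": 2},
    "hobby_missing": {"doc_count": 1},
    "bestfriend_missing": {"doc_count": 1},
}
total_docs = 4  # assumption: the number of documents in the sample data


def average_missing(aggs, total):
    """Sum the doc_count of every missing aggregation and divide by the document count."""
    if total == 0:
        return 0.0  # avoid dividing by zero on an empty index
    return sum(agg["doc_count"] for agg in aggs.values()) / total


avg = average_missing(aggregations, total_docs)
print(avg)  # (0 + 2 + 1 + 1) / 4 = 1.0
```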
Thank you for any reply / suggestion.
Upvotes: 1
Views: 549
Reputation: 1251
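This is an ugly solution, but it does the trick. The terms aggregation with a constant script puts every document into a single bucket, which gives bucket_script a parent bucket to run in.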
GET missing/missing/_search
{
"size": 0,
"aggs": {
"result": {
"terms": {
"script": "'aaa'"
},
"aggs": {
"name_missing": {
"missing": {
"field": "name"
}
},
"nickname_missing": {
"missing": {
"field": "nickname"
}
},
"hobby_missing": {
"missing": {
"field": "hobby"
}
},
"bestfriend_missing": {
"missing": {
"field": "bestfriend"
}
},
"avg_missing": {
"bucket_script": {
// buckets_path defines the variables available to the script.
// name_missing._count takes the doc_count of the name_missing aggregation
// (likewise for nickname_missing, hobby_missing and bestfriend_missing).
// "count": "_count" takes the doc_count of the enclosing bucket, i.e. the total number of hits.
"buckets_path": {
"name_missing": "name_missing._count",
"nickname_missing": "nickname_missing._count",
"hobby_missing": "hobby_missing._count",
"bestfriend_missing": "bestfriend_missing._count",
"count":"_count"
},
// Add up all the missing counts and divide by the total number of hits, as you require.
"script": "(name_missing+nickname_missing+hobby_missing+bestfriend_missing)/count"
}
}
}
}
}
}
I've shown you how to do it; now it's up to you to massage the parameters and extract what you need.
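If the list of fields varies, the request body above could be generated instead of written by hand. Below is a minimal sketch in Python; `build_avg_missing_query` is a hypothetical helper name, and it assumes every field name is also usable as a script variable name (no dots or hyphens).

```python
import json


def build_avg_missing_query(fields):
    """Build the search body above for an arbitrary list of field names (hypothetical helper)."""
    sub_aggs = {}
    buckets_path = {"count": "_count"}  # doc_count of the enclosing bucket
    for field in fields:
        agg_name = f"{field}_missing"
        sub_aggs[agg_name] = {"missing": {"field": field}}
        buckets_path[agg_name] = f"{agg_name}._count"
    # Same formula as in the query above: sum of the missing counts over the total.
    total = "+".join(f"{field}_missing" for field in fields)
    sub_aggs["avg_missing"] = {
        "bucket_script": {
            "buckets_path": buckets_path,
            "script": f"({total})/count",
        }
    }
    return {
        "size": 0,
        "aggs": {"result": {"terms": {"script": "'aaa'"}, "aggs": sub_aggs}},
    }


query = build_avg_missing_query(["name", "nickname", "hobby", "bestfriend"])
print(json.dumps(query, indent=2))
```

With a body like this, the computed average should come back under aggregations.result.buckets[0].avg_missing.value in the response.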
Upvotes: 1