Reputation: 6963
I'm trying to use an Elasticsearch aggregation to return all non-unique counts for each term within a bucket.
Given a mapping:-
{
"properties": {
"addresses": {
"properties": {
"meta": {
"properties": {
"types": {
"properties": {
"type": {
"type": "keyword"
}
}
}
}
}
}
}
}
}
And a document:-
{
"id": 3,
"first_name": "James",
"last_name": "Smith",
"addresses": [
{
"meta": {
"types": [
{
"type": "Home"
},
{
"type": "Home"
},
{
"type": "Business"
},
{
"type": "Business"
},
{
"type": "Business"
},
{
"type": "Fax"
}
]
}
}
]
}
The following terms aggregation:-
GET /test/_search
{
"size": 0,
"query": {
"match": {
"id": 3
}
},
"aggs": {
"types": {
"terms": {
"field": "addresses.meta.types.type"
}
}
}
}
Gives this result:-
"aggregations" : {
"types" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Business",
"doc_count" : 1
},
{
"key" : "Fax",
"doc_count" : 1
},
{
"key" : "Home",
"doc_count" : 1
}
]
}
}
As you can see, the terms are unique, whereas I'm really after a total count of each, e.g. Home: 2, Business: 3 and Fax: 1.
Is this possible?
I had a look at value_count, but as it's not a bucket aggregation it seems a little less convenient to use. Alternatively, a script might possibly do it, but I'm not too sure of the syntax.
Thanks!
Upvotes: 2
Views: 522
Reputation: 8840
I doubt that is possible using the object type in Elasticsearch. The reason is that the doc_count of a terms aggregation is the number of documents that contain a term, not the number of times the term occurs within those documents.
You may have to change the type of the types field to nested so that ES ends up saving each type inside types as a separate document.
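To illustrate the difference, here is a minimal Python sketch (not Elasticsearch code) that mimics the two counting behaviours on the sample document from the question. It assumes, as a simplification, that an object-type keyword field contributes each distinct value once per document, while a nested field contributes one hidden document per array element. The function names are hypothetical and exist only for this illustration.

```python
from collections import Counter

# Trimmed copy of the sample document from the question.
doc = {
    "id": 3,
    "addresses": [
        {"meta": {"types": [
            {"type": "Home"}, {"type": "Home"},
            {"type": "Business"}, {"type": "Business"}, {"type": "Business"},
            {"type": "Fax"},
        ]}}
    ],
}


def type_values(document):
    """Yield every addresses.meta.types.type value, duplicates included."""
    for address in document.get("addresses", []):
        for entry in address.get("meta", {}).get("types", []):
            yield entry["type"]


def object_style_counts(documents):
    """Count documents per term: duplicates inside one document collapse."""
    counts = Counter()
    for document in documents:
        counts.update(set(type_values(document)))
    return counts


def nested_style_counts(documents):
    """Count one hit per array element, as if each were its own document."""
    counts = Counter()
    for document in documents:
        counts.update(type_values(document))
    return counts


object_counts = object_style_counts([doc])
nested_counts = nested_style_counts([doc])
print(dict(object_counts))
print(dict(nested_counts))
```

The first result matches the doc_count of 1 per term seen in the question, and the second matches the per-occurrence totals the question asks for.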
I've provided a sample mapping, document (no change in representation), aggregation query and response below.
PUT nested_test
{
"mappings":{
"properties":{
"id":{
"type":"integer"
},
"first_name":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword"
}
}
},
"last_name":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword"
}
}
},
"addresses":{
"properties":{
"meta":{
"properties":{
"types":{
"type":"nested",
"properties":{
"type":{
"type":"keyword"
}
}
}
}
}
}
}
}
}
}
POST nested_test/_doc/1
{
"id": 3,
"first_name": "James",
"last_name": "Smith",
"addresses": [
{
"meta": {
"types": [
{
"type": "Home"
},
{
"type": "Home"
},
{
"type": "Business"
},
{
"type": "Business"
},
{
"type": "Business"
},
{
"type": "Fax"
}
]
}
}
]
}
Note that every type above is now indexed as a separate document linked to the main document.
All that is then required is to combine a Nested Aggregation with a Terms Aggregation:
POST nested_test/_search
{
"size": 0,
"aggs": {
"myterms": {
"nested": {
"path": "addresses.meta.types"
},
"aggs": {
"myterms": {
"terms": {
"field": "addresses.meta.types.type",
"size": 10,
"min_doc_count": 2
}
}
}
}
}
}
Note that in the above query I've used min_doc_count: 2 to restrict the results to values that occur more than once. If you also want values that occur only once (e.g. Fax: 1), remove min_doc_count; its default is 1.
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"myterms" : {
"doc_count" : 6,
"myterms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Business",
"doc_count" : 3
},
{
"key" : "Home",
"doc_count" : 2
}
]
}
}
}
}
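If the counts are consumed from client code, the buckets can be turned into a simple term-to-count mapping. The following is only a sketch in Python, assuming the response has already been parsed into a dict; the helper name bucket_counts is hypothetical, and the aggregation names default to the myterms names used in the query above.

```python
# The aggregations part of the response shown above, as a parsed dict.
response = {
    "aggregations": {
        "myterms": {
            "doc_count": 6,
            "myterms": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {"key": "Business", "doc_count": 3},
                    {"key": "Home", "doc_count": 2},
                ],
            },
        }
    }
}


def bucket_counts(resp, outer="myterms", inner="myterms"):
    """Map each term key to its doc_count from a nested + terms aggregation."""
    buckets = resp["aggregations"][outer][inner]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


counts = bucket_counts(response)
print(counts)
```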
Hope that helps!
Upvotes: 3