Reputation: 413
The situation:
I am a starter in Elasticsearch and cannot wrap my head around how to use the aggregations go get what I need.
I have documents with the following structure:
{
...
"authors" : [
{
"name" : "Bob",
"@type" : "Person"
}
],
"resort": "Politics",
...
}
I want to use an aggregation to get the documents count for every author. Since there may be more than one author for some documents, these documents should be counted for every author individually.
What I've tried:
Since the terms
aggregation worked with the resort
field I tried using it with authors
or the name
field inside, but always getting no buckets at all. For this I used the following curl
request:
curl -X POST 'localhost:9200/news/_doc/_search?pretty' -H 'Content-Type: application/json' -d'
{
"_source": false,
"aggs": {
"author_agg": { "terms": {"field": "authors.keyword" } }
}
}'
I concluded, that the terms
aggregation doesn't work with fields, that are contained by a list.
Next I thought about the nested
aggregation, but the documentation says, it is a
single bucket aggregation
so not what I am searching for. Because I ran out of ideas I tried it, but was getting the error
"type" : "aggregation_execution_exception",
"reason" : "[nested] nested path [authors] is not nested"
I found this answer and tried use it for my data. I had the following request:
curl -X GET "localhost:9200/news/_search?pretty" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"nest": {
"nested": {
"path": "authors"
},
"aggs": {
"authorname": {
"terms" : {
"field": "name.keyword"
}
}
}
}
}
}'
which gave me the error
"type" : "aggregation_execution_exception",
"reason" : "[nested] nested path [authors] is not nested"
I searched for how to make my path nested using mappings, but I couldn't find out how to accomplish that. I don't even know, if this actually makes sense or not.
So how can I aggregate the documents into buckets based on a key, that lies in elements of a list inside the documents?
Maybe this question have been answered somewhere else, but then I'm not able to state my problem in the right way, since I'm still confused by all the new information. Thank you for your help in advance.
Upvotes: 1
Views: 92
Reputation: 413
I finally solved my problem:
The idea of getting the authors
key mapping nested
was totally right. But unfortunately Elasticsearch does not let you change the type from un-nested
to nested
directly, because all items in this key then have to be indexed too. So you have to go the following way:
_doc
, into it's properties and then into the documents field authors
. There we set type
to nested
.~
curl -X PUT "localhost:9200/new_index?pretty" -H 'Content-Type: application/json' -d'
{
"mappings": {
"_doc" : {
"properties" : {
"authors": { "type": "nested" }
}
}
}
}'
~
curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
"source": {
"index": "old_index"
},
"dest": {
"index": "new_index"
}
}'
Now we can do the nested
aggregation here, to sort the documents into buckets based on the authors:
curl -X GET 'localhost:9200/new_index/_doc/_search?pretty' -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"authors": {
"nested": {
"path": "authors"
},
"aggs": {
"authors_by_name": {
"terms": { "field": "authors.name.keyword" }
}
}
}
}
}'
I don't know how to rename indices until now, but surely you can just simple delete the old index and then do the described procedure to create another new index with the name of the old one but the custom mapping.
Upvotes: 0