Reputation: 3122
I need to find the unique string values stored in a list field.
The question is similar to ElasticSearch - Return Unique Values, but here the field values are lists.
Records:
PUT items/1
{ "tags" : ["a", "b"] }
PUT items/2
{ "tags" : ["b", "c"] }
PUT items/3
{ "tags" : ["a", "d"] }
Query:
GET items/_search
{ ... }
# => Expected Response
["a", "b", "c", "d"]
Is there a way to run such a search?
Upvotes: 0
Views: 415
Reputation: 945
Good news! We can use the exact same aggregation as the one in the SO post you linked to in the description. In fact, if the lists held numeric values, our work would be done already. However, the main difference between this question and the one you referenced is that you are using a "string" type.
It is useful to know that recent versions of Elasticsearch no longer have a "string" type; strings are represented by one of two types instead. The keyword type treats the entire value as a single token, while the text type applies an analyzer that breaks the value up into separate tokens and builds the index from those tokens.
For example, the string "Foxes are brown" can be indexed as the single token "Foxes are brown" (keyword) or as ["foxes", "are", "brown"] (text). In your case, tags should be treated as keywords, so we'll need to tell Elasticsearch that the field is a keyword and not text, which is the default.
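To make the difference concrete, here is a rough Python sketch of what ends up in the index for each type. The function names are made up for illustration, and the text version is only a crude stand-in for the standard analyzer (lowercase, then split on non-word characters); the real analyzer does more than this.

```python
import re

def index_as_keyword(value):
    # keyword (no normalizer assumed): the whole string is one term, unchanged
    return [value]

def index_as_text(value):
    # Crude approximation of the standard analyzer, for illustration only:
    # lowercase the input and split on runs of non-word characters
    return [token for token in re.split(r"\W+", value.lower()) if token]

print(index_as_keyword("Foxes are brown"))  # ['Foxes are brown']
print(index_as_text("Foxes are brown"))     # ['foxes', 'are', 'brown']
```

For a tag like "new york", a terms aggregation on a text field would therefore bucket "new" and "york" separately, which is why keyword is the better fit here.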
NOTE: Using the keyword type whenever possible avoids having to set fielddata to true on a text field, which uses up a lot of heap memory in your cluster if this aggregation runs often. Tags and ordinal data are good candidates for the keyword type.
Anyways, let's get to the real stuff eh?
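First, you're going to want to map tags in the items index as a keyword type. Do this when you create the index, before indexing any documents, because the type of an existing field cannot be changed in place.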
curl --request PUT \
  --url http://localhost:9200/items \
  --header 'content-type: application/json' \
  --data '{
    "mappings": {
      "item": {
        "properties": {
          "tags": { "type": "keyword" }
        }
      }
    }
  }'
Then you're going to run the aggregation similar to the aggregation in the post you referenced.
curl --request POST \
  --url http://localhost:9200/items/item/_search \
  --header 'content-type: application/json' \
  --data '{
    "size": 0,
    "aggregations": {
      "tags_term_agg": {
        "terms": {
          "field": "tags"
        }
      }
    }
  }'
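If it helps to see why this works on list fields, here is a small Python sketch that mimics what the terms aggregation does with the three documents from the question. It is not how Elasticsearch is implemented; terms_agg is a made-up helper, and the ordering (doc count descending, then key ascending) is chosen to mirror the usual default.

```python
from collections import Counter

# The three documents from the question
docs = [
    {"tags": ["a", "b"]},
    {"tags": ["b", "c"]},
    {"tags": ["a", "d"]},
]

def terms_agg(documents, field):
    counts = Counter()
    for doc in documents:
        # Each element of the list is a separate value of the field;
        # set() makes a document count at most once per distinct value
        counts.update(set(doc.get(field, [])))
    # Sort by doc count (descending), then by key (ascending)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

buckets = terms_agg(docs, "tags")
print(buckets)  # [('a', 2), ('b', 2), ('c', 1), ('d', 1)]
```

The bucket keys are exactly the unique values you are after; the counts are the number of documents containing each tag.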
Your response should look something like this.
{
  "took": 24,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "tags_term_agg": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "a",
          "doc_count": 2
        },
        {
          "key": "b",
          "doc_count": 2
        },
        {
          "key": "c",
          "doc_count": 1
        },
        {
          "key": "d",
          "doc_count": 1
        }
      ]
    }
  }
}
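To get from that response to the flat list you asked for, you only need the bucket keys. A minimal Python sketch, assuming the response body has already been fetched as a JSON string (the sample below is trimmed to the relevant part, and unique_values is a made-up helper name):

```python
import json

# Trimmed copy of the response body shown above
response_text = """
{
  "aggregations": {
    "tags_term_agg": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {"key": "a", "doc_count": 2},
        {"key": "b", "doc_count": 2},
        {"key": "c", "doc_count": 1},
        {"key": "d", "doc_count": 1}
      ]
    }
  }
}
"""

def unique_values(response, agg_name):
    # Each bucket corresponds to one distinct value of the aggregated field
    return [bucket["key"] for bucket in response["aggregations"][agg_name]["buckets"]]

response = json.loads(response_text)
tags = unique_values(response, "tags_term_agg")
print(tags)  # ['a', 'b', 'c', 'd']
```

Keep in mind that a terms aggregation returns only the top 10 buckets by default. If you have more than 10 distinct tags, set "size" inside the "terms" object to something larger; a non-zero sum_other_doc_count in the response is a hint that some buckets were left out.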
Upvotes: 1