By default, a terms aggregation gives me the 10 most popular terms and their counts, plus a sum_other_doc_count field representing the "other" items.
I can display these to the user:
first (150)
second (122)
third (111)
...
other(19)
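As a rough model of where those numbers come from (on a single shard, where counts are exact), the buckets are the size terms with the highest document counts and sum_other_doc_count is the sum of the document counts of every remaining term. The sketch below is a hypothetical pure-Python illustration, not Elasticsearch code; breaking ties on the term itself is an assumption that happens to match the bucket order shown further down. The "fourth" and "fifth" counts are made up so that "other" comes out at 19.

```python
def top_terms(doc_counts, size=10):
    """Model a terms aggregation over a term -> doc_count mapping.

    Returns (buckets, sum_other_doc_count). Ties on doc_count are broken
    by the term itself (ascending), an assumption made for illustration.
    """
    ordered = sorted(doc_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    buckets = [{"key": term, "doc_count": count} for term, count in ordered[:size]]
    # "Other" is the summed doc_count of every term that did not make the cut.
    sum_other = sum(count for _, count in ordered[size:])
    return buckets, sum_other


# Hypothetical counts: the last two terms are invented to add up to 19.
counts = {"first": 150, "second": 122, "third": 111, "fourth": 12, "fifth": 7}
buckets, other = top_terms(counts, size=3)
print([b["key"] for b in buckets], other)
```

Because the "other" figure adds up per-term document counts, a document containing several non-top terms is counted several times, so it is not a number of distinct documents.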
The user can then filter their results by choosing one of the terms: I apply a TermFilter using the term they select, and that works fine.
However, is there a way I can create a filter that represents "other" (i.e. all terms except the top 10)?
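For the selected-term step, a minimal sketch of the request body, written as a Python dict in the same ES 1.x filter syntax the answer below uses. The field name "category" and the helper name are hypothetical; only the body is built, nothing is sent.

```python
import json


def selected_term_query(field, term):
    """Build an ES 1.x-style body keeping only docs that have `term` in `field`."""
    return {
        "query": {
            "constant_score": {
                # A term filter is an exact match on the indexed term.
                "filter": {"term": {field: term}}
            }
        }
    }


body = selected_term_query("category", "second")
print(json.dumps(body))
```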
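I don't think so. You can hack together something related (but not quite the same) with a terms filter inside a not filter, though, which returns all documents in which none of the top terms appear. It differs from "other" because a document containing both a top term and some other term is dropped by this filter, even though it contributes to the "other" count. I'm going to use the top 5 for simplicity.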
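A toy illustration of how the not/terms result differs from "other", in hypothetical Python that treats each document as a set of already-analyzed terms (no Elasticsearch involved):

```python
def not_top_terms(docs, top):
    """Docs the not/terms filter would return: they contain no top term at all."""
    return [doc_id for doc_id, terms in docs.items() if not terms & top]


def has_other_term(docs, top):
    """Docs that contain at least one term outside the top set."""
    return [doc_id for doc_id, terms in docs.items() if terms - top]


# Hypothetical documents, already reduced to sets of terms.
docs = {
    1: {"red", "blue"},  # a top term plus an "other" term
    2: {"red"},          # a top term only
    3: {"green"},        # an "other" term only
}
top = {"red"}

print(not_top_terms(docs, top))   # [3]
print(has_other_term(docs, top))  # [1, 3]
```

Document 1 is the case the filter misses: it contributes to "other" through "blue" but is excluded because it also contains "red".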
So I created an index and added some random Latin text:
PUT /test_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}
POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc"}}
{"text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec rhoncus dictum ligula, quis volutpat diam fringilla ut."}
{"index":{"_index":"test_index","_type":"doc"}}
{"text": "Nulla ac gravida ipsum. Pellentesque placerat mattis pharetra. Praesent sapien lorem, auctor in imperdiet vel, lacinia vel diam."}
{"index":{"_index":"test_index","_type":"doc"}}
{"text": "Mauris a risus ut eros posuere rutrum. Nunc scelerisque diam ex, consequat mollis sem facilisis in."}
{"index":{"_index":"test_index","_type":"doc"}}
{"text": "Maecenas lacinia sollicitudin ultricies. Aenean id eleifend sapien. In et justo accumsan, cursus mi vel, consectetur augue. Nullam in quam ac magna iaculis finibus quis ut risus."}
{"index":{"_index":"test_index","_type":"doc"}}
{"text": "Donec dolor eros, rhoncus ultricies quam et, dapibus egestas libero."}
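If you would rather generate that bulk body than type it, here is a hypothetical Python sketch. It only builds the newline-delimited text (an action line, then a source line, per document, ending with a newline, which the bulk API requires); sending it with an HTTP client is left out. The "doc" type mirrors the ES 1.x example above.

```python
import json


def bulk_body(index, doc_type, texts):
    """Build an NDJSON _bulk body: one action line and one source line per doc."""
    lines = []
    for text in texts:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps({"text": text}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body("test_index", "doc", [
    "Donec dolor eros, rhoncus ultricies quam et, dapibus egestas libero.",
])
print(body, end="")
```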
Then I got the top 5 terms:
POST /test_index/_search?search_type=count
{
  "aggs": {
    "top_terms": {
      "terms": {
        "field": "text",
        "size": 5
      }
    }
  }
}
...
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "top_terms": {
      "buckets": [
        {
          "key": "diam",
          "doc_count": 3
        },
        {
          "key": "in",
          "doc_count": 3
        },
        {
          "key": "ut",
          "doc_count": 3
        },
        {
          "key": "ac",
          "doc_count": 2
        },
        {
          "key": "consectetur",
          "doc_count": 2
        }
      ]
    }
  }
}
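Those buckets can be reproduced roughly in plain Python. This is only an approximation of the standard analyzer (lowercase, then split on runs of word characters), not what Elasticsearch actually runs; it counts each term once per document, like doc_count, and breaks ties alphabetically, which matches the order above.

```python
import re
from collections import Counter

# The five documents indexed above.
DOCS = [
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec rhoncus dictum ligula, quis volutpat diam fringilla ut.",
    "Nulla ac gravida ipsum. Pellentesque placerat mattis pharetra. Praesent sapien lorem, auctor in imperdiet vel, lacinia vel diam.",
    "Mauris a risus ut eros posuere rutrum. Nunc scelerisque diam ex, consequat mollis sem facilisis in.",
    "Maecenas lacinia sollicitudin ultricies. Aenean id eleifend sapien. In et justo accumsan, cursus mi vel, consectetur augue. Nullam in quam ac magna iaculis finibus quis ut risus.",
    "Donec dolor eros, rhoncus ultricies quam et, dapibus egestas libero.",
]


def term_doc_counts(texts):
    """Count, for each term, how many documents contain it."""
    counts = Counter()
    for text in texts:
        # A set, so a term repeated within one document is counted once.
        counts.update(set(re.findall(r"\w+", text.lower())))
    return counts


def top_n(counts, n):
    """Highest doc counts first; ties broken by the term, ascending."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


print(top_n(term_doc_counts(DOCS), 5))
# [('diam', 3), ('in', 3), ('ut', 3), ('ac', 2), ('consectetur', 2)]
```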
Then I can construct a filter that gives me back the docs in which none of the top 5 terms appear, like:
POST /test_index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "not": {
          "filter": {
            "terms": {
              "text": [
                "diam",
                "in",
                "ut",
                "ac",
                "consectetur"
              ]
            }
          }
        }
      },
      "boost": 1.2
    }
  }
}
...
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "4uoLr70rRXulHHc7N3Ujmw",
        "_score": 1,
        "_source": {
          "text": "Donec dolor eros, rhoncus ultricies quam et, dapibus egestas libero."
        }
      }
    ]
  }
}
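To wire the two steps together, the bucket keys from the aggregation response can be fed straight into the not/terms filter. A hypothetical Python sketch working on the parsed JSON response; it emits the same ES 1.x syntax as above (minus the boost). On newer Elasticsearch versions the not filter is gone, and the equivalent would be a bool query with must_not.

```python
import json


def other_filter_query(search_response, agg_name, field):
    """Build a query excluding every document that has one of the top terms.

    search_response: the parsed response containing the terms aggregation.
    """
    buckets = search_response["aggregations"][agg_name]["buckets"]
    top_keys = [bucket["key"] for bucket in buckets]
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "not": {"filter": {"terms": {field: top_keys}}}
                }
            }
        }
    }


# Trimmed-down copy of the aggregation response shown earlier.
response = {
    "aggregations": {
        "top_terms": {
            "buckets": [
                {"key": "diam", "doc_count": 3},
                {"key": "in", "doc_count": 3},
                {"key": "ut", "doc_count": 3},
                {"key": "ac", "doc_count": 2},
                {"key": "consectetur", "doc_count": 2},
            ]
        }
    }
}

print(json.dumps(other_filter_query(response, "top_terms", "text"), indent=2))
```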
I know this doesn't really answer your question, but maybe it will give you some ideas.
Here is the code I used (if you're using ES 1.4, you'll have to turn on CORS to be able to use the code in the browser):
http://sense.qbox.io/gist/93b69375c5491f1b0458e2053a08e65006f34a1c