Reputation: 1768
I would like to be able to query for text but also retrieve only the results with the maximum value of a certain integer field in my data. I have read the docs about aggregations and filters and I don't quite see what I am looking for.
For instance, I index repeated documents that are identical except for one integer field; let's call this field lastseen.
So, as an example, given this data put into elasticsearch:
# these two are the same except for the "lastseen" field
curl -XPOST localhost:9200/myindex/myobject -d '{
"field1": "dinner carrot potato broccoli",
"field2": "something here",
"lastseen": 1000
}'
curl -XPOST localhost:9200/myindex/myobject -d '{
"field1": "dinner carrot potato broccoli",
"field2": "something here",
"lastseen": 100
}'
# and these two are the same except for the "lastseen" field
curl -XPOST localhost:9200/myindex/myobject -d '{
"field1": "fish chicken something",
"field2": "dinner",
"lastseen": 2000
}'
curl -XPOST localhost:9200/myindex/myobject -d '{
"field1": "fish chicken something",
"field2": "dinner",
"lastseen": 200
}'
If I query for "dinner":
curl -XPOST localhost:9200/myindex/_search -d '{
"query": {
"query_string": {
"query": "dinner"
}
}
}'
I'll get 4 results back. I'd like a filter such that I get only two results back: for each set of duplicates, only the item with the maximum lastseen value.
This is obviously not right, but hopefully it gives you an idea of what I am after:
{
"query": {
"query_string": {
"query": "dinner"
}
},
"filter": {
"max": "lastseen"
}
}
The results would look something like:
"hits": [
{
...
"_source": {
"field1": "dinner carrot potato broccoli",
"field2": "something here",
"lastseen": 1000
}
},
{
...
"_source": {
"field1": "fish chicken something",
"field2": "dinner",
"lastseen": 2000
}
}
]
Update 1: I tried creating a mapping that excludes lastseen from the _all field (include_in_all set to false). This did not work; I still get all 4 results back.
curl -XPOST localhost:9200/myindex -d '{
"mappings": {
"myobject": {
"properties": {
"lastseen": {
"type": "long",
"store": "yes",
"include_in_all": false
}
}
}
}
}'
update 2: I tried a deduplication with the agg scheme listed here, and it did not work, but more importantly, I don't see a way to combine that with a keyword search.
Upvotes: 2
Views: 2963
Reputation: 52366
Not ideal, but I think it gets you what you need.
Change the mapping of your field1 field, assuming this is the one that you use to define "duplicate" documents, like this:
PUT /lastseen
{
"mappings": {
"test": {
"properties": {
"field1": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"field2": {
"type": "string"
},
"lastseen": {
"type": "long"
}
}
}
}
}
That is, you add a .raw subfield that is not_analyzed, which means its value is indexed exactly as it is, as a single term, instead of being analyzed and split into separate terms. This makes it possible to spot "duplicate" documents by their exact field1 value.
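To see why the not_analyzed subfield matters for bucketing, here is a minimal Python sketch. It does not talk to Elasticsearch; analyzed_terms is a hypothetical stand-in that only lowercases and splits on whitespace, which is a rough approximation of what the default analyzer does to this sample text.

```python
from collections import Counter

def analyzed_terms(value):
    # Hypothetical stand-in for an analyzed string field:
    # lowercase the text and split it into separate terms.
    return value.lower().split()

def raw_terms(value):
    # not_analyzed: the whole string is kept as one single term.
    return [value]

# The field1 values of the four sample documents from the question.
field1_values = [
    "dinner carrot potato broccoli",
    "dinner carrot potato broccoli",
    "fish chicken something",
    "fish chicken something",
]

# Bucketing on the analyzed field would group by individual words.
analyzed_buckets = Counter(t for v in field1_values for t in analyzed_terms(v))

# Bucketing on the raw subfield groups by the full field1 value.
raw_buckets = Counter(t for v in field1_values for t in raw_terms(v))

print(analyzed_buckets)
print(raw_buckets)
```

Only the raw variant yields one bucket per distinct field1 value, which is what the duplicate detection needs.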
Then, you need to use a terms aggregation on field1.raw (for duplicates) and a top_hits sub-aggregation to get a single document for each field1 value:
GET /lastseen/test/_search
{
"size": 0,
"query": {
"query_string": {
"query": "dinner"
}
},
"aggs": {
"field1_unique": {
"terms": {
"field": "field1.raw",
"size": 2
},
"aggs": {
"first_one": {
"top_hits": {
"size": 1,
"sort": [{"lastseen": {"order":"desc"}}]
}
}
}
}
}
}
The single document returned by top_hits for each bucket is the one with the highest lastseen, which is made possible by "sort": [{"lastseen": {"order":"desc"}}].
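The combination of query, terms bucket and sorted top_hits can be emulated locally to check what it should select. The following Python sketch is only an illustration of the logic, not how Elasticsearch executes it. The helper names matches and latest_per_group are made up for this example, the word matching is a crude approximation of query_string, and the sample data assumes the second document carries lastseen: 100, as the question's description implies.

```python
docs = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here", "lastseen": 1000},
    {"field1": "dinner carrot potato broccoli", "field2": "something here", "lastseen": 100},
    {"field1": "fish chicken something", "field2": "dinner", "lastseen": 2000},
    {"field1": "fish chicken something", "field2": "dinner", "lastseen": 200},
]

def matches(doc, term):
    # Crude stand-in for the query_string query: the term must appear
    # as a whole word in any string field of the document.
    return any(
        isinstance(value, str) and term in value.lower().split()
        for value in doc.values()
    )

def latest_per_group(documents, term, group_field="field1", sort_field="lastseen"):
    # One "bucket" per distinct group_field value (the terms aggregation);
    # inside each bucket keep only the document with the highest sort_field
    # (top_hits with size 1, sorted descending).
    best = {}
    for doc in documents:
        if not matches(doc, term):
            continue
        key = doc[group_field]
        if key not in best or doc[sort_field] > best[key][sort_field]:
            best[key] = doc
    return list(best.values())

result = latest_per_group(docs, "dinner")
for doc in result:
    print(doc["field1"], doc["lastseen"])
```

All four documents match "dinner", but only one document per field1 value survives, the one with the largest lastseen.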
The search returns these results (under aggregations, not hits):
...
"aggregations": {
"field1_unique": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "dinner carrot potato broccoli",
"doc_count": 2,
"first_one": {
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_index": "lastseen",
"_type": "test",
"_id": "AU60ZObtjKWeJgeyudI-",
"_score": null,
"_source": {
"field1": "dinner carrot potato broccoli",
"field2": "something here",
"lastseen": 1000
},
"sort": [
1000
]
}
]
}
}
},
{
"key": "fish chicken something",
"doc_count": 2,
"first_one": {
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_index": "lastseen",
"_type": "test",
"_id": "AU60ZObtjKWeJgeyudJA",
"_score": null,
"_source": {
"field1": "fish chicken something",
"field2": "dinner",
"lastseen": 2000
},
"sort": [
2000
]
}
]
}
}
}
]
}
}
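Because the documents arrive nested inside the aggregation buckets, client code has to unwrap them. A minimal Python sketch of that step follows, assuming the response has already been parsed into a dict and that the aggregation names are field1_unique and first_one as in the request above. The function name top_hits_sources is hypothetical, and the sample response is trimmed to the fields the function reads.

```python
def top_hits_sources(response, terms_agg="field1_unique", hits_agg="first_one"):
    # Walk every bucket of the terms aggregation and collect the _source
    # of each document returned by its top_hits sub-aggregation.
    buckets = response.get("aggregations", {}).get(terms_agg, {}).get("buckets", [])
    sources = []
    for bucket in buckets:
        for hit in bucket[hits_agg]["hits"]["hits"]:
            sources.append(hit["_source"])
    return sources

# Trimmed copy of the response shown above.
response = {
    "aggregations": {
        "field1_unique": {
            "buckets": [
                {
                    "key": "dinner carrot potato broccoli",
                    "doc_count": 2,
                    "first_one": {"hits": {"total": 2, "hits": [
                        {"_source": {
                            "field1": "dinner carrot potato broccoli",
                            "field2": "something here",
                            "lastseen": 1000,
                        }},
                    ]}},
                },
                {
                    "key": "fish chicken something",
                    "doc_count": 2,
                    "first_one": {"hits": {"total": 2, "hits": [
                        {"_source": {
                            "field1": "fish chicken something",
                            "field2": "dinner",
                            "lastseen": 2000,
                        }},
                    ]}},
                },
            ]
        }
    }
}

sources = top_hits_sources(response)
for source in sources:
    print(source["field1"], source["lastseen"])
```

The resulting list has the same shape as the two-document hits array the question asked for.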
Upvotes: 4