Reputation: 2073
I have the problem that some documents are indexed twice or more, so I want to filter out these duplicates when searching. I followed some other threads and built this query:
{
  "query" : { ... },
  "size" : 10,
  "from" : 0,
  "sort" : { ... },
  "aggs" : {
    "dedup" : {
      "terms" : {
        "field" : "content.keyword"
      },
      "aggs" : {
        "dedup_docs" : {
          "top_hits" : {
            "size" : 1
          }
        }
      }
    }
  }
}
But it seems that this aggregation has no effect. I'm still getting duplicate results (documents with the same text in the content field).
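For context: aggregations are computed alongside the search hits and do not filter them, so the top-level hits list is unchanged. The one-per-value documents come back separately, under aggregations.dedup.buckets. Below is a minimal sketch in Python of reading them from a response that has already been parsed into a dict. The helper name dedup_from_aggs and the sample response are made up for illustration; only the aggregation names dedup and dedup_docs come from the query above.

```python
def dedup_from_aggs(response, agg_name="dedup", sub_agg_name="dedup_docs"):
    """Return the top hit of every terms bucket in a parsed search response."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    docs = []
    for bucket in buckets:
        top = bucket[sub_agg_name]["hits"]["hits"]
        if top:  # top_hits with size 1 yields at most one document per bucket
            docs.append(top[0])
    return docs


# Hand-written response fragment (illustrative data, not real output).
sample_response = {
    "hits": {"hits": [
        {"_id": "1", "_source": {"content": "foo"}},
        {"_id": "2", "_source": {"content": "foo"}},  # duplicate still listed here
        {"_id": "3", "_source": {"content": "bar"}},
    ]},
    "aggregations": {"dedup": {"buckets": [
        {"key": "foo", "doc_count": 2,
         "dedup_docs": {"hits": {"hits": [
             {"_id": "1", "_source": {"content": "foo"}}]}}},
        {"key": "bar", "doc_count": 1,
         "dedup_docs": {"hits": {"hits": [
             {"_id": "3", "_source": {"content": "bar"}}]}}},
    ]}},
}

unique_docs = dedup_from_aggs(sample_response)
```

Note that the from, size and sort settings of the request apply only to the top-level hits, not to the buckets, and a terms aggregation returns only the 10 largest buckets unless its own size is set. This makes the approach awkward for paginated results.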
Updated request, using collapse instead of the aggregation:
{
  "query" : { ... },
  "size" : 10,
  "from" : 0,
  "sort" : { ... },
  "collapse" : {
    "field" : "content.keyword"
  }
}
Upvotes: 2
Views: 10427
Reputation: 1804
You can also take a look at the recently added field collapsing feature (the collapse parameter of the search request).
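As a rough sketch of what switching to field collapsing could look like when the request body is built as a plain Python dict: the helper with_collapse below is hypothetical, not part of any client library, and the match_all query is only a placeholder for the query that is elided in the question.

```python
import copy


def with_collapse(body, field):
    """Return a copy of a search body that collapses hits on `field`."""
    new_body = copy.deepcopy(body)   # leave the caller's dict untouched
    new_body.pop("aggs", None)       # the dedup aggregation is no longer needed
    new_body["collapse"] = {"field": field}
    return new_body


original = {
    "query": {"match_all": {}},      # placeholder for the elided query
    "size": 10,
    "from": 0,
    "aggs": {"dedup": {"terms": {"field": "content.keyword"}}},
}

collapsed = with_collapse(original, "content.keyword")
```

According to the Elasticsearch documentation, the collapse field has to be a single-valued keyword or numeric field with doc_values enabled, and the total hit count in the response still counts the matching documents before collapsing.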
Upvotes: 4