altralaser

Reputation: 2073

Eliminate duplicates in elasticsearch query

My problem is that some documents are indexed two or more times, so I want to filter out these duplicates when searching. I followed some other threads and built this query:

{
  "query" : { ... },
  "size" : 10,
  "from" : 0,
  "sort" : { ... },
  "aggs" : {
    "dedup" : {
      "terms" : {
        "field" : "content.keyword"
      },
      "aggs" : {
        "dedup_docs" : {
          "top_hits" : {
            "size" : 1
          }
        }
      }
    }
  }
}

But this aggregation seems to have no effect: the hits still contain duplicates (documents with the same text in the content field).

Edit: I changed the request to use collapse instead:

{
  "query" : { ... },
  "size" : 10,
  "from" : 0,
  "sort" : { ... },
  "collapse" : {
    "field" : "content.keyword"
  }
}

Upvotes: 2

Views: 10427

Answers (1)

alr

Reputation: 1804

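You can also take a look at the recently added field collapsing feature (the collapse parameter of the search request). Aggregations do not filter the hits list; their results come back in a separate aggregations section of the response, which is why the duplicates still show up in the hits.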
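
To make the difference concrete, here is a minimal Python sketch of both approaches. It only builds and reads plain dictionaries, so no client library is involved. The helper names `build_collapsed_search` and `docs_from_dedup_aggs` are made up for illustration, and the field and aggregation names (`content.keyword`, `dedup`, `dedup_docs`) are taken from the question. It assumes the usual response layout, where `top_hits` results sit under each bucket of the `terms` aggregation.

```python
import json


def build_collapsed_search(query, field="content.keyword", size=10, offset=0):
    """Build a search body that asks for at most one hit per distinct `field` value.

    Hypothetical helper: it mirrors the collapse request from the question.
    """
    return {
        "query": query,
        "size": size,
        "from": offset,
        "collapse": {"field": field},
    }


def docs_from_dedup_aggs(response, agg_name="dedup", top_hits_name="dedup_docs"):
    """Collect the top_hits documents from each terms bucket of a search response.

    The deduplicated documents of the aggregation approach are found here,
    not in response["hits"]["hits"], which is left unchanged by aggregations.
    """
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    docs = []
    for bucket in buckets:
        # Each bucket carries its own nested hits list from the top_hits sub-aggregation.
        docs.extend(bucket[top_hits_name]["hits"]["hits"])
    return docs


if __name__ == "__main__":
    # The match_all query is only a placeholder for the elided query in the question.
    body = build_collapsed_search({"match_all": {}})
    print(json.dumps(body, indent=2))
```

If you stay with the aggregation approach, keep in mind that a `terms` aggregation returns only a limited number of buckets by default and cannot be paged with `from`, so `collapse` is usually the better fit for paginated result lists.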

Upvotes: 4
