TheHippo

Reputation: 63139

Filter elasticsearch results to contain only unique documents based on one field value

All my documents have a uid field with an ID that links the document to a user. There are multiple documents with the same uid.

I want to perform a search over all the documents returning only the highest scoring document per unique uid.

The query selecting the relevant documents is a simple multi_match query.

Upvotes: 19

Views: 19885

Answers (2)

Chase

Reputation: 3183

Elasticsearch 5.3 added support for field collapsing. You should be able to do something like:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":    "this is a test", 
      "fields": [ "subject", "message", "uid" ] 
    }
  },
  "collapse" : {
    "field" : "uid" 
  },
  "size": 20,
  "from": 100
}

The benefit of field collapsing over a top_hits aggregation is that the collapsed results can be paginated with from and size.
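To make the pagination point concrete, here is a minimal Python sketch that builds the same request body for an arbitrary page. The helper name `collapsed_search_body` and the zero-based `page` argument are my own choices for illustration, not part of any Elasticsearch client. It only constructs the JSON body; sending it is left to whatever client you use.

```python
import json


def collapsed_search_body(text, fields, page, page_size=20):
    """Build a search body that collapses hits on `uid` and selects one page.

    `page` is zero-based (hypothetical convention for this sketch).
    """
    return {
        "query": {"multi_match": {"query": text, "fields": fields}},
        # One hit per distinct uid; by default the best-scoring one is kept.
        "collapse": {"field": "uid"},
        "size": page_size,
        "from": page * page_size,
    }


# Page 5 with 20 results per page corresponds to "from": 100 above.
body = collapsed_search_body("this is a test", ["subject", "message"], page=5)
print(json.dumps(body, indent=2))
```

As far as I know, the collapse field has to be a single-valued keyword or numeric field with doc values enabled, and the total hit count in the response still refers to the uncollapsed documents, so it cannot be used directly to compute the number of pages.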

Upvotes: 18

Andrei Stefan

Reputation: 52368

You need a top_hits aggregation.

And for your specific case:

{
  "query": {
    "multi_match": {
      ...
    }
  },
  "aggs": {
    "top-uids": {
      "terms": {
        "field": "uid"
      },
      "aggs": {
        "top_uids_hits": {
          "top_hits": {
            "sort": [
              {
                "_score": {
                  "order": "desc"
                }
              }
            ],
            "size": 1
          }
        }
      }
    }
  }
}

The query above runs your multi_match query and aggregates the results by uid. For each uid bucket it returns only one result: the top document after all the documents in the bucket have been sorted by _score in descending order.
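Reading the result back takes one extra step, because the documents sit inside the buckets rather than in the top-level hits. A minimal Python sketch, assuming the response has already been parsed into a dict; the function name `best_hit_per_uid` and the sample response are made up for illustration, only the aggregation names come from the query above:

```python
def best_hit_per_uid(response, terms_agg="top-uids", hits_agg="top_uids_hits"):
    """Map each uid bucket key to its single highest-scoring hit."""
    best = {}
    for bucket in response["aggregations"][terms_agg]["buckets"]:
        hits = bucket[hits_agg]["hits"]["hits"]
        if hits:  # top_hits was requested with size 1, so take the first hit
            best[bucket["key"]] = hits[0]
    return best


# Hand-written sample in the shape of a terms + top_hits response (illustrative data).
sample_response = {
    "aggregations": {
        "top-uids": {
            "buckets": [
                {
                    "key": "u1",
                    "doc_count": 3,
                    "top_uids_hits": {
                        "hits": {"hits": [{"_id": "doc-3", "_score": 2.7, "_source": {"uid": "u1"}}]}
                    },
                },
                {
                    "key": "u2",
                    "doc_count": 1,
                    "top_uids_hits": {
                        "hits": {"hits": [{"_id": "doc-9", "_score": 1.1, "_source": {"uid": "u2"}}]}
                    },
                },
            ]
        }
    }
}

result = best_hit_per_uid(sample_response)
for uid, hit in result.items():
    print(uid, hit["_id"], hit["_score"])
```

Note that a terms aggregation returns only the 10 largest buckets by default and orders them by document count, not by score, so you will probably want to set "size" on the terms aggregation and, if you do not need the regular hits, "size": 0 at the top level of the request.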

Upvotes: 23
