AndroidStorm

Reputation: 161

ElasticSearch: Get distinct field values from multi_match

My query with multiple multi_match clauses looks as follows:

"query": {
  "bool": {
    "should": [
      {
        "multi_match": {
          "query": "test",
          "fields": ["field1^15", "field2^8"],
          "tie_breaker": 0.2,
          "minimum_should_match": "50%"
        }
      },
      {
        "multi_match": {
          "query": "test2",
          "fields": ["field1^15", "field2^8"],
          "tie_breaker": 0.2,
          "minimum_should_match": "50%"
        }
      }
    ]
  }
}

I want to get all distinct field1 values from the documents that match the query. How can I achieve that?

EDIT: Mapping:

"field1": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "analyzer": "nGram_analyzer"
        }

This is what I tried so far (I still get multiple identical field1 values):

"query": {
   "bool": {
     "should" : [
       {"multi_match" : {
         "query": "test",
         "fields":     ["field1^15", "field2^8"],
         "tie_breaker": 0.2,
         "minimum_should_match": "50%"
       }},
       {"multi_match" : {
          "query": "test2",
          "fields":     ["field1^15", "field2^8"],
          "tie_breaker": 0.2,
          "minimum_should_match": "50%"
         }
        }
      ]
     }
    },
"aggs": {
    "field1": {
      "terms": {
        "field": "field1.keyword",
        "size": 100 //1
      }
    }
  }

UPDATE:

The query

GET /test/test/_search
{
  "_source": ["field1"],
  "size": 10000,
  "query": {
    "multi_match": {
      "query": "test",
      "fields": ["field1^15", "field2^8"],
      "tie_breaker": 0.2,
      "minimum_should_match": "50%"
    }
  },
  "aggs": {
    "field1": {
      "terms": {
        "field": "field1.keyword",
        "size": 1
      }
    }
  }
}

results in

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 10,
    "successful": 10,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 35,
    "max_score": 110.26815,
    "hits": [
      {
        "_index": "test",
        "_type": "test",
        "_id": "AVzz99c4X4ZbfhscNES7",
        "_score": 110.26815,
        "_source": {
          "field1": "test-hier"
        }
      },
      {
        "_index": "test",
        "_type": "test",
        "_id": "AVzz8JWGX4ZbfhscMwe_",
        "_score": 107.45808,
        "_source": {
          "field1": "test-hier"
        }
      },
      {
        "_index": "test",
        "_type": "test",
        "_id": "AVzz8JWGX4ZbfhscMwe_",
        "_score": 107.45808,
        "_source": {
          "field1": "test-da"
        }
      },
      ...

So there should actually be only one "test-hier" in the results.

Upvotes: 0

Views: 632

Answers (1)

Val

Reputation: 217514

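You can add a terms aggregation on the field1.keyword sub-field, which gives you one bucket per distinct value. Setting the top-level "size" to 0 suppresses the regular hit list, where the duplicates show up, and the top_hits sub-aggregation returns one representative document for each bucket. You can change the aggregation's size (100 here) to any other value that better matches the cardinality of your field: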

{
  "size": 0,
  "query": {...},
  "aggs": {
    "field1": {
      "terms": {
        "field": "field1.keyword",
        "size": 100
      },
      "aggs": {
        "single_hit": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}
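
To read the distinct values back on the client side, the bucket keys of the terms aggregation are what you want, not the entries under "hits". Here is a minimal Python sketch that works on plain dicts only, with no Elasticsearch client involved. The function names build_distinct_request and distinct_values, the aggregation name "distinct_values" and the default arguments are hypothetical, chosen for illustration; the response layout ("aggregations" -> name -> "buckets" -> "key") is the usual shape of a terms aggregation result.

```python
from typing import Any, Dict, List


def build_distinct_request(
    query: Dict[str, Any],
    field: str = "field1.keyword",  # keyword sub-field from the mapping in the question
    max_values: int = 100,          # assumption: upper bound on distinct values returned
) -> Dict[str, Any]:
    """Build a search body that yields one bucket per distinct value of `field`."""
    return {
        "size": 0,  # skip the regular hit list; only the aggregation is needed
        "query": query,
        "aggs": {
            "distinct_values": {  # hypothetical aggregation name
                "terms": {"field": field, "size": max_values},
                # one representative document per distinct value
                "aggs": {"single_hit": {"top_hits": {"size": 1}}},
            }
        },
    }


def distinct_values(
    response: Dict[str, Any], agg_name: str = "distinct_values"
) -> List[str]:
    """Return the bucket keys (the distinct field values) from a search response."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    return [bucket["key"] for bucket in buckets]
```

Pass your existing bool/multi_match query as the query argument, send the resulting body to _search with whatever client you use, and hand the parsed JSON response to distinct_values. If the field has more than max_values distinct values, the list is truncated to the most frequent ones, so raise the size accordingly.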

Upvotes: 1
