How to Get Distinct Token Count

Question

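I want to compute a value, doc['_num_matches'], to use in script_score. This field is hypothetical (it does not exist today): it would hold the number of should clauses that matched the document, so each matching should clause adds one to doc['_num_matches'], no matter how many times its term occurs in the document.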

For example, with the query below (should clauses for "apple", "cherry", and "banana"):

doc0: "banana"
doc0['_num_matches'] == 1

doc1: "apple apple apple apple apple apple banana"
doc1['_num_matches'] == 2

doc2: "apple banana cherry"
doc2['_num_matches'] == 3
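
To make the intended semantics concrete, here is a minimal Python sketch of the count described above. It is only an illustration, not Elasticsearch code: the function name num_matches is hypothetical, and it assumes plain whitespace tokenization rather than the kuromoji analyzer, so that each distinct query term found in the document counts once.

```python
from typing import Iterable


def num_matches(content: str, query_terms: Iterable[str]) -> int:
    """Count how many distinct query terms occur in the document.

    Assumption: tokens are split on whitespace (the real index would
    use the analyzer configured for the field, e.g. kuromoji).
    """
    tokens = set(content.split())  # distinct tokens in the document
    # Each query term contributes at most 1, however often it occurs.
    return sum(1 for term in set(query_terms) if term in tokens)


terms = ["apple", "cherry", "banana"]
for doc in (
    "banana",
    "apple apple apple apple apple apple banana",
    "apple banana cherry",
):
    print(doc, "->", num_matches(doc, terms))
```

Under these assumptions the three documents yield 1, 2, and 3, matching the values listed above.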

{
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "should": [
                        {
                            "match": {
                                "content": {
                                    "query": "apple",
                                    "analyzer": "kuromoji"
                                }
                            }
                        },
                        {
                            "match": {
                                "content": {
                                    "query": "cherry",
                                    "analyzer": "kuromoji"
                                }
                            }
                        },
                        {
                            "match": {
                                "content": {
                                    "query": "banana",
                                    "analyzer": "kuromoji"
                                }
                            }
                        }
                    ],
                    "minimum_should_match": "2<80%"
                }
            },
            "functions": [
                {
                    "script_score": {
                        "script": {
                            "source": "_score * Math.log(1 + doc['_num_matches'].value)"
                        }
                    }
                }
            ],
            "boost_mode": "replace"
        }
    },
    "size": 200
}
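
For reference, the scoring expression in the script can be sketched in Python as follows. This only mirrors the arithmetic of _score * Math.log(1 + n); the function name rescored is hypothetical, and the match count is passed in as a plain argument because doc['_num_matches'] is not an existing field.

```python
import math


def rescored(score: float, num_matches: int) -> float:
    """Mirror of `_score * Math.log(1 + n)` using the natural logarithm."""
    return score * math.log(1 + num_matches)


# With the same base score, more matched clauses give a larger final score.
for n in (1, 2, 3):
    print(n, rescored(1.0, n))
```

Because the logarithm grows slowly, going from one matched clause to three doubles the multiplier (ln 2 to ln 4) rather than tripling it.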
