hamid

How to find documents that match more of the query terms in Elasticsearch?

I am looking for a way to customize Elasticsearch scoring so that documents matching more distinct query terms rank higher.

My index mapping is:

{
    "settings": {
        "number_of_shards": 1
    },
    "mappings": {
        "properties": {
            "content": {
                "type": "text"
            },
            "display_content": {
                "type": "text"
            }
        }
    }
}

And my search query to Elasticsearch is:

{
    'from': offset,
    'size': size,
    'query': {
        'function_score': {
            'boost_mode': 'multiply',
            'score_mode': 'sum',
            'functions': [],
            'query': {
                'bool': {
                    'must': {
                        'match': {
                            'content': query
                        }
                    },
                    'filter': [
                        {
                            'term': {
                                'searchable': 'true'
                            }
                        }
                    ]
                }
            }
        }
    },
    'highlight': {
        'fields': {
            'content': {}
        }
    },
    'track_scores': 'true',
    'sort': [
        {
            '_score': {'order': 'desc'}
        }
    ]
}

For example, I have two documents. The first document:

{
    "content": "laptop laptop laptop",
    "display_content": ""
}

The second document:

{
    "content": "laptop mobile",
    "display_content": ""
}

For a query like "mobile laptop", I want the second document to score higher than the first one. How can I do this?

Answers (1)

Pierre-Nicolas Mougel

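You don't need a function_score for this. Ranking documents that match more query terms higher is already the default behavior of the match query: it uses the OR operator by default and adds up the score contribution of every query term a document contains, so for the query "mobile laptop" the document "laptop mobile" gets a contribution from both terms.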
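A minimal sketch of the question's request body with the function_score wrapper removed, assuming the rest of the query (including the searchable filter) stays as in the question. The helper name build_search_body is hypothetical. Since results are sorted by _score in descending order when no sort is given, the explicit sort and track_scores are dropped as well.

```python
import json


def build_search_body(query: str, offset: int = 0, size: int = 10) -> dict:
    """Hypothetical helper: the question's search body without function_score."""
    return {
        "from": offset,
        "size": size,
        "query": {
            "bool": {
                # match uses the OR operator by default, so every query term
                # found in a document adds its own contribution to the score.
                "must": {"match": {"content": query}},
                "filter": [{"term": {"searchable": "true"}}],
            }
        },
        "highlight": {"fields": {"content": {}}},
        # No "sort": hits are ordered by _score (descending) by default.
    }


body = build_search_body("mobile laptop")
print(json.dumps(body, indent=2))
```

The resulting dict can be sent as the body of a regular _search request with whatever client is already in use.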

However, I understand that you want to reduce the impact of duplicated terms in the score.

If you want to discard duplicated terms completely, you can use the unique token filter. The field value "laptop laptop laptop" will then be indexed as a single "laptop" token, which removes the influence of duplicated terms entirely.
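A sketch of index settings that apply the built-in unique token filter to the content field, assuming a custom analyzer built from the standard tokenizer and the lowercase filter. The analyzer name dedup_text and the helper emulate_dedup are hypothetical; emulate_dedup only approximates the analyzer in plain Python (it splits on whitespace instead of using the standard tokenizer) to show which tokens end up indexed. Because the analyzer of an existing field cannot be changed, this has to be set when the index is created, and existing documents must be reindexed.

```python
import json


def unique_analysis_index_body() -> dict:
    """Index creation body whose content field drops repeated tokens."""
    return {
        "settings": {
            "number_of_shards": 1,
            "analysis": {
                "analyzer": {
                    # Hypothetical analyzer name; "unique" is the built-in
                    # token filter that removes duplicate tokens.
                    "dedup_text": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "unique"],
                    }
                }
            },
        },
        "mappings": {
            "properties": {
                "content": {"type": "text", "analyzer": "dedup_text"},
                "display_content": {"type": "text"},
            }
        },
    }


def emulate_dedup(text: str) -> list[str]:
    """Rough stand-in for dedup_text: lowercase, whitespace split, keep first occurrences."""
    seen = set()
    tokens = []
    for token in text.lower().split():
        if token not in seen:
            seen.add(token)
            tokens.append(token)
    return tokens


print(json.dumps(unique_analysis_index_body(), indent=2))
print(emulate_dedup("laptop laptop laptop"))  # ['laptop']
print(emulate_dedup("laptop mobile"))  # ['laptop', 'mobile']
```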

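If you want to keep the duplicated terms but reduce their influence, you can tune the k1 parameter of the BM25 similarity (the default similarity function). k1 controls term-frequency saturation: the lower it is, the less each additional occurrence of a term adds to the score, and with k1 set to 0 repeated occurrences add nothing beyond the first.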
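To see the effect of k1 on the two example documents, here is a simplified version of the Lucene BM25 formula (sum over matching terms of idf * f / (f + k1 * (1 - b + b * dl / avgdl))). It is an approximation under stated assumptions: it ignores boosts and Lucene's lossy encoding of field lengths, and the documents are pre-tokenized by hand, so the numbers will not match Elasticsearch's scores exactly. bm25_scores is a hypothetical helper name.

```python
import math


def bm25_scores(docs, query_terms, k1=1.2, b=0.75):
    """Approximate BM25 score of each tokenized doc for the given query terms."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    scores = []
    for doc in docs:
        dl = len(doc)
        score = 0.0
        for term in set(query_terms):
            f = doc.count(term)  # term frequency in this document
            if f == 0:
                continue
            df = sum(1 for d in docs if term in d)  # document frequency
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            score += idf * f / (f + k1 * (1 - b + b * dl / avgdl))
        scores.append(score)
    return scores


docs = [["laptop"] * 3, ["laptop", "mobile"]]
query = ["mobile", "laptop"]
for k1 in (1.2, 0.5, 0.0):
    s = bm25_scores(docs, query, k1=k1)
    print(f"k1={k1}: doc1={s[0]:.4f} doc2={s[1]:.4f}")
```

With the default k1 (1.2 in Elasticsearch, with b = 0.75) the second document already ranks first because it matches both terms. Lowering k1 shrinks the extra weight of the repeated "laptop"; at k1 = 0 the first document scores exactly as if it contained "laptop" once.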

See the documentation on configuring a similarity for an index. Note that the similarity can be changed without reindexing; you just need to close the index, update its settings, and reopen it.
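A sketch of that close/update/reopen sequence, expressed as plain data so it does not depend on a particular client library. It assumes the fields have no explicit similarity in their mapping (true for the question's mapping), so overriding the similarity named default affects them. The index name my-index, the helper default_similarity_settings, and the value k1 = 0.5 are illustrative assumptions, not recommendations.

```python
import json


def default_similarity_settings(k1: float, b: float = 0.75) -> dict:
    """Body for PUT /<index>/_settings overriding the default BM25 parameters."""
    return {
        "index": {
            "similarity": {
                "default": {"type": "BM25", "k1": k1, "b": b}
            }
        }
    }


index = "my-index"  # hypothetical index name
# The settings update is only accepted while the index is closed.
steps = [
    ("POST", f"/{index}/_close", None),
    ("PUT", f"/{index}/_settings", default_similarity_settings(k1=0.5)),
    ("POST", f"/{index}/_open", None),
]

for method, path, body in steps:
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```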

Please note that changing the parameters of the similarity function is considered an expert feature. You can read more on this subject in this article.
