Sanjay Kanani

Reputation: 308

Elasticsearch more_like_this query score issue in 5.x

We recently upgraded Elasticsearch from 2.4 to 5.4.

After the upgrade we found an issue with the more_like_this query in 5.x.

We use the following query to find documents similar to a given document by text:

Input query

POST /test/_search
{
  "size": 10000,
  "stored_fields": [
    "docid"
  ],
  "_source": false,
  "query": {
    "more_like_this": {
      "fields": [
        "textcontent"
      ],
      "like": [
        {
          "_index": "test",
          "_type": "object",
          "_id": "AV0c9jvZXF-b5U5aNAWB"
        }
      ],
      "max_query_terms": 5000,
      "min_term_freq": 1,
      "min_doc_freq": 1
    }
  }
}

Output of Elasticsearch 2.4

{

"took": 16,
"timed_out": false,
"_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
},
"hits": {
    "total": 3,
    "max_score": 1.5381224,
    "hits": [
        {
            "_index": "test",
            "_type": "object",
            "_id": "AVzjOOdilllQ-Gyal6Z9",
            "_score": 1.5381224,
            "fields": {
                "docid": [
                    "2"
                ]
            }
        },  {
            "_index": "test",
            "_type": "object",
            "_id": "AVzjOOdilllQ-Gyal63Z",
            "_score": 0.5381224,
            "fields": {
                "docid": [
                    "3"
                ]
            }
        },  {
            "_index": "test",
            "_type": "object",
            "_id": "AVzjOOdilllQ-Gyal6Z",
            "_score": 0.381224,
            "fields": {
                "docid": [
                    "4"
                ]
            }
        }
    ]
}
}

Output of Elasticsearch 5.4

{

"took": 16,
"timed_out": false,
"_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
},
"hits": {
    "total": 3,
    "max_score": 1.5381224,
    "hits": [
        {
            "_index": "test",
            "_type": "object",
            "_id": "AVzjOOdilllQ-Gyal6Z9",
            "_score": 168.5381224,
            "fields": {
                "docid": [
                    "2"
                ]
            }
        },  {
            "_index": "test",
            "_type": "object",
            "_id": "AVzjOOdilllQ-Gyal63Z",
            "_score": 164.5381224,
            "fields": {
                "docid": [
                    "3"
                ]
            }
        },  {
            "_index": "test",
            "_type": "object",
            "_id": "AVzjOOdilllQ-Gyal6Z",
            "_score": 132.381224,
            "fields": {
                "docid": [
                    "4"
                ]
            }
        }
    ]
}
}

The results are the same in both versions except for the document scores: version 5.4 returns much higher scores than 2.4. Our work depends on these scores, so the change is a problem for us. Is there a way to get the same scores as in 2.4?

Upvotes: 2

Views: 413

Answers (1)

Sanjay Kanani

Reputation: 308

I found the solution. In version 5.0 the default similarity algorithm changed from classic (TF/IDF) to BM25, which is why the scores differ. Set the similarity type to classic when creating the index. If the index already exists, update the setting for all indices by executing the following request:

PUT /_all/_settings?preserve_existing=true
{
  "index.similarity.default.type": "classic"
}
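
For the create-time case mentioned above, a minimal sketch of the request might look like the following. It assumes the index is named test, as in the question, and that it does not exist yet; mappings and any other settings are omitted and would need to be added for a real index.

```
PUT /test
{
  "settings": {
    "index": {
      "similarity": {
        "default": {
          "type": "classic"
        }
      }
    }
  }
}
```

Documents indexed into an index created this way should be scored with the classic similarity by default, which is what 2.4 used. If the settings update on an existing index is rejected while the index is open, closing it first with POST /test/_close and reopening it with POST /test/_open after the update may be required; treat that as an assumption to check against your version.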

Upvotes: 3
