cskuntal10

Reputation: 66

Elasticsearch more_like_this query not working for arrays with more than 6 elements

I have documents indexed in Elasticsearch with a keyword field that holds an array of values. The mapping is as follows:

{
    "alerts": {
        "aliases": {},
        "mappings": {
            "properties": {
                "recordTags": {
                    "type": "keyword"
                }
            }
        }
    }
}

I insert recordTags as arrays. One document has 7 unique recordTags. A second document has a single recordTag, which also appears in the first document.

The first document looks like this:

{
    "_index": "alerts",
    "_type": "_doc",
    "_id": "9bcb78db-77bc-4ed9-9972-d305f145a06a",
    "_version": 30,
    "_seq_no": 481,
    "_primary_term": 5,
    "found": true,
    "_source": {
        "recordTags": [
            "tag1",
            "tag2",
            "tag3",
            "tag4",
            "tag5",
            "tag6",
            "tag7"
        ]
    }
}

The second document looks like this:

{
    "_index": "alerts",
    "_type": "_doc",
    "_id": "582d9497-c43b-4081-a6c7-189ede176702",
    "_version": 30,
    "_seq_no": 481,
    "_primary_term": 5,
    "found": true,
    "_source": {
        "recordTags": [
            "tag1"
        ]
    }
}

When I query for records similar to the first document based on the recordTags field, no results are returned. I use the following query:

{
    "query": {
      "bool": {
        "should": [
          {
            "more_like_this": {
              "fields": [
                "recordTags"
              ],
              "like": [
                {
                  "_index": "alerts",
                  "_id": "9bcb78db-77bc-4ed9-9972-d305f145a06a"
                }
              ],
              "min_term_freq": 1,
              "min_doc_freq": 1,
              "max_query_terms": 12
            }
          }
        ]
      }
    }
}

Can someone explain what is going wrong? I have not been able to figure out the issue.

Upvotes: 0

Views: 262

Answers (1)

cskuntal10

Reputation: 66

The cause was the minimum_should_match parameter, whose default value for more_like_this is 30%. This means at least 30% of the terms taken from the source document must match in a candidate document. If 30% of the term count is a fractional value, it is rounded down.

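The source document has 7 terms, so 30% is 2.1, which rounds down to 2: a document must match at least 2 terms to qualify for the results. The second document shares only tag1, so it is excluded. With 6 terms, 30% is 1.8, which rounds down to 1, so a single shared tag is enough; that is why the problem only shows up for arrays larger than 6. Changing the value of minimum_should_match fixed the query.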
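To make the arithmetic concrete, here is a minimal Python sketch. The helper names `required_matches` and `build_mlt_query` are hypothetical, and the value `1` for minimum_should_match is only an assumption about what "changing the value" could look like; pick whatever threshold suits your data. The query body mirrors the one from the question with the extra parameter added.

```python
import json


def required_matches(num_terms: int, percent: int = 30) -> int:
    """Terms that must match for a positive percentage threshold (rounded down)."""
    return (num_terms * percent) // 100


def build_mlt_query(index: str, doc_id: str, minimum_should_match=1) -> dict:
    """Build the more_like_this query from the question with an explicit threshold.

    minimum_should_match=1 is an illustrative assumption, not a recommended value.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        "more_like_this": {
                            "fields": ["recordTags"],
                            "like": [{"_index": index, "_id": doc_id}],
                            "min_term_freq": 1,
                            "min_doc_freq": 1,
                            "max_query_terms": 12,
                            "minimum_should_match": minimum_should_match,
                        }
                    }
                ]
            }
        }
    }


if __name__ == "__main__":
    # With the default 30%, 6 terms need 1 match but 7 terms need 2.
    for n in (6, 7):
        print(n, "terms ->", required_matches(n), "required match(es)")

    query = build_mlt_query("alerts", "9bcb78db-77bc-4ed9-9972-d305f145a06a")
    print(json.dumps(query, indent=2))
```

The printed JSON can be sent as the body of a search request against the alerts index; with the threshold lowered to a single term, the document containing only tag1 should become eligible.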

Upvotes: 1
