Chris Curvey

Reputation: 10389

Why do all my Elasticsearch more-like-this hits have a score of zero?

I have a big feed of news articles that I'm indexing. I'd like to avoid indexing a lot of articles that are nearly the same (for example, articles from a news service might appear many times with slightly different date formats).

So I thought I'd do a more-like-this query with each article. If I get back a hit with a score > some cutoff, then I figure the article is already indexed, and I don't bother with it.

But when I run my more-like-this query, all the hits I get come back with a score of zero. I can't tell if that's expected, if I'm doing something wrong, or if I've discovered a bug.

My query looks like:

POST _search
{"query": 
  {"bool": 
    {"filter": [
      {"more_like_this": 
        {"fields": ["text"], 
         "like": "Doctor Sentenced In $3.1M Health Care Fraud Scheme  Justice Department Documents & Publications \nGreenbelt, Maryland - U.S. District Judge Deborah K. Chasanow sentenced physician [snip]"
        }
      }
    ]
    }
  }
}

And the results I get back are:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 390,
    "max_score": 0,
    "hits": [
      [snip]

Upvotes: 0

Views: 1801

Answers (3)

Youcef MERZOUG

Reputation: 31

You get a score of zero because the filter clause of a bool query is not included in the score calculation. It is only used to include or exclude documents. Use a must clause instead to get a score.

POST _search
{"query": 
  {"bool": 
    {"must": [
      {"more_like_this": 
        {"fields": ["text"], 
         "like": "Doctor Sentenced In $3.1M Health Care Fraud Scheme  Justice Department Documents & Publications \nGreenbelt, Maryland - U.S. District Judge Deborah K. Chasanow sentenced physician [snip]"
        }
      }
    ]
    }
  }
}

For more information, see the bool query documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html

Upvotes: 0

Gabe Roffman

Reputation: 11

The reason is that your MLT query is inside a filter clause. Filter clauses do not contribute to the score, so a bool query that contains only filter clauses returns a score of zero for every hit. Put your MLT query in a must or should clause and you will get scores back.
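
As a rough illustration of the point above, here is a minimal Python sketch that builds the same more_like_this request with a must clause and then applies the score cutoff described in the question. The helper names build_mlt_query and looks_like_duplicate, and the cutoff value in the usage lines, are hypothetical and not part of any Elasticsearch client. The sketch only builds and inspects plain dicts; sending the body to the _search endpoint is left to whatever client you already use.

```python
def build_mlt_query(text, fields=("text",)):
    """Build a search body that runs more_like_this in scoring context."""
    return {
        "query": {
            "bool": {
                # "must" (or "should") contributes to _score; "filter" does not.
                "must": [
                    {"more_like_this": {"fields": list(fields), "like": text}}
                ]
            }
        }
    }


def looks_like_duplicate(response, cutoff):
    """Return True if the best hit in a search response scores above cutoff."""
    max_score = response.get("hits", {}).get("max_score")
    # max_score can be null (None in Python), for example when nothing matched.
    return max_score is not None and max_score > cutoff


# Hypothetical usage: the cutoff of 10.0 is an arbitrary placeholder.
body = build_mlt_query("Doctor Sentenced In $3.1M Health Care Fraud Scheme")
fake_response = {"hits": {"total": 390, "max_score": 0, "hits": []}}
skip_indexing = looks_like_duplicate(fake_response, cutoff=10.0)
```

Scores are relative to the query and the index contents, so any cutoff would probably need tuning against real data rather than being picked in advance.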

Upvotes: 1

Bishwanath Jha

Reputation: 409

I ran into a similar issue today: my more_like_this query was not returning any results, because I was using non-default routing and not passing _routing.

My query is shown below. I needed to find documents similar to an article in the default_11 index, comparing the keywords and contents fields.

GET localhost:9200/alias_default/articles/_search
{
  "query": {
    "more_like_this": {
      "fields": [
        "keywords",
        "contents"
      ],
      "like": {
        "_index": "default_11",
        "_type": "articles",
        "_routing": "6",
        "_id": "1000000000006000000000000000014"
      },
      "min_word_length": 2,
      "min_term_freq": 2
    }
  }
}

Remember to pass the _routing parameter. This issue typically occurs when documents are indexed with non-default routing.

See: ElasticSearch returns document in search but not in GET

Upvotes: 0
