Reputation: 10389
I have a big feed of news articles that I'm indexing. I'd like to avoid indexing a lot of articles that are nearly the same (for example, articles from a news service might appear many times with slightly different date formats).
So I thought I'd do a more-like-this query with each article. If I get back a hit with a score > some cutoff, then I figure the article is already indexed, and I don't bother with it.
But when I run my more-like-this query, all the hits I get come back with a score of zero. I can't tell if that's expected, if I'm doing something wrong, or if I've discovered a bug.
My query looks like:
POST _search
{
  "query": {
    "bool": {
      "filter": [
        {
          "more_like_this": {
            "fields": ["text"],
            "like": "Doctor Sentenced In $3.1M Health Care Fraud Scheme Justice Department Documents & Publications \nGreenbelt, Maryland - U.S. District Judge Deborah K. Chasanow sentenced physician [snip]"
          }
        }
      ]
    }
  }
}
And the results I get back are:
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 390,
"max_score": 0,
"hits": [
[snip]
Upvotes: 0
Views: 1801
Reputation: 31
You get a score of zero because the filter clause of a bool query is not included in the score calculation. It is only used to filter results. Put the more_like_this query in a must clause instead to get a score.
POST _search
{
  "query": {
    "bool": {
      "must": [
        {
          "more_like_this": {
            "fields": ["text"],
            "like": "Doctor Sentenced In $3.1M Health Care Fraud Scheme Justice Department Documents & Publications \nGreenbelt, Maryland - U.S. District Judge Deborah K. Chasanow sentenced physician [snip]"
          }
        }
      ]
    }
  }
}
For more information, see the bool query documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
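As a rough sketch of how the scored query could feed the duplicate check described in the question, the Python below builds the must version of the request body and compares a response's max_score against a cutoff. The function names and the cutoff are hypothetical, and actually sending the request to Elasticsearch is left out.

```python
def build_mlt_query(text, fields=("text",)):
    """Build a search body with more_like_this in a scoring (must) clause."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"more_like_this": {"fields": list(fields), "like": text}}
                ]
            }
        }
    }


def looks_already_indexed(response, cutoff):
    """Return True when the best hit scores above the caller-chosen cutoff.

    `response` is the parsed JSON of a search response (a dict).
    """
    max_score = response.get("hits", {}).get("max_score")
    # max_score may be missing or None (for example when nothing matched),
    # and it is 0 when the query ran only in a filter clause.
    return max_score is not None and max_score > cutoff
```

A suitable cutoff depends on the index and the article length, so it would have to be tuned against known duplicates.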
Upvotes: 0
Reputation: 11
The reason is that your MLT query is inside a filter clause. Filter clauses always return a score of zero. Put your MLT query inside a must or should clause and you will get scores back.
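If a request body already exists with more_like_this under filter, the move to must can be done mechanically. The sketch below is one way to do that in Python; the helper name is hypothetical, and it assumes the body has the query/bool shape used in the question.

```python
import copy


def move_mlt_to_must(body):
    """Return a copy of `body` with more_like_this clauses moved
    from bool.filter to bool.must. Other filter clauses stay in place."""
    result = copy.deepcopy(body)
    bool_query = result["query"]["bool"]

    filters = bool_query.get("filter", [])
    if isinstance(filters, dict):  # a single clause may be given without a list
        filters = [filters]

    mlt_clauses = [c for c in filters if "more_like_this" in c]
    other_clauses = [c for c in filters if "more_like_this" not in c]

    if mlt_clauses:
        must = bool_query.get("must", [])
        if isinstance(must, dict):
            must = [must]
        bool_query["must"] = must + mlt_clauses

    if other_clauses:
        bool_query["filter"] = other_clauses
    else:
        bool_query.pop("filter", None)
    return result
```

The input is left unmodified, so the original and rewritten bodies can be compared side by side.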
Upvotes: 1
Reputation: 409
I faced a similar issue today: my more_like_this query returned no results because I was using non-default routing and not passing _routing.
My query looks like the one below. I had to search in article in the default_11 index, on the document fields keywords and contents.
GET localhost:9200/alias_default/articles/_search
{
  "query": {
    "more_like_this": {
      "fields": [
        "keywords",
        "contents"
      ],
      "like": {
        "_index": "default_11",
        "_type": "articles",
        "_routing": "6",
        "_id": "1000000000006000000000000000014"
      },
      "min_word_length": 2,
      "min_term_freq": 2
    }
  }
}
Also keep in mind to pass the _routing parameter. This issue typically occurs when documents are indexed with non-default routing.
See: ElasticSearch returns document in search but not in GET
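To make the routing point concrete, here is a minimal Python sketch that builds the document reference for like and only adds the routing value when one is supplied. The helper names are hypothetical, and the _routing key is copied from the query in this answer; the exact key name may differ between Elasticsearch versions.

```python
def like_document(index, doc_type, doc_id, routing=None):
    """Build the document reference used in more_like_this's `like`."""
    ref = {"_index": index, "_type": doc_type, "_id": doc_id}
    if routing is not None:
        # Key name taken from the answer above; treat it as an assumption
        # for other Elasticsearch versions.
        ref["_routing"] = str(routing)
    return ref


def mlt_by_document(ref, fields, min_term_freq=2, min_word_length=2):
    """Build a search body that looks for documents similar to `ref`."""
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": ref,
                "min_word_length": min_word_length,
                "min_term_freq": min_term_freq,
            }
        }
    }


# Values from the answer's example query.
ref = like_document("default_11", "articles",
                    "1000000000006000000000000000014", routing=6)
body = mlt_by_document(ref, ["keywords", "contents"])
```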
Upvotes: 0