Corbac

Reputation: 95

Wrong maxDocs and docFreq with dfs_query_then_fetch

I am trying to understand how Elasticsearch calculates the idf for a query. The documents, which I took from an example, are created on localhost with these lines (I am using Sense):

POST /library/books/_bulk
{ "index": { "_id":1 }}
{ "title": "The quick brown fox", "price":5 }
{ "index": { "_id":2 }}
{ "title": "The quick brown fox jumps over the lazy dog", "price":15 }
{ "index": { "_id":3 }}
{ "title": "The quick brown fox jumps over the quick dog", "price":8 }
{ "index": { "_id":4 }}
{ "title": "Brown fox brown dog", "price":2 }
{ "index": { "_id":5 }}
{ "title": "Lazy dog", "price":9 }

I don't understand the scoring of the following query:

GET /library/books/_search?explain&search_type=dfs_query_then_fetch
{
  "query":{
    "match": {
      "title": "quick fox"
    }
  }
}

From what I understood of the Elasticsearch/Lucene documentation, maxDocs should be 5 and docFreq should be 3. However, the explain output gives me docFreq=1, maxDocs=1 for the idf of both "quick" and "fox" for the first document ("The quick brown fox"), and maxDocs=2 for another document.
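As a sanity check on the expected numbers, here is a minimal sketch of the idf calculation over the five titles. It assumes the default TF/IDF similarity of Elasticsearch 2.x, where Lucene computes idf as 1 + ln(maxDocs / (docFreq + 1)), and it uses lowercase whitespace splitting as a rough stand-in for the standard analyzer. The helper names `count_docs` and `classic_idf` are hypothetical, not part of any Elasticsearch API.

```python
import math

# The five titles from the bulk request, keyed by _id.
titles = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox brown dog",
    5: "Lazy dog",
}


def count_docs(term, docs):
    """Number of documents containing the term (docFreq).

    Assumption: lowercase + whitespace split approximates the analyzer
    for these simple titles.
    """
    return sum(1 for text in docs.values() if term in text.lower().split())


def classic_idf(df, max_docs):
    """Lucene classic TF/IDF idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (df + 1))


max_docs = len(titles)
for term in ("quick", "fox"):
    df = count_docs(term, titles)
    print(term, "docFreq =", df, "maxDocs =", max_docs,
          "idf =", round(classic_idf(df, max_docs), 6))

# What explain reports for the first document (docFreq=1, maxDocs=1).
print("reported idf =", round(classic_idf(1, 1), 6))
```

Under these assumptions "quick" has docFreq 3 across the whole index, while "fox" has docFreq 4 because it also appears in "Brown fox brown dog". Either way, both differ from the docFreq=1, maxDocs=1 shown by explain.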

I also tried without the dfs_query_then_fetch and with preference=_primary or _replica with similar results.

The document with the higher score is the correct one, but the idf isn't what I expected.

How can the explain show the correct maxDocs/docFreq, and why don't I have the numbers I expected, even with dfs_query_then_fetch?

Thank you

Upvotes: 2

Views: 395

Answers (1)

keety

Reputation: 17461

This is a bug in the Elasticsearch 2.x explain API when search_type is dfs_query_then_fetch. The explanation and the scores it reports do not match the actual score.

The git issue gives more insight into this. The actual score, however, should be accurate.

Upvotes: 3
