Wrong maxDocs and docFreq with dfs_query_then_fetch

Question

I am trying to understand the calculations Elasticsearch does to get the idf of a query. I created the documents, taken from an example, with these requests (I am using Sense on localhost):

POST /library/books/_bulk
{ "index": { "_id":1 }}
{ "title": "The quick brown fox", "price":5 }
{ "index": { "_id":2 }}
{ "title": "The quick brown fox jumps over the lazy dog", "price":15 }
{ "index": { "_id":3 }}
{ "title": "The quick brown fox jumps over the quick dog", "price":8 }
{ "index": { "_id":4 }}
{ "title": "Brown fox brown dog", "price":2 }
{ "index": { "_id":5 }}
{ "title": "Lazy dog", "price":9 }

I don't understand the scoring of the following query:

GET /library/books/_search?explain&search_type=dfs_query_then_fetch
{
  "query":{
    "match": {
      "title": "quick fox"
    }
  }
}

From what I understood of the Elasticsearch/Lucene documentation, maxDocs should be 5, and docFreq should be 3 for "quick" (and 4 for "fox"). However, for the first document ("The quick brown fox"), the explain output gives docFreq=1, maxDocs=1 in the idf of both "quick" and "fox", and maxDocs=2 for another document.
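To make the expected numbers concrete, here is a small sketch that counts docFreq and maxDocs over the five titles and applies the classic Lucene TF/IDF idf formula, 1 + ln(maxDocs / (docFreq + 1)), which is what the Elasticsearch 2.x default similarity uses. Two assumptions: the tokenizer is a crude lowercase-and-split stand-in for the standard analyzer, and "maxDocs" is taken as the plain document count (Lucene's maxDoc also counts deleted documents). The single-document "shard" at the end is hypothetical; the real placement depends on routing. If the index was created with the 2.x default of five primary shards, each shard holds only one or two of these documents, which would be consistent with the maxDocs=1 and maxDocs=2 values in the explain output.

```python
import math
from collections import Counter

# The five titles from the bulk request above.
TITLES = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox brown dog",
    5: "Lazy dog",
}

def tokenize(text):
    # Assumption: rough stand-in for the standard analyzer (lowercase + whitespace split).
    return text.lower().split()

def term_stats(docs):
    """Return (docFreq per term, maxDocs) for a collection of documents."""
    doc_freq = Counter()
    for text in docs.values():
        doc_freq.update(set(tokenize(text)))  # each term counts once per document
    return doc_freq, len(docs)

def classic_idf(doc_freq, max_docs):
    # Classic Lucene TF/IDF idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

# Global statistics over the whole index.
df, max_docs = term_stats(TITLES)
for term in ("quick", "fox"):
    print(term, "docFreq =", df[term], "maxDocs =", max_docs,
          "idf =", round(classic_idf(df[term], max_docs), 4))

# Hypothetical shard that happens to hold only document 1.
shard_df, shard_max = term_stats({1: TITLES[1]})
print("shard with doc 1:", shard_df["quick"], shard_df["fox"], shard_max,
      round(classic_idf(shard_df["quick"], shard_max), 4))
```

With global statistics, idf is about 1.22 for "quick" and exactly 1.0 for "fox"; a shard holding only document 1 gives 1 + ln(1/2) ≈ 0.307 for either term.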

I also tried without dfs_query_then_fetch, and with preference=_primary or preference=_replica, and got similar results.

The document with the higher score is the correct one, but the idf isn't what I expected.

How can I get explain to show the correct maxDocs/docFreq, and why don't I get the numbers I expected, even with dfs_query_then_fetch?

Thank you

keety · Accepted Answer

This is a bug in the Elasticsearch 2.x explain API when search_type is dfs_query_then_fetch: the explanation and the scores it reports do not match the actual score.

The GitHub issue gives more insight into this. The actual score, however, should be accurate.
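The idea behind dfs_query_then_fetch can be modelled in a few lines: before the query phase, each shard reports its local docFreq and maxDocs for the query terms, and the coordinating node sums them so every shard scores with the same global values. This is a simplified model, not Elasticsearch's implementation, and the four-shard split of the five documents is hypothetical. The per-shard counts are only for the two query terms, "quick" and "fox".

```python
import math
from collections import Counter

def classic_idf(doc_freq, max_docs):
    # Classic Lucene TF/IDF idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def merge_shard_stats(shards):
    """Sum per-shard (docFreq Counter, maxDocs) pairs into global statistics."""
    total_df = Counter()
    total_max = 0
    for doc_freq, max_docs in shards:
        total_df.update(doc_freq)
        total_max += max_docs
    return total_df, total_max

# Hypothetical shard layout: local stats for the query terms only.
shards = [
    (Counter({"quick": 1, "fox": 1}), 1),  # shard holding doc 1
    (Counter({"quick": 2, "fox": 2}), 2),  # shard holding docs 2 and 3
    (Counter({"fox": 1}), 1),              # shard holding doc 4
    (Counter(), 1),                        # shard holding doc 5
]

global_df, global_max = merge_shard_stats(shards)

local_df, local_max = shards[0]
print("local idf(quick) on shard 0: ", round(classic_idf(local_df["quick"], local_max), 4))
print("global idf(quick) after DFS:", round(classic_idf(global_df["quick"], global_max), 4))
```

If explain prints shard-local statistics while the score uses the merged ones, you see exactly the kind of mismatch described above: docFreq=1, maxDocs=1 in the explanation even though the query was scored with docFreq=3, maxDocs=5.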
