Why does Elasticsearch compute an idf value of 0.30685282?

Question

I got the following result from the Explain API:

{
    "took": 5,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.13424811,
        "hits": [
            {
                "_shard": 2,
                "_node": "Tf1RSzMxQD-AYhmnKQWr8Q",
                "_index": "scoretest",
                "_type": "test",
                "_id": "1",
                "_score": 0.13424811,
                "_source": {
                    "content": "this book is about english",
                    "title": "this is a book"
                },
                "_explanation": {
                    "value": 0.13424811,
                    "description": "weight(content:english in 0) [PerFieldSimilarity], result of:",
                    "details": [
                        {
                            "value": 0.13424811,
                            "description": "fieldWeight in 0, product of:",
                            "details": [
                                {
                                    "value": 1,
                                    "description": "tf(freq=1.0), with freq of:",
                                    "details": [
                                        {
                                            "value": 1,
                                            "description": "termFreq=1.0"
                                        }
                                    ]
                                },
                                {
                                    "value": 0.30685282,
                                    "description": "idf(docFreq=1, maxDocs=1)"
                                },
                                {
                                    "value": 0.4375,
                                    "description": "fieldNorm(doc=0)"
                                }
                            ]
                        }
                    ]
                }
            }
        ]
    }
}
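As a quick check of the explanation tree above: the `fieldWeight` node is described as a product of tf, idf and fieldNorm. The minimal Java sketch below multiplies the three values copied from the output, in float precision. The class and method names are hypothetical, chosen only for this illustration.

```java
public class FieldWeightCheck {
    // fieldWeight = tf * idf * fieldNorm, as the "product of:" node states
    static float fieldWeight(float tf, float idf, float fieldNorm) {
        return tf * idf * fieldNorm;
    }

    public static void main(String[] args) {
        float tf = 1.0f;           // tf(freq=1.0)
        float idf = 0.30685282f;   // idf(docFreq=1, maxDocs=1)
        float fieldNorm = 0.4375f; // fieldNorm(doc=0)
        System.out.println("fieldWeight = " + fieldWeight(tf, idf, fieldNorm));
    }
}
```

The product agrees with `_score` (0.13424811) to float precision, so the idf factor is the only value that needs explaining.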

There are two points I do not understand:

1. The formula for idf is:

  public float idf(long docFreq, long numDocs) {
    return (float)(Math.log(numDocs/(double)(docFreq+1)) + 1.0);
  }

Why do docFreq = 1 and numDocs = 1 give an idf value of 0.30685282? My calculation gives:

log(0.5) + 1.0 = -0.3010299957 + 1.0 = 0.6989700043

2. Why is numDocs 1?

Does numDocs mean the number of documents in my index? I have 2 documents in my index, so why does it use 1?

Regarding question two, see this query result:

{
    "took": 17,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0.13424811,
        "hits": [
            {
                "_shard": 2,
                "_node": "Tf1RSzMxQD-AYhmnKQWr8Q",
                "_index": "scoretest",
                "_type": "test",
                "_id": "1",
                "_score": 0.13424811,
                "_source": {
                    "content": "this book is about english",
                    "title": "this is a book"
                },
                "_explanation": {
                    "value": 0.13424811,
                    "description": "weight(content:book in 0) [PerFieldSimilarity], result of:",
                    "details": [
                        {
                            "value": 0.13424811,
                            "description": "fieldWeight in 0, product of:",
                            "details": [
                                {
                                    "value": 1,
                                    "description": "tf(freq=1.0), with freq of:",
                                    "details": [
                                        {
                                            "value": 1,
                                            "description": "termFreq=1.0"
                                        }
                                    ]
                                },
                                {
                                    "value": 0.30685282,
                                    "description": "idf(docFreq=1, maxDocs=1)"
                                },
                                {
                                    "value": 0.4375,
                                    "description": "fieldNorm(doc=0)"
                                }
                            ]
                        }
                    ]
                }
            },
            {
                "_shard": 3,
                "_node": "Tf1RSzMxQD-AYhmnKQWr8Q",
                "_index": "scoretest",
                "_type": "test",
                "_id": "2",
                "_score": 0.13424811,
                "_source": {
                    "content": "this book is about chinese",
                    "title": "this is a book"
                },
                "_explanation": {
                    "value": 0.13424811,
                    "description": "weight(content:book in 0) [PerFieldSimilarity], result of:",
                    "details": [
                        {
                            "value": 0.13424811,
                            "description": "fieldWeight in 0, product of:",
                            "details": [
                                {
                                    "value": 1,
                                    "description": "tf(freq=1.0), with freq of:",
                                    "details": [
                                        {
                                            "value": 1,
                                            "description": "termFreq=1.0"
                                        }
                                    ]
                                },
                                {
                                    "value": 0.30685282,
                                    "description": "idf(docFreq=1, maxDocs=1)"
                                },
                                {
                                    "value": 0.4375,
                                    "description": "fieldNorm(doc=0)"
                                }
                            ]
                        }
                    ]
                }
            }
        ]
    }
}

femtoRgon · Accepted Answer

1. Natural log, not base 10 (Java's Math.log is the natural logarithm): 1 + ln(1/(1+1)) = 1 + ln(0.5) = 1 - 0.69314718 = 0.30685282
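To see the difference between the two logarithms, here is a minimal sketch that evaluates the `idf` formula from the question with `Math.log` (natural log, as in the original method) and, for comparison, with `Math.log10` (the base-10 calculation in the question). The class name `IdfBase` and the `idfLog10` helper are hypothetical.

```java
public class IdfBase {
    // The formula quoted in the question; Math.log is the natural logarithm.
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Same formula with a base-10 log, i.e. the calculation in the question.
    static float idfLog10(long docFreq, long numDocs) {
        return (float) (Math.log10(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        System.out.println("natural log: " + idf(1, 1));
        System.out.println("base 10:     " + idfLog10(1, 1));
    }
}
```

The natural-log version reproduces the 0.30685282 in the Explain output; the base-10 version gives roughly the 0.699 from the question.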
2. Yes, it is the number of documents in the index. However, your two documents are on different shards (shard 2 and shard 3 in the output), and shards are effectively separate indexes, at least as far as the document count used for scoring is concerned. You can read more about this in Jeroen van Wilgenburg's blog post "How sharding in elasticsearch makes scoring a little less accurate and what to do about it". One line in his conclusion bears emphasizing, though: "With larger sets the score differences will converge."
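A sketch of how per-shard document counts change idf, reusing the same formula. It assumes, as the `_shard` fields suggest, that each of the two documents sits alone on its shard. The larger-index case at the end uses made-up numbers (1000 documents spread evenly over 5 shards) purely to illustrate the convergence point. The class name `ShardIdf` is hypothetical.

```java
public class ShardIdf {
    // Same formula as in the question (natural log).
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // Each shard holds one document, so it sees docFreq=1, numDocs=1
        // for both "book" and "english".
        System.out.println("per shard, any term:      " + idf(1, 1));

        // Statistics taken over the whole two-document index.
        System.out.println("whole index, \"book\":    " + idf(2, 2)); // in both docs
        System.out.println("whole index, \"english\": " + idf(1, 2)); // in one doc

        // Hypothetical larger index: 1000 docs over 5 shards,
        // term present in 100 of them (20 per shard).
        System.out.println("large, per shard:         " + idf(20, 200));
        System.out.println("large, whole index:       " + idf(100, 1000));
    }
}
```

In the tiny index every term gets the same per-shard idf (about 0.31), which is why the "english" and "book" queries score identically, while whole-index statistics would give about 0.59 for "book" and 1.0 for "english". In the hypothetical large index the per-shard and whole-index values differ only slightly (about 3.25 vs 3.29).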
