jianfeng

Reputation: 2590

Why does Elasticsearch compute an idf value of 0.30685282?

I got this result from the Explain API:

{
    "took": 5,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.13424811,
        "hits": [
            {
                "_shard": 2,
                "_node": "Tf1RSzMxQD-AYhmnKQWr8Q",
                "_index": "scoretest",
                "_type": "test",
                "_id": "1",
                "_score": 0.13424811,
                "_source": {
                    "content": "this book is about english",
                    "title": "this is a book"
                },
                "_explanation": {
                    "value": 0.13424811,
                    "description": "weight(content:english in 0) [PerFieldSimilarity], result of:",
                    "details": [
                        {
                            "value": 0.13424811,
                            "description": "fieldWeight in 0, product of:",
                            "details": [
                                {
                                    "value": 1,
                                    "description": "tf(freq=1.0), with freq of:",
                                    "details": [
                                        {
                                            "value": 1,
                                            "description": "termFreq=1.0"
                                        }
                                    ]
                                },
                                {
                                    "value": 0.30685282,
                                    "description": "idf(docFreq=1, maxDocs=1)"
                                },
                                {
                                    "value": 0.4375,
                                    "description": "fieldNorm(doc=0)"
                                }
                            ]
                        }
                    ]
                }
            }
        ]
    }
}

There are two points I do not understand:

1. The formula for idf is:

  public float idf(long docFreq, long numDocs) {
    return (float)(Math.log(numDocs/(double)(docFreq+1)) + 1.0);
  }

Why do docFreq = 1 and numDocs = 1 give an idf value of 0.30685282?

By my calculation: log(0.5) + 1.0 = -0.3010299957 + 1.0 = 0.6989700043

2. Why is numDocs 1?

Does numDocs mean the number of docs in my index? I have 2 docs in my index, so why does it use 1?

Regarding the second question, see this query result:

{
    "took": 17,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0.13424811,
        "hits": [
            {
                "_shard": 2,
                "_node": "Tf1RSzMxQD-AYhmnKQWr8Q",
                "_index": "scoretest",
                "_type": "test",
                "_id": "1",
                "_score": 0.13424811,
                "_source": {
                    "content": "this book is about english",
                    "title": "this is a book"
                },
                "_explanation": {
                    "value": 0.13424811,
                    "description": "weight(content:book in 0) [PerFieldSimilarity], result of:",
                    "details": [
                        {
                            "value": 0.13424811,
                            "description": "fieldWeight in 0, product of:",
                            "details": [
                                {
                                    "value": 1,
                                    "description": "tf(freq=1.0), with freq of:",
                                    "details": [
                                        {
                                            "value": 1,
                                            "description": "termFreq=1.0"
                                        }
                                    ]
                                },
                                {
                                    "value": 0.30685282,
                                    "description": "idf(docFreq=1, maxDocs=1)"
                                },
                                {
                                    "value": 0.4375,
                                    "description": "fieldNorm(doc=0)"
                                }
                            ]
                        }
                    ]
                }
            },
            {
                "_shard": 3,
                "_node": "Tf1RSzMxQD-AYhmnKQWr8Q",
                "_index": "scoretest",
                "_type": "test",
                "_id": "2",
                "_score": 0.13424811,
                "_source": {
                    "content": "this book is about chinese",
                    "title": "this is a book"
                },
                "_explanation": {
                    "value": 0.13424811,
                    "description": "weight(content:book in 0) [PerFieldSimilarity], result of:",
                    "details": [
                        {
                            "value": 0.13424811,
                            "description": "fieldWeight in 0, product of:",
                            "details": [
                                {
                                    "value": 1,
                                    "description": "tf(freq=1.0), with freq of:",
                                    "details": [
                                        {
                                            "value": 1,
                                            "description": "termFreq=1.0"
                                        }
                                    ]
                                },
                                {
                                    "value": 0.30685282,
                                    "description": "idf(docFreq=1, maxDocs=1)"
                                },
                                {
                                    "value": 0.4375,
                                    "description": "fieldNorm(doc=0)"
                                }
                            ]
                        }
                    ]
                }
            }
        ]
    }
}

Upvotes: 0

Views: 183

Answers (1)

femtoRgon

Reputation: 33351

  1. `Math.log` is the natural log, not base 10: 1 + ln(1/(1+1)) = 0.30685282

  2. Yes, it is the number of documents in the index. However, the documents in your index appear to be in different shards, which are effectively separate indexes, at least as far as the doc count for scoring is concerned. You can read a bit more about this on Jeroen van Wilgenburg's blog: How sharding in elasticsearch makes scoring a little less accurate and what to do about it. I think it bears emphasizing one line in his conclusion though: "With larger sets the score differences will converge."
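To make both points concrete, here is a minimal sketch that re-runs the arithmetic. The `idf` method is the formula quoted in the question; the class name `IdfDemo` and the helper `idfBase10` are hypothetical and exist only for this comparison. The "single shard" figures assume both documents were counted together, which is not what the explain output above shows.

```java
public class IdfDemo {
    // Formula as quoted in the question. Math.log is the natural logarithm.
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Hypothetical base-10 variant, only to reproduce the question's arithmetic.
    static float idfBase10(long docFreq, long numDocs) {
        return (float) (Math.log10(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // Per shard: each shard holds one document, so docFreq=1 and maxDocs=1.
        System.out.println("natural log: " + idf(1, 1));        // about 0.3069
        System.out.println("base 10:     " + idfBase10(1, 1));  // about 0.6990

        // fieldWeight = tf * idf * fieldNorm, using the values from the explain output.
        float tf = 1.0f;
        float fieldNorm = 0.4375f;
        System.out.println("fieldWeight: " + tf * idf(1, 1) * fieldNorm); // about 0.1342

        // Assumption: if both documents were counted together (numDocs=2),
        // "book" (docFreq=2) and "english" (docFreq=1) would get different idf values.
        System.out.println("book, 2 docs:    " + idf(2, 2)); // about 0.5945
        System.out.println("english, 2 docs: " + idf(1, 2)); // 1.0
    }
}
```

With one document per shard, every term that matches gets the same idf, which is why "book" and "english" score identically in the two results above.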

Upvotes: 1
