zachdb86

Reputation: 995

How is a nested document relevance score (TF/IDF) calculated in Elasticsearch?

When running a match query on nested fields, are the relevance scores for each nested document calculated based on all nested documents across all root documents, or just the nested documents under a single root document? In other words, when TF/IDF is calculated, what is the scope of the collection used for IDF?

Here is the mapping with the nested field:

PUT /channels
{
  "mappings": {
    "channel": {
      "properties": {
        "username": { "type": "string" },
        "posts": {
          "type": "nested", 
          "properties": {
            "link":    { "type": "string" },
            "caption": { "type": "string" }
          }
        }
      }
    }
  }
}

And here is the query:

GET channels/_search
{
  "query": {
    "nested": {
      "path": "posts",
      "query": {
        "match": {
          "posts.caption": "adidas"
        }
      },
      "inner_hits": {}
    }
  }
}

However, in my results, even though the second document has a higher max score for inner hits, the first document's root score is somehow higher.

{
  "hits": {
    "total": 2,
    "max_score": 4.3327584,
    "hits": [
      {
        "_index": "channels",
        "_type": "channel",
        "_id": "1",
        "_score": 4.3327584,
        "_source": {
          "username": "user1",
          "posts": [...]
        },
        "inner_hits": {
          "posts": {
            "hits": {
              "total": 2,
              "max_score": 5.5447335,
              "hits": [...]
            }
          }
        }
      },
      {
        "_index": "channels",
        "_type": "channel",
        "_id": "2",
        "_score": 4.2954993,
        "_source": {
          "username": "user2",
          "posts": [...]
        },
        "inner_hits": {
          "posts": {
            "hits": {
              "total": 13,
              "max_score": 11.446381,
              "hits": [...]
            }
          }
        }
      }
    ]
  }
}

Upvotes: 2

Views: 597

Answers (1)

zachdb86

Reputation: 995

After running explain on my query, I can see that the TF/IDF score for inner hits does use an IDF calculated from nested documents across all root documents, not just those under a single root.
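
To make the scope of IDF concrete, here is a minimal Python sketch. It assumes the classic Lucene TF/IDF similarity, whose IDF is 1 + ln(docCount / (docFreq + 1)). The captions and the `roots` structure are made up for illustration, and the sketch simplifies things: in a real index the counts are kept per shard and the document count also includes the root documents themselves.

```python
import math

def classic_idf(doc_freq, doc_count):
    """IDF as in Lucene's classic TF/IDF similarity: 1 + ln(N / (df + 1))."""
    return 1.0 + math.log(doc_count / (doc_freq + 1))

# Hypothetical nested "posts" captions, grouped by root document id.
roots = {
    "1": ["adidas shoes", "new sneakers"],
    "2": ["adidas sale", "adidas run", "lunch"],
}

# Nested documents are indexed as separate documents, so the collection
# used for IDF is the flat list of posts from every root document.
all_posts = [caption for posts in roots.values() for caption in posts]

doc_count = len(all_posts)
doc_freq = sum(1 for caption in all_posts if "adidas" in caption.split())

# One IDF value is shared by every nested document, whatever its root.
idf = classic_idf(doc_freq, doc_count)
```

In this sketch a post under root 1 and a post under root 2 get the same IDF for "adidas"; only their term frequency and field length can differ.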

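As for the root document score, the default score_mode for a nested query is avg, so the root score is the average of the scores of its matching nested documents. That is why a root document with only a few matching posts can outrank one with many matches and a higher maximum inner-hit score. To use the highest-scoring nested document instead, set score_mode to max. The query below shows how to run explain on a single document and how to set a different score mode.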

GET channels/channel/1/_explain
{
  "query": {
    "nested": {
      "path": "posts",
      "score_mode": "max", 
      "query": {
        "match": {
          "posts.caption": "adidas"
        }
      },
      "inner_hits": {}
    }
  }
}
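
The effect of score_mode on ranking can be sketched in a few lines of Python. This is only an illustration of how the per-post scores are combined, not Elasticsearch's implementation; the helper `root_score` and the score values are hypothetical, and only the avg, max and sum modes are modeled.

```python
from statistics import mean

def root_score(child_scores, score_mode="avg"):
    """Combine the scores of matching nested documents into one root score."""
    combine = {"avg": mean, "max": max, "sum": sum}
    return combine[score_mode](child_scores)

# Hypothetical scores of the matching posts under two root documents.
doc1 = [5.0, 3.0]              # few matches, moderate scores
doc2 = [11.0, 1.0, 1.0, 1.0]   # one strong match, several weak ones

def ranking(score_mode):
    """Return root ids ordered by descending root score."""
    scores = {"1": root_score(doc1, score_mode),
              "2": root_score(doc2, score_mode)}
    return sorted(scores, key=scores.get, reverse=True)
```

With these made-up numbers, `ranking("avg")` puts document 1 first because the weak matches drag down document 2's average, while `ranking("max")` puts document 2 first.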

Upvotes: 1
