Reputation: 591
I have created an Elasticsearch cluster with 3 nodes, 3 shards and 2 replicas. The same query returns different results when run against the same index with the same data. Right now the results are sorted by the _score field in descending order (I think that is the default sort), and the requirement is also that results be sorted in descending order of their score. So my question is: why does the same query yield different results, and how can this be corrected so that the same query returns the same results every time?
The query is attached below:
{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must": {
        "bool": {
          "must": {
            "terms": {
              "context": [
                "my name"
              ]
            }
          },
          "should": {
            "multi_match": {
              "query": "test",
              "fields": [
                "field1^2",
                "field2^2",
                "field3^3"
              ]
            }
          },
          "minimum_should_match": "1"
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "audiencecomb": [
                  "1235"
                ]
              }
            },
            {
              "terms": {
                "consumablestatus": [
                  "1"
                ]
              }
            }
          ],
          "minimum_should_match": "1"
        }
      }
    }
  }
}
Upvotes: 4
Views: 4416
Reputation: 9320
One possible reason is distributed IDF. By default, Elasticsearch computes the IDF locally on each shard to save on performance, which leads to different IDF values across the cluster. So you should try adding ?search_type=dfs_query_then_fetch to the request, which explicitly asks Elasticsearch to compute a global IDF.
However, for performance reasons, Elasticsearch doesn’t calculate the IDF across all documents in the index. Instead, each shard calculates a local IDF for the documents contained in that shard.
Because our documents are well distributed, the IDF for both shards will be the same. Now imagine instead that five of the foo documents are on shard 1, and the sixth document is on shard 2. In this scenario, the term foo is very common on one shard (and so of little importance), but rare on the other shard (and so much more important). These differences in IDF can produce incorrect results.
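To make the skew concrete, here is a rough sketch in Python of the scenario described above. It assumes two shards of 10 documents each (the shard sizes are made up for illustration) and uses the BM25 IDF formula; the exact numbers in a real cluster depend on the configured similarity and on the actual document counts.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF term of the BM25 formula: ln(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical layout: 10 documents per shard (an assumption, not from the post).
# Five "foo" documents sit on shard1 and the sixth on shard2.
shards = {
    "shard1": {"docs": 10, "foo": 5},
    "shard2": {"docs": 10, "foo": 1},
}

# Local IDF: each shard only sees its own statistics.
local_idf = {name: bm25_idf(s["docs"], s["foo"]) for name, s in shards.items()}

# Global IDF: statistics summed over all shards, which is what a DFS phase gathers.
total_docs = sum(s["docs"] for s in shards.values())
total_df = sum(s["foo"] for s in shards.values())
global_idf = bm25_idf(total_docs, total_df)

for name, value in local_idf.items():
    print(f"{name}: local idf = {value:.3f}")
print(f"global idf = {global_idf:.3f}")
```

With these made-up counts, the term gets a noticeably lower weight on shard1 than on shard2, so otherwise identical documents are scored differently depending on which shard holds them.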
In practice, this is not a problem. The differences between local and global IDF diminish the more documents that you add to the index. With real-world volumes of data, the local IDFs soon even out. The problem is not that relevance is broken but that there is too little data.
For testing purposes, there are two ways we can work around this issue. The first is to create an index with one primary shard, as we did in the section introducing the match query. If you have only one shard, then the local IDF is the global IDF.
The second workaround is to add ?search_type=dfs_query_then_fetch to your search requests. The dfs stands for Distributed Frequency Search, and it tells Elasticsearch to first retrieve the local IDF from each shard in order to calculate the global IDF across the whole index.
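As a minimal sketch of the second workaround, the snippet below builds a search request with the search_type parameter using only the Python standard library. The host `http://localhost:9200`, the index name `my_index`, and the helper `build_search_request` are placeholders for illustration, not names from the question; the request is only constructed here, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url, index, body, search_type="dfs_query_then_fetch"):
    """Build (but do not send) a _search request with the given search_type."""
    params = urlencode({"search_type": search_type})
    url = f"{base_url}/{index}/_search?{params}"
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host, index and query body (assumptions for this example).
request = build_search_request(
    "http://localhost:9200",
    "my_index",
    {"query": {"match": {"field1": "test"}}},
)
print(request.full_url)
# To actually run it against a cluster: urllib.request.urlopen(request)
```

The extra DFS round trip adds some latency to every search, so it is probably best used where consistent scoring matters more than raw speed.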
For more information take a look here
Upvotes: 4