Reputation: 285
We created an index containing a single document:
POST sample-index-test/_doc/1
{
"first_name": "James",
"last_name" : "Osaka"
}
The index contains only this one document. We then call the _explain API with a match query on the index:
GET sample-index-test/_explain/1
{
"query": {
"match": {
"first_name": "James"
}
}
}
The _explain API returns the following details:
{
"_index" : "sample-index-test",
"_type" : "_doc",
"_id" : "1",
"matched" : true,
"explanation" : {
"value" : 0.2876821,
"description" : "weight(first_name:james in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.2876821,
"description" : "score(freq=1.0), computed as boost * idf * tf from:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.2876821,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 1,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 1,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.45454544,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
}
}
Next, we run the same index request several times within a few seconds, changing only last_name:
POST sample-index-test/_doc/1
{
"first_name": "James",
"last_name" : "Cena"
}
Running the same _explain request again returns a different score, and both "number of documents containing term" and "total number of documents with field" have changed:
{
"_index" : "sample-index-test",
"_type" : "_doc",
"_id" : "1",
"matched" : true,
"explanation" : {
"value" : 0.046520013,
"description" : "weight(first_name:james in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.046520013,
"description" : "score(freq=1.0), computed as boost * idf * tf from:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.046520017,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 10,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 10,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.45454544,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
}
}
Why does Elasticsearch increase "total number of documents with field" and "number of documents containing term" when the index still contains only a single document?
Upvotes: 0
Views: 211
Reputation: 1942
Elasticsearch is built on Lucene, which stores all documents in segments. Segments are immutable, so updating a document is a two-step process: a new copy of the document is written, and the old copy is marked as deleted. When you create the first document, the index holds exactly one document. Once the same document has been written 10 times in total, there is 1 live document and 9 copies marked as deleted. The deleted copies still count towards the term statistics until their segments are merged, which is why "total number of documents with field" and "number of documents containing term" change.
You can verify this with the _forcemerge endpoint. A force merge merges the segments and removes the deleted documents from them.
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-forcemerge.html
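To make the mechanism concrete, here is a minimal toy model in Python. It is not how Lucene is implemented; `ToyShard` and its methods are hypothetical names invented for this illustration. It only mimics the behaviour described above: writes append a new copy, old copies are merely marked as deleted, term statistics count every copy, and a merge drops the deleted ones.

```python
class ToyShard:
    """Toy stand-in for a shard: append-only copies plus delete markers.

    Hypothetical illustration only, not Lucene's real data structures.
    """

    def __init__(self):
        self.copies = []  # each entry: {"id": ..., "terms": set, "deleted": bool}

    def index(self, doc_id, terms):
        # An overwrite never edits in place: the old copy is marked deleted
        # and a new copy is appended.
        for copy in self.copies:
            if copy["id"] == doc_id and not copy["deleted"]:
                copy["deleted"] = True
        self.copies.append({"id": doc_id, "terms": set(terms), "deleted": False})

    def doc_freq(self, term):
        # Term statistics still count copies that are only marked as deleted.
        return sum(1 for c in self.copies if term in c["terms"])

    def live_docs(self):
        return sum(1 for c in self.copies if not c["deleted"])

    def force_merge(self):
        # Merging rewrites the data and leaves the deleted copies behind.
        self.copies = [c for c in self.copies if not c["deleted"]]


shard = ToyShard()
shard.index("1", ["james"])          # initial create
for _ in range(9):                   # nine overwrites of the same _id
    shard.index("1", ["james"])

before = (shard.live_docs(), shard.doc_freq("james"))
shard.force_merge()
after = (shard.live_docs(), shard.doc_freq("james"))
print(before, after)  # (1, 10) (1, 1)
```

The live document count stays at 1 throughout; only the term statistic drifts until the merge.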
## 1. Create the document
POST sample-index-test/_doc/1
{
"first_name": "James",
"last_name" : "Osaka"
}
## 2. Get the explain score
GET sample-index-test/_explain/1
{
"query": {
"match": {
"first_name": "James"
}
}
}
## "value": 0.2876821,
## n, number of documents containing term => 1
## N, total number of documents with field => 1
## 3.1. Execute this 9 times (10 writes in total, including step 1)
POST sample-index-test/_doc/1
{
"first_name": "James",
"last_name" : "Cena"
}
## 3.2. Alternatively, you can use the _update API
POST sample-index-test/_update/1
{
"script" : "ctx._source.first_name = 'James'; ctx._source.last_name = 'Cena';"
}
## 3.3. You can even use _update_by_query
POST sample-index-test/_update_by_query
{
"query": {
"match": {
"first_name": "James"
}
},
"script": {
"source": "ctx._source.first_name = 'James'; ctx._source.last_name = 'Cena';",
"lang": "painless"
}
}
## 4. Get the explain score
GET sample-index-test/_explain/1
{
"query": {
"match": {
"first_name": "James"
}
}
}
## "value": 0.046520013,
## n, number of documents containing term => 10
## N, total number of documents with field => 10
## 5. Execute the force merge.
POST sample-index-test/_forcemerge
## 6. Wait a couple of seconds for the force merge to take effect, then get the explain score again.
GET sample-index-test/_explain/1
{
"query": {
"match": {
"first_name": "James"
}
}
}
## "value": 0.2876821,
## n, number of documents containing term => 1
## N, total number of documents with field => 1
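The two scores above can be reproduced by hand from the formulas printed in the explain output. The sketch below simply plugs in the values reported there (boost 2.2, k1 1.2, b 0.75, freq, dl and avgdl all 1.0); the function names are my own and not part of any Elasticsearch client.

```python
import math


def bm25_idf(n, N):
    # idf, computed as log(1 + (N - n + 0.5) / (n + 0.5))
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def bm25_tf(freq=1.0, k1=1.2, b=0.75, dl=1.0, avgdl=1.0):
    # tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl))
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


def bm25_score(n, N, boost=2.2):
    # score, computed as boost * idf * tf
    return boost * bm25_idf(n, N) * bm25_tf()


score_clean = bm25_score(n=1, N=1)     # one copy of the document
score_stale = bm25_score(n=10, N=10)   # one live copy plus nine deleted copies

print(round(score_clean, 7))  # 0.2876821
print(round(score_stale, 7))  # 0.04652
```

Only n and N differ between the two calls, so the whole drop in score comes from the idf term counting the deleted copies.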
Upvotes: 0