Reputation: 670
I created an index in Elasticsearch with the following settings. After inserting data into the index using the Bulk API, the docs.deleted
count is continuously increasing. Does this mean the documents are automatically getting deleted? If so, what did I do wrong?
PUT /inc_index/
{
  "mappings": {
    "store": {
      "properties": {
        "title": {
          "type": "string",
          "term_vector": "with_positions_offsets_payloads",
          "store": true,
          "index_analyzer": "fulltext_analyzer"
        },
        "description": {
          "type": "string",
          "term_vector": "with_positions_offsets_payloads",
          "store": true,
          "index_analyzer": "fulltext_analyzer"
        },
        "category": {
          "type": "string"
        }
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1
    },
    "analysis": {
      "analyzer": {
        "fulltext_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "type_as_payload"
          ]
        }
      }
    }
  }
}
The output of "GET /_cat/indices?v" is shown below; the "docs.deleted" count is continuously increasing:
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open inc_index 5 1 2009877 584438 6.8gb 3.6gb
Upvotes: 8
Views: 7123
Reputation: 1777
This can happen if your machine is too slow to handle the bulk insertion, for example when your documents are fairly big or when there are simply too many of them at once.
After slowing down the indexing process there was no document loss anymore. It is still strange that the documents that were not inserted showed up under "deleted", which suggests to me that they were indeed processed.
This occurred to me when using Elasticdump and could be resolved by setting the --limit
option to a lower number, as sketched below.
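For illustration, a minimal Elasticdump invocation with a smaller batch size might look like this (the hosts and index name are placeholders for your own setup):

# copy data in smaller batches so the cluster can keep up with the bulk requests
elasticdump \
  --input=http://source-host:9200/inc_index \
  --output=http://target-host:9200/inc_index \
  --type=data \
  --limit=50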
Upvotes: 2
Reputation: 23
Elasticsearch indices are composed of "segments". Since segments are write-once, when we delete or update a document in Elasticsearch it is not actually removed; it is only marked as deleted, which increases the "docs.deleted" count.
More segments mean slower searches and more memory used. Elasticsearch solves this problem by merging segments in the background: small segments are merged into bigger segments, which, in turn, are merged into even bigger segments. While merging those segments, any documents that are marked as deleted are not copied into the bigger segment. Once merging has finished, the old segments are deleted. That is why the "docs.deleted" value decreases again later.
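If you want to reclaim the space from those deleted documents sooner, you can trigger a merge yourself. A sketch, using your index name from the question (the endpoint is _forcemerge on current versions; on the older 1.x releases that still accept "string" mappings it was called _optimize):

POST /inc_index/_forcemerge?only_expunge_deletes=true

or, on 1.x:

POST /inc_index/_optimize?only_expunge_deletes=true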
Upvotes: 2
Reputation: 52368
If your bulk operations also include updates to existing documents (index/update requests for documents with the same ID), then this is normal. In Elasticsearch, an update is a combination of delete+insert operations: https://www.elastic.co/guide/en/elasticsearch/guide/current/update-doc.html
The deleted documents you see there are only marked as deleted. When Lucene segment merging happens, they will be physically removed from disk.
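As a sketch of how this shows up (the document ID and field values below are made up), sending the same _id twice in one bulk request leaves the first copy behind as a deleted document until a merge runs:

POST /_bulk
{ "index": { "_index": "inc_index", "_type": "store", "_id": "1" } }
{ "title": "first version", "description": "initial text", "category": "books" }
{ "index": { "_index": "inc_index", "_type": "store", "_id": "1" } }
{ "title": "second version", "description": "updated text", "category": "books" }

Afterwards GET /_cat/indices?v will report docs.count 1 and docs.deleted 1 for that document until the segments containing the old version are merged away.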
Upvotes: 11