Aviral Srivastava

Reputation: 4582

Does updating a doc increase the "delete" count of the index?

I am facing a strange issue with the number of deleted docs in an Elasticsearch index. The data is never deleted, only inserted and/or updated. While I can see that the total number of docs is increasing, I also see a non-zero value in the docs deleted column. I do not understand where this number comes from.

I tried to find out whether an update first deletes the doc and then re-indexes it, which would increase the delete count. However, I could not find any information on this.

The command I type to check the index is:

curl -XGET localhost:9200/_cat/indices

The output I get is:

yellow open e0399e012222b9fe70ec7949d1cc354f17369f20               zcq1wToKRpOICKE9-cDnvg 5 1 21219975 4302430  64.3gb  64.3gb

Note: This is a single-node Elasticsearch cluster.

I would like to know the reason behind the deleted docs.

Upvotes: 4

Views: 1708

Answers (1)

Nishant

Reputation: 7864

You are correct: updates are the reason you see a non-zero count of deleted documents.
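To make it clear which number is which in the output from the question, here is a small sketch that labels the columns. It assumes the default column order of `_cat/indices` (no `h=` parameter); the helper name `parse_cat_line` is made up for illustration.

```python
# Default column order of `_cat/indices` when no `h=` parameter is given.
COLUMNS = [
    "health", "status", "index", "uuid", "pri", "rep",
    "docs.count", "docs.deleted", "store.size", "pri.store.size",
]


def parse_cat_line(line):
    """Hypothetical helper: map one whitespace-separated row to column names."""
    return dict(zip(COLUMNS, line.split()))


# The row posted in the question.
line = (
    "yellow open e0399e012222b9fe70ec7949d1cc354f17369f20               "
    "zcq1wToKRpOICKE9-cDnvg 5 1 21219975 4302430  64.3gb  64.3gb"
)

row = parse_cat_line(line)
live = int(row["docs.count"])       # documents that are currently searchable
deleted = int(row["docs.deleted"])  # old versions marked as deleted, not yet purged

# Share of all stored document versions that are marked as deleted.
deleted_share = deleted / (live + deleted)
print(f"live={live} deleted={deleted} deleted_share={deleted_share:.1%}")
```

Passing `?v` to the same request (`localhost:9200/_cat/indices?v`) makes Elasticsearch print these header names itself.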

Lucene itself has no concept of an in-place update. Documents in Lucene are immutable.

So how does Elasticsearch provide the update feature?

It does so by using the _source field, which is why _source must be enabled to use the update API. On an update, Elasticsearch reads the existing _source to get all fields and their current values, then replaces the values of only the fields sent in the update request. It marks the existing document as deleted and indexes a new document with the updated _source. The old copy stays on disk and is counted under docs.deleted until a segment merge purges it.
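The mechanism described above can be sketched with a toy in-memory model. This is not Elasticsearch or Lucene code: `ToyIndex` and all of its methods are hypothetical names, the merge of the partial document is a simple shallow merge, and `merge_segments` only stands in for what a segment merge does to deleted documents.

```python
class ToyIndex:
    """Toy model of an append-only index (illustration only)."""

    def __init__(self):
        self.versions = []  # every document version ever written
        self.live = {}      # doc id -> position of its live version

    def index(self, doc_id, source):
        # Documents are immutable: an existing version is never modified,
        # it is only marked as deleted before the new version is appended.
        if doc_id in self.live:
            self.versions[self.live[doc_id]]["deleted"] = True
        self.versions.append(
            {"id": doc_id, "source": dict(source), "deleted": False}
        )
        self.live[doc_id] = len(self.versions) - 1

    def update(self, doc_id, partial):
        # Read the stored _source, overlay the fields from the request,
        # then index the merged result as a brand-new version.
        current = self.versions[self.live[doc_id]]["source"]
        self.index(doc_id, {**current, **partial})

    def get_source(self, doc_id):
        return self.versions[self.live[doc_id]]["source"]

    def merge_segments(self):
        # Stand-in for a segment merge: deleted versions are dropped for good.
        self.versions = [v for v in self.versions if not v["deleted"]]
        self.live = {v["id"]: i for i, v in enumerate(self.versions)}

    def docs_count(self):
        return sum(1 for v in self.versions if not v["deleted"])

    def docs_deleted(self):
        return sum(1 for v in self.versions if v["deleted"])


idx = ToyIndex()
idx.index("1", {"name": "a", "views": 1})
idx.update("1", {"views": 2})  # only the changed field is sent

count_after_update = idx.docs_count()
deleted_after_update = idx.docs_deleted()
source_after_update = idx.get_source("1")

idx.merge_segments()
deleted_after_merge = idx.docs_deleted()
```

In this model a single update leaves one live document and one deleted version behind, which is the same pattern as a growing docs.count alongside a non-zero docs.deleted in an index that only ever receives inserts and updates.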

What is the advantage of this if it's not an actual update?

  1. It spares the application from having to assemble the complete document when only a small subset of fields needs to change. Instead of sending the full document, you send only the fields that need updating through the update API, and Elasticsearch takes care of the rest.

  2. It saves network round-trips, reduces payload size, and also reduces the chance of version conflicts.

You can read more about how update works here.

Upvotes: 3
