basilboli

Reputation: 634

Elasticsearch incremental snapshot is ambiguous

The Elasticsearch snapshot/restore documentation states that the index snapshot process is incremental.

Could you please explain what that means and confirm whether every snapshot can be restored on its own?

Use case:

Let's say I have created a repository and a first snapshot, snapshotA, containing all indexes at moment A.

Some time later (for example, one hour later) I create a new snapshot, snapshotB, of all the indexes at moment B, which have changed since moment A.

There are three questions:

  1. Will the size of snapshotB equal the actual size of all indexes and contain all the data at moment B, or will it contain only partial data: the difference between snapshotA and snapshotB?

  2. If the latter, how does Elasticsearch calculate that difference?

  3. If the latter, can we safely delete snapshotA without losing data for snapshotB?

Thanks.

Upvotes: 2

Views: 1099

Answers (1)

Andrei Stefan

Reputation: 52368

Snapshots are incremental at the file level, not the document level. Each shard is a Lucene index, and each Lucene index automatically merges its segments in the background. These segments are the files that are considered for a snapshot.

If at time A your index has 5 segments and by time B 3 of them have merged into a bigger one, the snapshot taken at time B will add only this new segment to the snapshot repository. In its metadata, snapshot B records that it needs this file plus the 2 other files that were already added when snapshot A was created. Every snapshot therefore references the complete set of files it needs for a restore.
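To make the 5-segment example concrete, here is a minimal toy model in Python. It is only a sketch of the idea, not how Elasticsearch is implemented: the function `take_snapshot` and the segment names `s1` to `s6` are made up for illustration, and a real repository tracks files per shard with more metadata.

```python
def take_snapshot(repo_files, shard_segments):
    """Toy model of a file-level incremental snapshot (hypothetical helper).

    repo_files:     set of segment file names already stored in the repository
                    (updated in place)
    shard_segments: set of segment file names the shard consists of right now

    Returns (manifest, uploaded): the full list of files this snapshot needs,
    and the subset that actually had to be copied to the repository.
    """
    uploaded = shard_segments - repo_files   # only files the repo lacks
    repo_files |= uploaded                   # copy them into the repository
    manifest = set(shard_segments)           # snapshot references ALL its files
    return manifest, uploaded


repo = set()

# Time A: the index has 5 segments, all of them are new to the repository.
manifest_a, uploaded_a = take_snapshot(repo, {"s1", "s2", "s3", "s4", "s5"})

# Time B: s1, s2 and s3 have been merged into the bigger segment s6.
manifest_b, uploaded_b = take_snapshot(repo, {"s4", "s5", "s6"})

print(sorted(uploaded_b))   # files copied for snapshot B
print(sorted(manifest_b))   # files snapshot B needs for a restore
```

In this model snapshot B copies a single new file, yet its manifest still lists every file required to restore the state at time B.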

If you use the normal DELETE snapshot API, Elasticsearch deletes only those files that are not needed by any other existing snapshot. In this example, deleting snapshot A removes the 3 segments that were merged into the larger one, and snapshot B stays fully restorable. Any other way of deleting a snapshot (for example, removing files from the repository by hand) is not recommended and could lead to data loss.
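The cleanup rule can be sketched the same way. This is again a simplified Python model under assumed names (`delete_snapshot`, segments `s1` to `s6`), not the actual Elasticsearch code: a file is removed only when no remaining snapshot lists it.

```python
def delete_snapshot(snapshots, repo_files, name):
    """Toy model of deleting a snapshot (hypothetical helper).

    snapshots:  dict mapping snapshot name -> set of file names it needs
    repo_files: set of file names stored in the repository (updated in place)

    Returns the set of files that were removed from the repository.
    """
    removed_manifest = snapshots.pop(name)
    # Files still referenced by at least one remaining snapshot must stay.
    still_needed = set().union(*snapshots.values())
    orphaned = removed_manifest - still_needed
    repo_files -= orphaned
    return orphaned


# State after snapshots A and B from the example in the answer.
snapshots = {
    "A": {"s1", "s2", "s3", "s4", "s5"},
    "B": {"s4", "s5", "s6"},   # s6 is the merge of s1, s2 and s3
}
repo = {"s1", "s2", "s3", "s4", "s5", "s6"}

orphaned = delete_snapshot(snapshots, repo, "A")

print(sorted(orphaned))   # files dropped together with snapshot A
print(sorted(repo))       # files left, exactly what snapshot B needs
```

After deleting A in this model, the repository still holds every file listed by B, which is why B remains restorable.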

Upvotes: 2
