Reputation: 11
I have an Elasticsearch cluster running on EC2 instances and I want to automate daily snapshots as backups.
I've read through the Snapshot and Restore guide and I have the PUT command that'll make a snapshot. From my research I've seen a few ways to automate the backups. One suggestion I found was to use AWS managed Elasticsearch. Unfortunately switching to managed Elasticsearch will not work due to other constraints we have.
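For reference, the call I'm making is roughly the following (the endpoint and repository name are placeholders, and the snapshot repository has to be registered first):

```python
# Rough sketch of the daily snapshot call; endpoint and repository are placeholders.
import json
import urllib.request
from datetime import datetime, timezone

ES_ENDPOINT = "http://localhost:9200"   # placeholder cluster endpoint
REPOSITORY = "my_repo"                  # snapshot repository registered beforehand
snapshot = "snapshot-" + datetime.now(timezone.utc).strftime("%Y-%m-%d")

req = urllib.request.Request(
    url=f"{ES_ENDPOINT}/_snapshot/{REPOSITORY}/{snapshot}?wait_for_completion=true",
    method="PUT",
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))
```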
The first approach I tried was to set up a cron job on one of the nodes to make the appropriate REST call. However, I realized that if the node running the cron job went down, the backups would stop running.
The next approach I thought of was AWS Data Pipeline. The issue is that there doesn't seem to be a way to send REST calls from Data Pipeline directly - I could run a shell command on an EC2 instance, but then I'd run into the same single-node problem as before.
The other approach I've considered is using a scheduled CloudWatch Events rule to trigger a Lambda function that makes the REST call. That seems like it might work best, but it also feels overly complicated for automating backups.
Is there a way to automate backups from within Elasticsearch? And if not, is there a simpler way of doing this with AWS services?
Upvotes: 0
Views: 1067
Reputation: 11
For an Elasticsearch cluster running a version without snapshot lifecycle management (i.e., before 7.4), using a scheduled CloudWatch Events rule to trigger a Lambda function that makes the snapshot REST call is a resilient way to automate backups: it avoids depending on a cron job on a single node while still producing a reliable Elasticsearch snapshot (see the comments on the question for details).
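A minimal sketch of such a Lambda handler, assuming the cluster endpoint and repository name are passed in via environment variables (the names here are placeholders; adjust auth, VPC configuration, and error handling for your cluster):

```python
# Minimal sketch of a Lambda handler that takes a daily snapshot.
import json
import os
import urllib.request
from datetime import datetime, timezone

def lambda_handler(event, context):
    endpoint = os.environ["ES_ENDPOINT"]              # e.g. "http://10.0.0.12:9200"
    repository = os.environ.get("REPOSITORY", "my_repo")
    snapshot = "snapshot-" + datetime.now(timezone.utc).strftime("%Y-%m-%d")

    req = urllib.request.Request(
        url=f"{endpoint}/_snapshot/{repository}/{snapshot}",
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())

    return {"statusCode": 200, "body": json.dumps(body)}
```

The function can then be triggered by a CloudWatch Events (EventBridge) schedule expression such as cron(0 3 * * ? *) to run at 03:00 UTC every day.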
Elasticsearch 7.4 introduced snapshot lifecycle management (SLM), which lets you define a policy that takes snapshots of your indices automatically on a schedule, for example every day at a particular time.
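A sketch of registering such a policy via the SLM API (the endpoint, policy id, repository name, and schedule below are placeholders):

```python
# Sketch of registering a daily SLM policy (Elasticsearch 7.4+); values are placeholders.
import json
import urllib.request

ES_ENDPOINT = "http://localhost:9200"
policy = {
    "schedule": "0 30 1 * * ?",          # Elasticsearch cron syntax: 01:30 UTC daily
    "name": "<daily-snap-{now/d}>",      # date-math name for each snapshot
    "repository": "my_repository",       # snapshot repository registered beforehand
    "config": {"indices": ["*"], "include_global_state": True},
}

req = urllib.request.Request(
    url=f"{ES_ENDPOINT}/_slm/policy/daily-snapshots",
    data=json.dumps(policy).encode(),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))
```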
Upvotes: 1