Reputation: 347
I want to back up Elasticsearch data in a different physical location. At first I tried to put all Elasticsearch nodes into the same cluster, but whenever a program queries or updates Elasticsearch, large amounts of data are transferred over the internet. This costs a lot of money in network traffic and introduces network delay.
Is there an easy way to sync data between two Elasticsearch clusters, so that only the changed data is sent over the internet?
PS: I don't care much about sync delay; anything under 1 minute is acceptable.
Upvotes: 2
Views: 13553
Reputation: 201
If you are running a recent version of Elasticsearch (5.0 or 5.2+), add a date field named updatedAt (or similar) to your documents, and then on the destination cluster run a cron job every minute that executes a reindex-from-remote request like this:
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://sourcehost:9200",
      "username": "user",
      "password": "pass"
    },
    "index": "source",
    "query": {
      "range": {
        "updatedAt": {
          "gte": "2015-01-01 00:00:00"
        }
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}
More information on the range query used in the filter is available here - https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
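To automate this, a small script on the destination side can remember the time of the last successful sync and feed it into the range filter on each run. Below is a minimal sketch in Python using the requests library; the host names, index names, credentials, and state-file path are placeholders, and it assumes updatedAt is mapped as a date field. Note that reindex from remote also requires the source host to be listed in reindex.remote.whitelist in the destination cluster's elasticsearch.yml.

#!/usr/bin/env python3
"""Incremental sync sketch: reindex documents changed since the last run.

Placeholder hosts/indices; run from cron on the destination side, e.g.:
* * * * * /usr/bin/python3 /opt/es-sync/sync.py
"""
import json
import os
from datetime import datetime, timezone

import requests

DEST = "http://desthost:9200"               # destination cluster (runs the reindex)
SOURCE = "http://sourcehost:9200"           # remote source cluster
STATE_FILE = "/var/lib/es-sync/last_sync"   # stores the timestamp of the last run


def read_last_sync():
    """Return the timestamp of the previous successful run, or a safe default."""
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return f.read().strip()
    return "1970-01-01T00:00:00Z"  # first run copies everything


def sync():
    started_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    body = {
        "source": {
            "remote": {"host": SOURCE, "username": "user", "password": "pass"},
            "index": "source",
            "query": {"range": {"updatedAt": {"gte": read_last_sync()}}},
        },
        "dest": {"index": "dest"},
    }
    # wait_for_completion=true keeps the script simple; very large batches
    # may be better handled asynchronously via the tasks API.
    resp = requests.post(
        f"{DEST}/_reindex",
        params={"wait_for_completion": "true"},
        headers={"Content-Type": "application/json"},
        data=json.dumps(body),
        timeout=300,
    )
    resp.raise_for_status()
    # Only advance the watermark after a successful reindex.
    with open(STATE_FILE, "w") as f:
        f.write(started_at)


if __name__ == "__main__":
    sync()

Recording the start time before the reindex and only persisting it after success means an overlapping window is re-copied rather than skipped, so a failed run never loses changes (documents are simply reindexed again).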
If you are using an older version of Elasticsearch (<5.0), you can use the elasticdump tool (https://github.com/taskrabbit/elasticsearch-dump) to transfer the data with a similar approach based on the updatedAt field.
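For reference, a rough sketch of the equivalent elasticdump invocation; host and index names are placeholders, and you should verify the flag names against your installed elasticdump version:

elasticdump \
  --input=http://sourcehost:9200/source \
  --output=http://desthost:9200/dest \
  --type=data \
  --searchBody='{"query":{"range":{"updatedAt":{"gte":"2015-01-01 00:00:00"}}}}'

The --searchBody filter plays the same role as the range query above, so the same cron-plus-timestamp scheme applies.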
Upvotes: 5