zhufeizzz

Reputation: 347

How to sync data between elasticsearch clusters?

I want to keep a backup of my Elasticsearch data in a different physical location. At first I tried putting all the Elasticsearch nodes into a single cluster, but then whenever a program queries or updates Elasticsearch, large amounts of data are transferred over the internet. That costs a lot of money in network traffic and adds network latency.

Is there an easy way to sync data between two Elasticsearch clusters, so that only the changed data is sent over the internet?

PS: I don't care much about sync delay; anything less than 1 minute is acceptable.

Upvotes: 2

Views: 13553

Answers (1)

Alexander Bocharov

Reputation: 201

If you are running a recent version of Elasticsearch (5.0 or later), you need to have, or add, a date field such as updatedAt. The source host must also be listed under reindex.remote.whitelist in the destination cluster's elasticsearch.yml. Then, on the destination cluster side, run a cron job every minute that issues a Reindex API request like this:

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://sourcehost:9200",
      "username": "user",
      "password": "pass"
    },
    "index": "source",
    "query": {
      "range": {
        "updatedAt": {
          "gte": "2015-01-01 00:00:00"
        }
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}

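More information on the Reindex API is available here: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html. The range query used in the request is documented here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html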
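The cron-driven request described above could be scripted roughly as follows. This is only a sketch under several assumptions: the function names (build_reindex_body, run_reindex, main) are hypothetical, the destination cluster is assumed to listen on http://localhost:9200, updatedAt is assumed to be stored in UTC, and the two-minute lookback is an arbitrary overlap chosen so that a slightly late cron run does not miss documents. The "format" key is the range query's own date-format parameter, added so the timestamp string parses regardless of the field's mapping format.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_reindex_body(since, source_host="http://sourcehost:9200",
                       source_index="source", dest_index="dest",
                       username="user", password="pass"):
    """Build a _reindex body that copies documents updated at or after `since`."""
    return {
        "source": {
            "remote": {
                "host": source_host,
                "username": username,
                "password": password,
            },
            "index": source_index,
            "query": {
                "range": {
                    "updatedAt": {
                        "gte": since.strftime("%Y-%m-%d %H:%M:%S"),
                        # Tell the range query how to parse the string above.
                        "format": "yyyy-MM-dd HH:mm:ss",
                    }
                }
            },
        },
        "dest": {"index": dest_index},
    }


def run_reindex(dest_url, body):
    """POST the body to the destination cluster's _reindex endpoint."""
    request = urllib.request.Request(
        dest_url.rstrip("/") + "/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def main():
    # Assumption: a 2-minute lookback on a 1-minute schedule gives some overlap.
    since = datetime.now(timezone.utc) - timedelta(minutes=2)
    # Assumption: the destination cluster is reachable on localhost:9200.
    print(run_reindex("http://localhost:9200", build_reindex_body(since)))
```

Calling main() once a minute from cron would re-copy a small overlap on each run. That should be harmless, because reindexing a document whose ID already exists in the destination index overwrites it by default. Note that documents deleted on the source are not matched by the range query, so deletions would not be propagated by this approach.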

If you are using an older Elasticsearch (<5.0), you can use the elasticdump tool (https://github.com/taskrabbit/elasticsearch-dump) to transfer the data, following a similar approach based on the updatedAt field.
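For the elasticdump route, a scheduled job might assemble the command along these lines. This is a rough sketch, not a tested recipe: the function names are hypothetical, the destination URL is an assumption, and it relies on elasticdump's --input, --output, --type and --searchBody options to restrict the dump to recently updated documents.

```python
import json
import subprocess


def build_elasticdump_command(since, source_url="http://sourcehost:9200/source",
                              dest_url="http://localhost:9200/dest"):
    """Return the argv list for an elasticdump run limited to recent changes."""
    # Same idea as the reindex request: only documents with updatedAt >= since.
    search_body = {"query": {"range": {"updatedAt": {"gte": since}}}}
    return [
        "elasticdump",
        "--input=" + source_url,
        "--output=" + dest_url,
        "--type=data",
        "--searchBody=" + json.dumps(search_body),
    ]


def run_elasticdump(since):
    """Run elasticdump; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_elasticdump_command(since), check=True)
```

For example, run_elasticdump("2015-01-01 00:00:00") would copy every document updated since that timestamp, assuming elasticdump is installed and on the PATH.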

Upvotes: 5
