Reputation: 46419
I'm considering a daily script to do the following, in order to catch any cases where an update failed to reach the ES server. I don't yet have a high-availability setup, and even with one this is probably good practice whenever data is duplicated between the DB and ES. Before putting the script together, I thought I'd check whether I'm going about this the right way and whether there are any libraries or techniques I should use.
The script will simply retrieve all IDs from the database and all IDs from ElasticSearch where created_at < current_time (a snapshot of the time taken when the script starts, since the current time is a moving target as the script runs). It will then add documents to and remove documents from ElasticSearch based on the differences between these two ID sets.
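To make the comparison concrete, here is a minimal Python sketch of the set difference described above. The function name `diff_ids` and the sample IDs are hypothetical, and fetching the IDs from the database and from ElasticSearch is assumed to happen elsewhere.

```python
def diff_ids(db_ids, es_ids):
    """Compare two collections of IDs taken at the same cutoff time.

    Returns a pair (to_index, to_delete):
      to_index  - IDs present in the database but missing from the index
      to_delete - IDs present in the index but no longer in the database
    """
    db_ids = set(db_ids)
    es_ids = set(es_ids)
    to_index = db_ids - es_ids
    to_delete = es_ids - db_ids
    return to_index, to_delete


# Hypothetical sample data standing in for the two ID queries.
to_index, to_delete = diff_ids([1, 2, 3], [2, 3, 4])
print(sorted(to_index), sorted(to_delete))
```

An ID-only comparison like this finds missing and orphaned documents, but it cannot tell whether a document that exists on both sides has stale content.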
Does this sound like a reasonable approach?
Upvotes: 2
Views: 2626
Reputation: 46419
To answer my own question: this is not the best approach.
A simpler, if more resource-intensive, approach is to rebuild the entire index periodically. Of course, rebuilding in place is difficult to do in production as it would cause minutes or hours of downtime, so the trick is to build a new index alongside the old one and switch to it once it is ready. In ElasticSearch, you can't rename an index, but you can use an alias and repoint it at the new index.
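As a rough illustration of the alias switch, the Python sketch below builds the request body for ElasticSearch's `_aliases` endpoint, which accepts a list of `remove` and `add` actions in one request. The helper name `build_alias_swap` and the index and alias names are hypothetical, and sending the body (for example with an HTTP POST to `/_aliases`) is left out.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build an _aliases request body that moves `alias` from old_index to new_index.

    Both actions go in a single request so that the alias is never left
    pointing at no index between the two steps.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: the application always queries the alias "articles",
# while the real indices carry a version suffix.
body = build_alias_swap("articles", "articles_v1", "articles_v2")
print(json.dumps(body, indent=2))
```

With this arrangement the application only ever refers to the alias, so the old index can be deleted once the swap has succeeded.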
There's a discussion of the approach here and a rake task for Tire users here.
Upvotes: 3
Reputation: 312
Please have a look at the jdbc-river plugin. This plugin is fairly stable and can be used to sync data between ES and the database.
Upvotes: 0