Elasticsearch: using river-jdbc to sync data with a remote MySQL server

Question

My team and I want to use Elasticsearch in our project, but we have one requirement: we don't want a local MySQL instance on each node. Instead, we want a single remote MySQL server to hold the data that the Elasticsearch services query.

The idea is that whenever a new item is added through an ES node, it is stored not in a local database but on a remote MySQL server (we are thinking of Amazon RDS). For search queries against any index, we want the ES node to query that remote database (the RDS instance).

We tried river-jdbc in both flavours: the river (for pulling data) and the feeder (for pushing data to the RDS instance). But we were not able to get this working with river-jdbc.

Has anyone tried something similar? Or can anyone link to a blog post where this was done?

I appreciate any help

Thanks in advance

Jasper Huzen · Accepted Answer

We use a similar approach, with an Oracle database as the primary datastore.

We use PL/SQL to flatten and convert the data. For the initial load we add records to a "oneshot" table. Later changes to the data are flattened and converted in the same way and result in records in an "update" table. The oneshot and update tables are mapped to a single index in Elasticsearch.

Initial load of ES:

[Oracle DB] ---> flatten data (PL/SQL) ---> [records to animal_oneshot_river table, records to user_oneshot_river table]

The river then pulls the data into, for example, http://localhost:9200/zoo/animal and http://localhost:9200/zoo/user.
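To make the flattening step concrete, here is a minimal sketch in Python, using an in-memory SQLite database as a stand-in for Oracle and PL/SQL. Only the table name animal_oneshot_river comes from our setup; the source tables (animal, species, keeper) and all column names are made up for illustration.

```python
import sqlite3


def load_oneshot(conn):
    """Flatten normalized rows into one wide row per animal.

    Stand-in for the PL/SQL step: each row in animal_oneshot_river
    holds everything needed for one search document.
    """
    conn.execute(
        """
        INSERT INTO animal_oneshot_river (id, name, species, keeper)
        SELECT a.id, a.name, s.name, k.name
        FROM animal a
        JOIN species s ON s.id = a.species_id
        JOIN keeper k ON k.id = a.keeper_id
        """
    )
    conn.commit()


# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE keeper (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE animal (
        id INTEGER PRIMARY KEY, name TEXT,
        species_id INTEGER, keeper_id INTEGER
    );
    CREATE TABLE animal_oneshot_river (
        id INTEGER PRIMARY KEY, name TEXT, species TEXT, keeper TEXT
    );
    INSERT INTO species VALUES (1, 'lion');
    INSERT INTO keeper VALUES (1, 'Ann');
    INSERT INTO animal VALUES (10, 'Leo', 1, 1);
    """
)

load_oneshot(conn)
rows = conn.execute(
    "SELECT id, name, species, keeper FROM animal_oneshot_river"
).fetchall()
print(rows)
```

The point of the flat table is that the river only needs a plain SELECT on it; all joins and conversions have already happened on the database side.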

Updates

[Software] ---> change data ---> [Oracle DB] ---> flatten data (PL/SQL) ---> [records to animal_update_river table, records to user_update_river table]

The update tables also contain the type of change (insert, update or delete).

The river polls the update_river tables for changes and applies them to the data in Elasticsearch (we use a pull). The river deletes the records after processing them.
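The poll, apply and delete cycle can be sketched roughly as follows. This is not how river-jdbc is implemented internally, just the pattern: SQLite stands in for Oracle, a plain dict stands in for the Elasticsearch index, and the columns (seq, change_type, id, name) are assumptions made for the example.

```python
import sqlite3


def process_updates(conn, index):
    """Apply pending change records to the index, then delete them.

    `index` is a dict standing in for an Elasticsearch index,
    keyed by document id.
    """
    rows = conn.execute(
        "SELECT seq, change_type, id, name "
        "FROM animal_update_river ORDER BY seq"
    ).fetchall()
    for seq, change_type, doc_id, name in rows:
        if change_type == "delete":
            index.pop(doc_id, None)
        else:
            # insert and update both write the full flattened document
            index[doc_id] = {"name": name}
        # remove the record once it has been applied
        conn.execute("DELETE FROM animal_update_river WHERE seq = ?", (seq,))
    conn.commit()
    return len(rows)


# Hypothetical update table and sample changes, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE animal_update_river ("
    "seq INTEGER PRIMARY KEY, change_type TEXT, id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO animal_update_river (change_type, id, name) VALUES (?, ?, ?)",
    [
        ("insert", 1, "Leo"),
        ("insert", 2, "Zed"),
        ("update", 1, "Leo II"),
        ("delete", 2, None),
    ],
)

index = {}
processed = process_updates(conn, index)
remaining = conn.execute(
    "SELECT COUNT(*) FROM animal_update_river"
).fetchone()[0]
print(processed, index, remaining)
```

Processing the records in insertion order matters here: an update followed by a delete for the same id must not be applied the other way around.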

Changes made to data in Elasticsearch are not sent back to Oracle. All changes to the primary datastore are made by our own business logic software.

We also write data to _spare tables (for example animal_oneshot_river_spare), because that makes it possible to reload Elasticsearch without downtime and without synchronisation issues (we switch aliases after reloading Elasticsearch).
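For the alias switch, Elasticsearch has an `_aliases` endpoint that takes a list of `remove` and `add` actions and applies them in one atomic step. A small sketch of building that request body is below; the index names zoo_v1 and zoo_v2 and the helper name are hypothetical, and the sketch only builds the JSON without sending it.

```python
import json


def alias_swap_body(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves `alias`
    from `old_index` to `new_index` in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: "zoo" is the alias that clients search,
# zoo_v1 is the live index and zoo_v2 the freshly reloaded one.
body = alias_swap_body("zoo", "zoo_v1", "zoo_v2")
payload = json.dumps(body)
print(payload)
```

Because clients only ever talk to the alias, they keep searching the old index until the reload is finished and the swap is done.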
