Reputation: 6246
I want to import some tables from a Postgres database into Elasticsearch and keep them in sync afterwards. I have looked at a Udemy course and also talked with a colleague who has a lot of experience with this problem, to find out what the best way to do it is. I was surprised to hear from both of them that the best way is apparently to write code in Python, Java, or some other language that handles the import and keeps the two in sync, which brings me to my question: is that actually the best way to handle this situation? It seems like there should be a library, plugin, or other tool for importing data into Elasticsearch and keeping it in sync with an external database. What is the best way to handle this?
Upvotes: 6
Views: 6489
Reputation: 86
There is a more recent tool called "abc", developed by appbase.io. Its performance is not comparable with Logstash:
- abc is written in Go
- Logstash runs on JRuby
Anybody who's ever used Logstash knows that it takes at least 20 seconds just to start. The same basic table import from PostgreSQL to Elasticsearch takes ~1 min with Logstash, and about 5 seconds with abc.
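For reference, a one-off import looks roughly like this (from memory of the abc docs; the connection URIs are placeholders and the exact flags may differ in your version):

    abc import --src_type=postgres \
        --src_uri="postgres://user:pass@localhost:5432/mydb" \
        "http://localhost:9200/my_index"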
Upvotes: 3
Reputation: 17745
It depends on your use case. A common practice is to handle this in the application layer: you replicate the actions from one datastore to the other, so if you save an entry in Postgres you do the same in Elasticsearch.
If you do this, however, you'll need a queuing system in place. The queue can be integrated into your application layer, so that if the save to Elasticsearch fails the operation can be replayed. On top of that, you'll want a throttling mechanism so you don't overwhelm Elasticsearch. Alternatively, you can send events to another system (e.g. Logstash), so that throttling and persistence are handled by that system rather than your application.
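As a minimal sketch of that application-layer approach in Python, assuming psycopg2 and the official elasticsearch client (8.x keyword arguments); the table, index, and in-process queue are illustrative, and a real setup would use a durable broker:

    import queue

    import psycopg2
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")   # placeholder node
    pg = psycopg2.connect("dbname=mydb user=me")  # placeholder connection

    retry_queue = queue.Queue()  # stand-in for a durable broker (RabbitMQ, Kafka, ...)

    def save_entry(entry_id, doc):
        # Write to Postgres first, then replicate the same entry to Elasticsearch.
        with pg.cursor() as cur:
            cur.execute("INSERT INTO entries (id, body) VALUES (%s, %s)",
                        (entry_id, doc["body"]))
        pg.commit()
        try:
            es.index(index="entries", id=entry_id, document=doc)
        except Exception:
            # The Elasticsearch write failed: queue the operation for replay.
            retry_queue.put((entry_id, doc))

    def replay_failed():
        # Run from a background worker, with throttling, to drain the queue.
        while not retry_queue.empty():
            entry_id, doc = retry_queue.get()
            es.index(index="entries", id=entry_id, document=doc)

Writing to Elasticsearch only after the Postgres commit means the search index can briefly lag behind the database, which is usually acceptable for search use cases.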
Another approach is the one described at https://www.elastic.co/blog/logstash-jdbc-input-plugin: a separate system "polls" your database and sends the changes to Elasticsearch. Logstash is ideal here since it's part of the ELK stack and integrates well. Check this too: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html
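A typical jdbc input pipeline looks roughly like this (the connection details, table, and updated_at tracking column are assumptions; see the plugin docs linked above for the full option list):

    input {
      jdbc {
        jdbc_driver_library => "/path/to/postgresql-jdbc.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb"
        jdbc_user => "me"
        jdbc_password => "secret"
        schedule => "* * * * *"   # poll every minute
        statement => "SELECT * FROM entries WHERE updated_at > :sql_last_value"
        use_column_value => true
        tracking_column => "updated_at"
        tracking_column_type => "timestamp"
      }
    }
    output {
      elasticsearch {
        hosts => ["localhost:9200"]
        index => "entries"
        document_id => "%{id}"    # makes repeated runs idempotent
      }
    }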
Another approach is to use the NOTIFY mechanism of Postgres to send events to a queue that takes care of saving the changes to Elasticsearch.
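Sketched in Python with psycopg2 (the channel name, trigger, and payload format are made up for illustration; deletes would need their own handling):

    import json
    import select

    import psycopg2
    import psycopg2.extensions
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    conn = psycopg2.connect("dbname=mydb user=me")
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

    cur = conn.cursor()
    # Assumes a trigger along these lines on the Postgres side:
    #   CREATE FUNCTION notify_entry() RETURNS trigger AS $$
    #   BEGIN
    #     PERFORM pg_notify('entries_changed', row_to_json(NEW)::text);
    #     RETURN NEW;
    #   END; $$ LANGUAGE plpgsql;
    #   CREATE TRIGGER entries_notify AFTER INSERT OR UPDATE ON entries
    #     FOR EACH ROW EXECUTE FUNCTION notify_entry();
    cur.execute("LISTEN entries_changed;")

    while True:
        # Wait until the connection has something to read (5 s timeout).
        if select.select([conn], [], [], 5) == ([], [], []):
            continue
        conn.poll()
        while conn.notifies:
            note = conn.notifies.pop(0)
            row = json.loads(note.payload)
            es.index(index="entries", id=row["id"], document=row)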
Upvotes: 10
Reputation: 3877
As with anything in life, "best" is subjective. Your colleague likes to write and maintain code to keep this in sync; there's nothing wrong with that.
I would say the best way would be to use some data pipeline. There's a plethora of choices, really overwhelming; you can explore the various solutions that support Postgres and Elasticsearch. Here are the options I'm familiar with.
Note that these are tools/platforms for your solution, not the solution itself. YOU have to configure, customize, and enhance them to fit your definition of "in sync".
Upvotes: 1