Reputation: 765
I have an Elasticsearch cluster that holds a large amount of data. I want to extract all of it into Hadoop (Hive). I tried the Elasticsearch-Hadoop driver with a Hive external table, but it is too slow and the task always fails.
My first problem is getting all the data out of my existing Elasticsearch cluster. My second problem is copying the data that keeps streaming into Elasticsearch to HDFS once a day or once an hour.
How can I achieve these?
Thanks in advance.
Upvotes: 1
Views: 143
Reputation: 306
You can use Hadoop as a warehouse for the data, pushing data from it into Elasticsearch and vice versa. Try to keep in Elasticsearch only the data you currently want to analyze, and remove the rest. Then, whenever you want to analyze a different aspect, pull that data from Hadoop and use it.
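For the hourly copy mentioned in the question, one possible approach is to export only one time window per run instead of rescanning the whole index. The snippet below is a minimal sketch that only builds such a range query. It assumes the documents carry a timestamp field, here called `@timestamp`, which is a guess about your mapping. The function name `hourly_window_query` and the example date are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def hourly_window_query(window_end, timestamp_field="@timestamp"):
    """Build a query body matching documents from the hour before window_end.

    timestamp_field is an assumption; replace it with the date field
    that actually exists in your index mapping.
    """
    window_start = window_end - timedelta(hours=1)
    return {
        "query": {
            "range": {
                timestamp_field: {
                    # Half-open interval [start, end) so consecutive runs
                    # neither overlap nor skip documents on the boundary.
                    "gte": window_start.isoformat(),
                    "lt": window_end.isoformat(),
                }
            }
        }
    }


# Arbitrary example: the window that ends at 12:00 UTC.
end = datetime(2016, 1, 1, 12, 0, tzinfo=timezone.utc)
body = hourly_window_query(end)
```

A body like this could probably be serialized to JSON and supplied through the Elasticsearch-Hadoop `es.query` setting of the Hive external table, so that each scheduled run reads only one hour of data. Whether that is enough to stop the job from failing depends on your cluster and data volume.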
Upvotes: 0