JSBach

Reputation: 4747

How to bulk insert in ElasticSearch? (Pentaho Kettle is not working)

I have an ETL process that I am implementing in Pentaho Kettle (Spoon). Everything works fine, except that I can't insert the generated data into my remote Elasticsearch server. I tried Kettle's "ElasticSearch Bulk Insert" step, but Kettle can't find my Elasticsearch nodes. Is there a reliable way to load a large amount of data into my ES server? Solutions using Kettle or independent scripts/plugins/etc. are acceptable; the only constraint is that the ETL process will run on a different machine from Elasticsearch. Kettle has a custom Java script step that could also be used.

EDIT: I found out that Pentaho uses a very old version of Elasticsearch (0.16.3); I am trying to find a way to update it. No luck so far...

Upvotes: 2

Views: 6475

Answers (5)

Victor de la Calle

Reputation: 11

One common mistake in this context is copying elasticsearch-6.4.2.jar to \data-integration\lib. This is unnecessary and counterproductive.

Steps:

  1. Servers tab: localhost, port 9300
  2. Settings tab: cluster.name = my_cluster_name (from elasticsearch.yml)
  3. PDI 8.2, 8.3, or 9.0
  4. Elasticsearch 6.4.2
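
For reference, the settings named in the steps above live in elasticsearch.yml on the server; the values below are placeholders, not a recommended configuration:

```yaml
# elasticsearch.yml -- example values only
cluster.name: my_cluster_name     # must match the Settings tab in the step
network.host: 0.0.0.0             # or the server's IP, so remote clients can reach it
transport.tcp.port: 9300          # the transport port the Bulk Insert step connects to
```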

Upvotes: 1

mermerkaya

Reputation: 79

The current PDI release (6.0.1) supports Elasticsearch 1.5.4.

If someone needs a working plugin for the latest Elasticsearch 2.2 with PDI 6.*, you can download it below; I tested it and it works with 2.2:

https://drive.google.com/file/d/0B0hgGtBdLOBMbWtfVVFnTE1uVmM/view?usp=sharing

Upvotes: 0

jelongpark

Reputation: 164

First you should know your Elasticsearch server's configuration. Open the elasticsearch.yml file on your Elasticsearch server and copy the IP address, transport.tcp.port, and cluster.name values.

Back in Kettle, open the "ElasticSearch Bulk Insert" step. Add cluster.name in the [Settings] tab, and the IP address and TCP port in the [Servers] tab. Then try "Test Connection"; it should work.
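
Before wiring up the step, it can help to confirm that the transport port is actually reachable from the ETL machine, since that rules out firewall and network.host problems. A minimal sketch (host and port are placeholders for your server's values):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. port_open("my-es-server", 9300) should be True before "Test Connection" can pass
```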

Upvotes: 1

still_waiting

Reputation: 31

I changed the dependent jar from elasticsearch-0.16.3.jar to elasticsearch-1.6.0.jar (it also needs lucene-core-4.10.4.jar) and copied 'ElasticSearchBulk' (with some help) as a new plugin. You have to modify the source code, because some package locations changed between Elasticsearch versions: remove the wrong package imports and add the correct ones. After that, it works well with Elasticsearch 1.6.

Upvotes: 2

Otto

Reputation: 3294

Elasticsearch is a RESTful search engine, so I use the REST Client Kettle step. All you have to do is follow the REST conventions for inserting rows into your remote Elasticsearch server. It works well.
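
The same approach works from an independent script, which satisfies the question's remote-machine constraint: Elasticsearch's _bulk endpoint takes newline-delimited JSON, one action line plus one source line per document, with a trailing newline. A minimal sketch of building that body with only the standard library (the index and type names are placeholders):

```python
import json

def build_bulk_body(index: str, doc_type: str, docs: list) -> str:
    """Serialize documents into the newline-delimited JSON body that the
    Elasticsearch /_bulk endpoint expects: for each document, one action
    line ({"index": ...}) followed by one source line, ending in a newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"
```

The body is then POSTed to the HTTP port (usually 9200, not the 9300 transport port), e.g. with the requests library; the host name here is a placeholder:

```python
# resp = requests.post("http://my-es-server:9200/_bulk",
#                      data=build_bulk_body("myindex", "doc", rows),
#                      headers={"Content-Type": "application/x-ndjson"})
```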

Upvotes: 2
