Abhishek Ramachandran

Reputation: 1170

Performing external operations after Elasticsearch indexing

I'm currently indexing web pages into Elasticsearch. The indexing is done both from Java (Spring) and through Apache Nutch.

I've run into a situation where I have to call an external API right after indexing or updating a document in Elasticsearch. The API processes a field value from the document, and the processed result has to be stored in another field of the same document. I tried calling the API just before indexing, but that hurts indexing performance (it takes too much time). I need to call the external API without slowing down the indexing or updating of Elasticsearch documents.

Looking for some ideas.

I'm using Elasticsearch 5.6.3.

Upvotes: 1

Views: 515

Answers (2)

Ram Dwivedi

Reputation: 470

In my case, we used a Logstash-Kafka-Logstash pipeline to write to ES. On the Kafka consumer side, we invoked the external API to compute the new field, set it on a POJO, and then wrote the POJO to ES. It has been running well.
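A minimal sketch of the consumer-side enrichment described above, in Java since the question already uses Spring. To keep it self-contained it makes several assumptions: a `BlockingQueue` stands in for the Kafka topic, a plain `Map` stands in for the Elasticsearch index, and `PageDoc`, its `content`/`processed` fields, and the `externalApi` function are hypothetical names, not part of Kafka, Logstash, or any Elasticsearch client.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

// Hypothetical POJO for a crawled page; the field names are assumptions.
class PageDoc {
    final String id;
    final String content; // field the external API reads
    String processed;     // field the external API result goes into

    PageDoc(String id, String content) {
        this.id = id;
        this.content = content;
    }
}

// Consumer end of the pipeline: the BlockingQueue stands in for the Kafka topic
// and the Map stands in for the Elasticsearch index (both assumptions).
class EnrichingConsumer {
    private final BlockingQueue<PageDoc> topic;
    private final Function<String, String> externalApi; // hypothetical external API call
    private final Map<String, PageDoc> index;

    EnrichingConsumer(BlockingQueue<PageDoc> topic,
                      Function<String, String> externalApi,
                      Map<String, PageDoc> index) {
        this.topic = topic;
        this.externalApi = externalApi;
        this.index = index;
    }

    // Processes everything currently queued and returns how many documents were
    // written. A real Kafka consumer would instead poll in an endless loop.
    int drain() {
        int written = 0;
        PageDoc doc;
        while ((doc = topic.poll()) != null) {
            doc.processed = externalApi.apply(doc.content); // enrich the POJO first
            index.put(doc.id, doc);                         // then write it once, already enriched
            written++;
        }
        return written;
    }
}
```

The producers (the Spring code and Nutch) only publish to the queue, so they never wait on the API; the latency moves to the consumer, which can be scaled by running more consumers.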

Note: you may also want to check whether the computation done by the external API can be made faster.

Upvotes: 0

Jorge Luis

Reputation: 3253

At the moment, ES doesn't support a "notification system" like the one you need (https://discuss.elastic.co/t/notifications-from-elasticsearch-when-documents-are-added/5106/31). Such a system would be impractical in most cases because of the distributed nature of ES.

I think the easiest approach would be to push documents into a queue (Kafka/RabbitMQ) and run your ES indexer as a worker on that queue. That worker is then the ideal place to send a message to a different queue indicating that document X is ready for enrichment (adding more metadata). This way you don't have to worry about slowing down the indexing speed of your system (you can add more ES indexers). You also don't need to query ES constantly to find documents to enrich, because you can send the needed field (or fields) along with the ES id to the enrichment workers, which then update that document directly after calling the external API. Keep in mind that part of this could perhaps be wrapped in a custom ES plugin.

The advantage of this is that you can scale the two parts (ES indexer and metadata enricher) independently.
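The two-queue design above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not a Kafka, RabbitMQ, or Elasticsearch client implementation: each `ExecutorService`'s internal work queue plays the role of a message queue, a `ConcurrentHashMap` plays the role of the index, and `TwoStagePipeline`, `submit`, and `externalApi` are hypothetical names. The separate thread-pool sizes mirror the point that indexers and enrichers can be scaled independently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Each ExecutorService's internal work queue stands in for a Kafka/RabbitMQ queue,
// and its threads stand in for the workers consuming it (assumptions).
class TwoStagePipeline {
    // Stand-in for the Elasticsearch index: document id -> fields.
    final Map<String, Map<String, String>> index = new ConcurrentHashMap<>();
    private final ExecutorService indexers;  // queue 1: raw documents to index
    private final ExecutorService enrichers; // queue 2: "document X is ready for enrichment"
    private final Function<String, String> externalApi; // hypothetical slow external API

    TwoStagePipeline(int indexerThreads, int enricherThreads,
                     Function<String, String> externalApi) {
        // Separate pool sizes: the two stages scale independently.
        this.indexers = Executors.newFixedThreadPool(indexerThreads);
        this.enrichers = Executors.newFixedThreadPool(enricherThreads);
        this.externalApi = externalApi;
    }

    // Stage 1: index the raw document, then publish an enrichment message carrying
    // the id and the field the API needs. The API is never called on this path.
    void submit(String id, String content) {
        indexers.submit(() -> {
            Map<String, String> doc = new ConcurrentHashMap<>();
            doc.put("content", content);
            index.put(id, doc);
            enrichers.submit(() -> enrich(id, content));
        });
    }

    // Stage 2: call the external API and update the document by id, without
    // having to read it back from the index first.
    private void enrich(String id, String content) {
        String processed = externalApi.apply(content);
        index.get(id).put("processed", processed);
    }

    // Stops the indexers first so that (assuming both stages finish within the
    // timeout) every enrichment message is published before stage 2 shuts down.
    void shutdown() {
        try {
            indexers.shutdown();
            indexers.awaitTermination(1, TimeUnit.MINUTES);
            enrichers.shutdown();
            enrichers.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because the enrichment message already carries the id and the needed field, the enricher never has to search or fetch the document before updating it, which is what removes the need to query ES constantly.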

Another option would be an external module that periodically queries ES for a batch of documents that haven't yet been enriched with the external content, calls the external API for them, and then updates those documents back in ES.
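A sketch of this polling alternative, again with a `Map` standing in for the index and hypothetical names (`BackfillEnricher`, `externalApi`, the `content`/`processed` fields). Against Elasticsearch itself, the selection step could be a search with a `bool` query whose `must_not` clause holds an `exists` query on the enriched field, with results written back through the Update or Bulk API; the linear scan below only imitates that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Polling alternative: a periodic job selects documents that still lack the
// enriched field, enriches a bounded batch, and writes the results back.
// The Map stands in for the Elasticsearch index (assumption).
class BackfillEnricher {
    private final Map<String, Map<String, String>> index;
    private final Function<String, String> externalApi; // hypothetical external API
    private final int batchSize;

    BackfillEnricher(Map<String, Map<String, String>> index,
                     Function<String, String> externalApi, int batchSize) {
        this.index = index;
        this.externalApi = externalApi;
        this.batchSize = batchSize;
    }

    // One polling round; returns how many documents were enriched.
    int runOnce() {
        // Selection step: in ES this would be a search for documents where the
        // "processed" field does not exist, limited to batchSize hits.
        List<String> batch = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : index.entrySet()) {
            if (!e.getValue().containsKey("processed")) {
                batch.add(e.getKey());
                if (batch.size() == batchSize) break;
            }
        }
        // Enrichment step: call the API and write the result back by id.
        for (String id : batch) {
            Map<String, String> doc = index.get(id);
            doc.put("processed", externalApi.apply(doc.get("content")));
        }
        return batch.size();
    }

    // Repeats rounds until no unenriched documents remain.
    int runUntilDone() {
        int total = 0;
        int n;
        while ((n = runOnce()) > 0) {
            total += n;
        }
        return total;
    }
}
```

Compared with the queue-based design, this needs no changes on the indexing side, but documents stay unenriched until the next polling round, and every round costs an extra query against the index.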

Upvotes: 0
