Reputation: 71
My requirement is I have log files that I need to process, also I would like to enrich the log information with some data which I have in postgres db.
Step 1. I plan to feed data from above two sources (log file and database) to kafka topics, using logstash Step 2. I plan to use kafka stream to join data on different kafka topics and push them to elastic search via API calls.
My doubt is about step 2, Is kafka stream is the way to go ? or can I use Apache spark which I believe can be used for same. Any help on this is appreciated.
Upvotes: 0
Views: 429
Reputation: 32110
Step 1. I plan to feed data from above two sources (log file and database) to kafka topics, using logstash
If you're already using Apache Kafka, then note that you can use Kafka Connect for integrating systems, including databases, into Kafka. For information on integrating databases, see this article.
Step 2. I plan to use kafka stream to join data on different kafka topics and push them to elastic search via API calls. My doubt is about step 2, Is kafka stream is the way to go ? or can I use Apache spark which I believe can be used for same. Any help on this is appreciated.
Yes, Kafka Streams is a good fit for this. It can enrich events as they flow through a topic, using data from other topics. These topics can be sourced from any system, including log files, databases, etc. Here is example code of such join, and the documentation for it.
BTW you might want to also check out KSQL. KSQL is built on Kafka Streams so you get the same scalability and elasticity functionality, but with a SQL abstraction that you can run directly (no coding needed). For an example of using KSQL to enrich streams of data see this talk or this article
(Disclosure: I work for Confluent, who lead the open-source KSQL project)
Upvotes: 1