Reputation: 23
I'm using Logstash to send our log data to an Elasticsearch Service in AWS. Now I have some business logic defined in Spark Streaming that I want to apply to the log data in real-time, so I'm thinking about using Amazon SQS or Apache Kafka in the middle.
Is it right to use Kafka in this scenario?
Thank you.
Upvotes: 2
Views: 618
Reputation: 1727
The answer depends on whether you want to couple your solution to an Amazon product. But yes, Kafka suits this use case well.
In fact, Kafka is now commonly used in place of Redis in the ELK stack. In addition, Spark Streaming relies heavily on Kafka to be able to replay messages in case of failure.
That depends on your business logic, but if you are only using Spark Streaming to filter and transform your data before inserting it into Elasticsearch, you should have a look at Kafka Streams.
Kafka Streams provides an elegant DSL (à la Spark) to manipulate your Kafka messages (transformations, filters, aggregations) without requiring you to deploy a cluster of master/worker nodes.
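To make the filter/transform idea concrete: Kafka Streams itself is a Java/Scala library, but the kind of stateless pipeline you'd write in its DSL (a `filter` followed by a `mapValues` before indexing into Elasticsearch) can be sketched in plain Python. The field names and the business rule below are made up purely for illustration:

```python
# Hypothetical sketch of a stateless filter -> transform step,
# the shape you'd express with filter()/mapValues() in the
# Kafka Streams DSL (or with Spark Streaming) before indexing
# the result into Elasticsearch. No Kafka needed to run this.

def is_relevant(event):
    # example business rule: keep only warnings and errors
    return event.get("level") in ("WARN", "ERROR")

def to_es_doc(event):
    # reshape a raw log event into the document to index
    return {"severity": event["level"].lower(),
            "message": event["message"]}

raw_stream = [
    {"level": "INFO",  "message": "user logged in"},
    {"level": "ERROR", "message": "payment failed"},
    {"level": "WARN",  "message": "slow query"},
]

# equivalent of stream.filter(is_relevant).mapValues(to_es_doc)
docs = [to_es_doc(e) for e in raw_stream if is_relevant(e)]
print(docs)
```

In the real Streams DSL the same two steps run continuously over a Kafka topic, and a sink (e.g. Kafka Connect's Elasticsearch connector) handles the actual indexing.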
Upvotes: 3