Juan Camilo Ramirez

Reputation: 23

Is it right to use Apache Kafka between Logstash and Spark Streaming in order to get the log data to my business logic (defined in Spark)?

I'm using Logstash to send our log data to an Elasticsearch Service in AWS. Now I have some business logic defined in Spark Streaming that I want to apply to the log data in real-time, so I'm thinking about using Amazon SQS or Apache Kafka in the middle.

Is it right to use Kafka in this scenario?

Thank you.

Upvotes: 2

Views: 618

Answers (1)

fhussonnois

Reputation: 1727

The answer depends on whether you want to couple your solution to an Amazon product. But yes, Kafka suits this usage well.

In fact, Kafka is now commonly used in place of Redis as the buffer in the ELK stack. In addition, Spark Streaming integrates well with Kafka: because Kafka retains messages, Spark can replay them in case of failures.
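As a rough sketch of that integration, here is what reading the topic Logstash writes to could look like from Spark Streaming, assuming Kafka 0.10+ and the spark-streaming-kafka-0-10 module. The broker address, topic name, and group id are placeholders, not values from your setup, and the "business logic" is reduced to a per-batch count:

    import java.util.*;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class LogStreamJob {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("log-business-logic");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "log-processor");           // placeholder group id
            kafkaParams.put("auto.offset.reset", "latest");

            // Subscribe to the topic Logstash writes to ("logs" is a placeholder name).
            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("logs"), kafkaParams));

            // Apply your business logic here; this just counts records per batch.
            stream.map(ConsumerRecord::value)
                  .count()
                  .print();

            jssc.start();
            jssc.awaitTermination();
        }
    }

The direct stream tracks Kafka offsets itself, so a failed batch can be recomputed by re-reading the same offsets from the broker.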

That said, it depends on your business logic: if you are only using Spark Streaming to filter and transform your data before inserting it into Elasticsearch, you should have a look at Kafka Streams.

Kafka Streams provides an elegant DSL (à la Spark) to manipulate your Kafka messages (transformations, filters, aggregations) without requiring you to deploy a cluster of master/worker nodes.
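For illustration, a minimal Kafka Streams app that filters and transforms log messages could look like the following. It uses the StreamsBuilder API; the topic names, application id, and the ERROR-filtering logic are assumptions, not part of your setup:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class LogFilterApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "log-filter");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();

            // Read raw log lines, keep only error lines, apply a toy transform,
            // and write the result to an output topic ("logs" and "error-logs"
            // are placeholder topic names).
            KStream<String, String> logs = builder.stream("logs");
            logs.filter((key, line) -> line.contains("ERROR"))
                .mapValues(String::toUpperCase)
                .to("error-logs");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }

Such an application is a plain Java process: you can run as many instances as you need and Kafka balances the topic partitions between them, with no dedicated cluster to operate.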

Upvotes: 3
