Drissi Yazami

Reputation: 37

Spark Streaming, Kafka Connect and the ELK stack

I want to use Kafka Connect and Spark Streaming to insert data into Elasticsearch and then visualize it with Kibana for a BI use case. Can you please help me out? I don't know where to start. Is there any project using these technologies that could help me understand the logic of the implementation? It would be very helpful.

Upvotes: 0

Views: 1829

Answers (1)

Maximilien Belinga

Reputation: 3186

What you're trying to build is a big data pipeline, and there are many ways to do it. A possible architecture is: Logstash -> Kafka -> Spark Streaming -> Elasticsearch.

Possible scenario

Basically, Logstash forwards logs to Kafka, where they are consumed by Spark Streaming. The scenario is: Logstash collects newly generated logs from the server and ships them to Kafka; Spark Streaming then processes them in near real time and stores them in Elasticsearch for further visualization in Kibana.

Logstash

Logstash is an open-source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to a multitude of outputs. There are plenty of input plugins for Logstash out there; you can use the file input plugin, for instance, to get logs from a file. There are also many output plugins, but the one that deserves your attention here is the kafka output plugin.
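As a rough sketch, a minimal Logstash pipeline combining those two plugins could look like this (the log path, broker address and topic name are placeholders for your own setup):

```conf
input {
  file {
    path => "/var/log/app/*.log"        # hypothetical location of your server logs
    start_position => "beginning"
  }
}

output {
  kafka {
    bootstrap_servers => "localhost:9092"  # your Kafka broker(s)
    topic_id => "logs"                     # topic consumed later by Spark Streaming
    codec => json                          # ship each event as a JSON document
  }
}
```

Run it with `bin/logstash -f pipeline.conf` and every new line appended to the matched files will be published to the `logs` topic.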

Kafka

Apache Kafka is a distributed publish-subscribe messaging system designed to replace traditional message brokers. Kafka can be used for a number of purposes: messaging, real-time website activity tracking, monitoring operational metrics of distributed applications, log aggregation from numerous servers, event sourcing where state changes in a database are logged and ordered, and commit logs where distributed systems sync data and restore data from failed systems. In this use case, Kafka is used for log aggregation.
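Assuming a local Kafka installation, you could create the topic that Logstash writes to and verify that events are arriving with the bundled CLI tools (topic name and broker address are placeholders; on older Kafka releases `kafka-topics.sh` takes `--zookeeper` instead of `--bootstrap-server`):

```shell
# Create the "logs" topic with a single partition (adjust for production)
bin/kafka-topics.sh --create --topic logs \
  --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1

# Tail the topic to check that Logstash is actually shipping logs
bin/kafka-console-consumer.sh --topic logs \
  --bootstrap-server localhost:9092 --from-beginning
```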

Spark streaming

This guide should lead you to implement the first part of your streaming job: getting data in real time from Kafka in a streaming fashion using Spark. For the second part (sending the received data to Elasticsearch), you can use the Elasticsearch support for Apache Spark.
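A minimal sketch of such a job, assuming Spark 2.x with the spark-streaming-kafka-0-10 and elasticsearch-spark (elasticsearch-hadoop) connectors on the classpath; the topic, broker address and index name are placeholders:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.elasticsearch.spark.streaming._   // adds saveToEs on DStreams

object LogsToElasticsearch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-to-es")
      .set("es.nodes", "localhost")   // Elasticsearch host
      .set("es.port", "9200")
    val ssc = new StreamingContext(conf, Seconds(5))  // 5-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer"  -> classOf[StringDeserializer],
      "value.deserializer"-> classOf[StringDeserializer],
      "group.id"          -> "spark-es-consumer",
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("logs"), kafkaParams))

    // Wrap each log line in a Map so it is indexed as a JSON document
    val docs = stream.map(record => Map("message" -> record.value))
    docs.saveToEs("logs/doc")   // "index/type" target in Elasticsearch

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Any transformation or enrichment of the logs (parsing, filtering, aggregation) would happen between the `map` and the `saveToEs` call.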

There is also a good example linking Spark with Elasticsearch using Spark Streaming: https://github.com/pochi/spark-streaming-kafka-sample/blob/master/src/main/scala/pochi/spark/streaming/Elastic.scala. But you will probably need to update some things due to changes in the technologies.

Elasticsearch - Kibana

The last step is straightforward. You need to configure Elasticsearch and Kibana to communicate with each other. Then load your data by configuring an index pattern in Kibana, and build your visualizations. For more information about that, refer to the documentation online.
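For the Elasticsearch/Kibana connection, a default local installation only needs the Elasticsearch address in `kibana.yml` (the setting shown is for Kibana 5.x/6.x; newer versions rename it to `elasticsearch.hosts`):

```conf
# kibana.yml -- point Kibana at your Elasticsearch instance
elasticsearch.url: "http://localhost:9200"
```

Once Kibana is up, create an index pattern matching the index your Spark job writes to (e.g. `logs*`) under Management > Index Patterns, and the documents become available in Discover and Visualize.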

Upvotes: 1
