Mohamed Said Benmousa

Reputation: 499

Generate data with Apache Kafka and receive it using Spark Streaming

I would like to know how, within a single program, I can generate random data using Apache Kafka and receive it using Spark Streaming.

Here is a use case:

I want to generate random data like (A, B, [email protected]) every X seconds. Then I want to receive this data and process it in real time (while I'm receiving it), and if the second parameter is B, send an email to '[email protected]' with the following message: "The first parameter is A".

I know that I have to start a ZooKeeper server, then start a Kafka broker, then create a topic, and then a producer to produce and send this data. To create the connection between Kafka and Spark Streaming I need to use the "createStream" function. But I don't know how to use a producer to send this data and then receive it with Spark Streaming to process it, all of this in the same program and in Java.

Any help? Thank you.

Upvotes: 0

Views: 1574

Answers (1)

Matthias J. Sax

Reputation: 62285

There will not be a single program, but a Kafka producer program and a Spark program. For both, there are a couple of examples available online.
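For instance, here is a minimal sketch of the producer side, assuming the Kafka Java client (org.apache.kafka.clients.producer) and a broker on localhost:9092; the topic name test-topic, the 5-second interval, and the email address are placeholders (the original address is redacted in the question):

    import java.util.Properties;
    import java.util.Random;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class RandomDataProducer {
        public static void main(String[] args) throws InterruptedException {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            Producer<String, String> producer = new KafkaProducer<>(props);
            Random random = new Random();
            String[] letters = {"A", "B", "C"};

            while (true) {
                // Build a record like "A,B,user@example.com"; the address is a
                // stand-in for the redacted one in the question.
                String value = letters[random.nextInt(letters.length)] + ","
                        + letters[random.nextInt(letters.length)] + ","
                        + "user@example.com";
                producer.send(new ProducerRecord<>("test-topic", value));
                Thread.sleep(5000); // the "X seconds" from the question
            }
        }
    }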

To run this, you start Kafka (including ZooKeeper) and your Spark cluster. Afterwards, you start your producer program that writes into Kafka and your Spark job that reads from Kafka (I guess the order in which you start the producer and the Spark job should not matter).
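For the Spark side, here is a matching sketch using the receiver-based KafkaUtils.createStream from the spark-streaming-kafka (0.8) connector that the question mentions; the ZooKeeper address, consumer group, batch interval, and topic name are assumptions, and actually sending the email (e.g. with JavaMail) is left as a stub:

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    public class RandomDataConsumer {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf()
                    .setAppName("kafka-spark-demo")
                    .setMaster("local[2]"); // >= 2 threads: one receiver, one for processing
            JavaStreamingContext jssc =
                    new JavaStreamingContext(conf, Durations.seconds(2));

            // topic name -> number of receiver threads
            Map<String, Integer> topics = new HashMap<>();
            topics.put("test-topic", 1);

            JavaPairReceiverInputDStream<String, String> stream = KafkaUtils
                    .createStream(jssc, "localhost:2181", "demo-group", topics);

            stream.foreachRDD(rdd -> rdd.foreach(record -> {
                String[] fields = record._2().split(",");
                if (fields.length == 3 && "B".equals(fields[1])) {
                    // Stub: send the email to fields[2] here, e.g. with JavaMail.
                    System.out.println("email to " + fields[2]
                            + ": The first parameter is " + fields[0]);
                }
            }));

            jssc.start();
            jssc.awaitTermination();
        }
    }

Note that createStream connects through ZooKeeper; newer versions of the connector also offer a direct approach (createDirectStream) that reads from the brokers without a receiver.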

Upvotes: 1
