mLC

Reputation: 693

Using Kafka and Spark Streaming for serving a web application

Let's assume I have a website with a form where users can paste some values. Now I want to take these values, process them with Spark Streaming and return the result back to the user. Something like this:

[diagram: website form → website backend → Spark Streaming → result returned to the user]

The detailed setup does not really matter - Spark Streaming could be doing some recommendation or prediction and could sit on top of Databricks; the backend could be a Flask application...


My questions are:

  1. How do I tell the website backend server that Spark Streaming has processed the input data and written the results somewhere?
  2. Which pieces is this pipeline missing? An intermediate DB such as Redis/Mongo/SQL? A message broker such as Kafka?

I can't get my head around the part where Spark Streaming sends info back to the website backend. If I send the result of the Spark Streaming processing to a database (Mongo, Redis, MySQL), a filesystem (S3, Blob, HDFS) or a message broker (Kafka, Kinesis), how do I tell the website backend about it?

Upvotes: 3

Views: 1653

Answers (1)

dbustosp

Reputation: 4478

You could approach this with a solution based on an event-driven architecture. In my mind I would have the following components:

  1. Website backend. This service is the one connected to Apache Kafka for producing and consuming events. It will receive all the events from the UI and then publish the events triggered by the UI into Kafka topics. You can create one topic per type of event in order to keep the events categorized. On the other hand, it will also play the role of consumer (listener), reading the messages from the different topics into which the answers coming from Apache Spark are published (see the backend sketch after this list).

  2. Apache Kafka. This is the component which is missing from the picture right now. It will play the role of passing messages between the different components which are subscribed to the topics. Make sure you have all the events categorized into different topics.

  3. Spark Streaming. This component will be listening to Kafka for certain events. Depending on which event you get, you will probably want to process it differently. Once you have processed the event with Apache Spark, you will send the output back to Apache Kafka (see the Spark sketch below).
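Here is a minimal sketch of the backend's producer/consumer role, using Flask and kafka-python. The topic names (`prediction-requests`, `prediction-results`), the broker address, and the `request_id` correlation/polling scheme are assumptions for illustration, not something prescribed by the question:

```python
# Hedged sketch of the "website backend" role: publish form input to Kafka,
# listen for results on another topic. Topic names, broker address and the
# request_id scheme are assumed, not part of the original answer.
import json
import threading
import uuid

from flask import Flask, jsonify, request
from kafka import KafkaConsumer, KafkaProducer

app = Flask(__name__)

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Results published by Spark land here, keyed by the request_id we attached.
results = {}

def consume_results():
    consumer = KafkaConsumer(
        "prediction-results",                    # assumed results topic
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
        group_id="website-backend",
    )
    for message in consumer:
        results[message.value["request_id"]] = message.value["prediction"]

threading.Thread(target=consume_results, daemon=True).start()

@app.route("/predict", methods=["POST"])
def predict():
    # Publish the pasted values as an event; Spark picks them up from this topic.
    request_id = str(uuid.uuid4())
    producer.send(
        "prediction-requests",                   # assumed requests topic
        {"request_id": request_id, "values": request.form.get("values", "")},
    )
    return jsonify({"request_id": request_id}), 202

@app.route("/result/<request_id>")
def result(request_id):
    # The UI polls this endpoint until the result arrives on the results topic.
    if request_id in results:
        return jsonify({"prediction": results[request_id]})
    return jsonify({"status": "pending"}), 202
```

The key point is the second role: the backend never calls Spark directly; it subscribes to the results topic and matches incoming messages to earlier requests by the correlation id it attached.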
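On the Spark side, a Structured Streaming job can use the built-in Kafka source and sink (the spark-sql-kafka package) to read the request topic and publish results back. The topic names, the event schema and the placeholder "prediction" logic below are assumptions for illustration:

```python
# Hedged sketch of the Spark side: consume request events from Kafka,
# "score" them, and publish the results to another Kafka topic.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, size, split, struct, to_json
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("prediction-stream").getOrCreate()

# Assumed shape of the events the backend publishes.
schema = StructType([
    StructField("request_id", StringType()),
    StructField("values", StringType()),
])

requests = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "prediction-requests")      # assumed requests topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("event"))
    .select("event.*")
)

# Stand-in for the real recommendation/prediction logic.
predictions = requests.withColumn("prediction", size(split(col("values"), ",")))

query = (
    predictions
    .select(to_json(struct("request_id", "prediction")).alias("value"))
    .writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("topic", "prediction-results")           # assumed results topic
    .option("checkpointLocation", "/tmp/checkpoints/prediction-stream")
    .start()
)
query.awaitTermination()
```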

Essentially, using Apache Kafka for an event-driven architecture would be a good fit for what you need. If you want to go deeper into services architecture with Apache Kafka, please check this out.

Whether you want to add one more level of storage such as Cassandra to store the results of the predictions is really up to you; in my opinion it would be a good idea, because that way you do not need to trigger Spark jobs for events which were already processed previously.
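A hedged sketch of that optional lookup layer, assuming the Python cassandra-driver, a `predictions_ks` keyspace and a `predictions` table keyed by a hash of the input; all of those names are made up for illustration:

```python
# Sketch of a results cache in Cassandra: the backend checks it before
# publishing a new event, and stores each prediction once it arrives.
import hashlib

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("predictions_ks")          # assumed keyspace

def cached_prediction(raw_values: str):
    """Return a previously stored prediction for these values, or None."""
    input_hash = hashlib.sha256(raw_values.encode("utf-8")).hexdigest()
    row = session.execute(
        "SELECT prediction FROM predictions WHERE input_hash = %s",
        (input_hash,),
    ).one()
    return row.prediction if row else None

def store_prediction(raw_values: str, prediction):
    """Called by whichever component consumes the results topic."""
    input_hash = hashlib.sha256(raw_values.encode("utf-8")).hexdigest()
    session.execute(
        "INSERT INTO predictions (input_hash, prediction) VALUES (%s, %s)",
        (input_hash, str(prediction)),
    )
```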

Upvotes: 3
