alexanoid

Reputation: 25770

Kafka Streams application deployment - embedded vs application management frameworks

I'm pretty new to Kafka Streams. Right now I'm trying to understand the basic principles of this system.

This is a quote from the following article: https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/

You just use the library in your app, and start as many instances of the app as you like, and Kafka will partition up and balance the work over these instances.

Right now it is not clear to me how this works. Where is the business logic (the computation) of a Kafka Streams application executed? Does it run inside my application, or is Kafka Streams just a client for the Kafka cluster, with the client only preparing tasks that are then executed on the cluster? If the former, how do I properly scale the computational power of my Kafka Streams application? Is it possible to run it inside YARN or something similar?

Given that, is it a good idea to implement the Kafka Streams application as an embedded component of the core application (a web application in my case), or should it be implemented as a separate service and deployed to YARN/Mesos (if that is possible) separately from the main web application? Also, how do I prepare a Kafka Streams application for deployment with application management frameworks like YARN/Mesos?

Upvotes: 0

Views: 608

Answers (1)

Matthias J. Sax
Matthias J. Sax

Reputation: 62285

Your stream processing code runs inside your application -- it is not executed on the Kafka cluster.
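To make this concrete, here is a minimal sketch of a Streams app (the topic names, application id, and broker address are assumptions for illustration, not from the question). It is just an ordinary Java `main()` that builds a topology and starts it in-process:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class MyStreamsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        // application.id doubles as the consumer group id: every instance
        // started with the same id shares the input topic's partitions.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");

        // This mapValues() executes in the JVM of whatever process runs
        // main() -- your app -- not on the Kafka brokers.
        input.mapValues(value -> value.toUpperCase())
             .to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

The brokers only store and serve the topic data; all transformation logic lives in this process, so starting a second copy of the same program is what scales the computation.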

You can deploy it any way you like: YARN, Mesos, Kubernetes, a WAR file, Chef, whatever. The idea is to embed the library directly into your application to avoid setting up a separate processing cluster.

You don't need to prepare a Kafka Streams application for any particular deployment method -- it is completely agnostic to how it gets deployed. For YARN/Mesos, you would deploy it as you would any other Java application within the framework.
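As a sketch of what "any other Java application" means in practice (the artifact name and build command are assumptions), packaging typically means building a self-contained jar and having the framework launch it:

```shell
# Build a self-contained ("fat"/uber) jar with your build tool,
# e.g. the Maven Shade plugin or the Gradle Shadow plugin:
mvn package

# The framework (YARN, Mesos, Kubernetes, systemd, ...) then just needs
# to run it like any standalone Java program:
java -jar target/my-streams-app-1.0.jar

# Scaling out = launching more copies of this same command on more
# machines; Kafka's consumer-group rebalancing spreads the input topic's
# partitions across all running instances automatically.
```

There is no job submission step and no cluster-side code upload, which is the main difference from frameworks like Spark or Flink.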

Upvotes: 4
