Reputation: 10469
I'm working on a POC converting an existing Flink application/topology to use Kafka Streams. My question is about deployment.
Specifically, in Flink one adds "worker nodes" to the Flink installation and then increases the parallelism of the topology to keep up with rising data rates.
How does one increase Kafka Streams capacity as the data rate increases? Does Kafka Streams handle this automatically? Do I launch more processes (à la microservices)?
Or am I missing the big picture here?
Upvotes: 3
Views: 165
Reputation: 15057
Do I launch more processes (à la microservices)?
The short answer is yes:
See the Kafka Streams documentation at http://docs.confluent.io/3.0.0/streams/developer-guide.html#elastic-scaling-of-your-application for further details (unfortunately the Apache Kafka docs on Kafka Streams don't have these details yet).
Or am I missing the big picture here?
The big picture is that the picture is actually nice and small. :-)
Let me add the following, because many users are used to the complexity of other, related technologies and don't expect that stream processing, including its deployment, can be done in a much simpler way, as it can with Kafka Streams:
A Kafka Streams application is a normal, plain old Java application that happens to use the Kafka Streams library. A key differentiator from existing stream processing technologies is that, by using the Kafka Streams library, your application becomes scalable, elastic, fault-tolerant, and so on, without requiring a special "processing cluster" to add machines to, as you would for Flink, Spark, Storm, etc. The Kafka Streams deployment model is much simpler and easier: just start or stop additional instances of your application (i.e. literally the same code). This works with basically any deployment-related tool, including but not limited to Puppet, Ansible, Docker, Mesos, and YARN. You can even do it manually by running java ... YourApp.
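To make this concrete, here is a minimal sketch of such an application, assuming the current Kafka Streams API; the topic names, bootstrap server, and application id are placeholders. Scaling out then just means starting another instance of this same program with the same application.id, and Kafka rebalances the input partitions across all running instances.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class YourApp {

    public static void main(String[] args) {
        Properties props = new Properties();
        // All instances that share this application.id form one logical application
        // and split the input partitions among themselves.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "your-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // A trivial topology: read from one topic, transform, write to another.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");
        input.mapValues(value -> value.toUpperCase())
             .to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Close cleanly on shutdown so state and offsets are committed.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

If this instance can no longer keep up, you launch a second copy of the same JAR (e.g. on another machine) and the partitions of input-topic are automatically rebalanced across the two instances; stopping one instance causes the remaining one to take over its work.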
Upvotes: 6