Apache Flink State Store vs Kafka Streams

As far as I know, Kafka Streams keeps its state locally, in memory or on disk, or in a Kafka topic, because all the input data comes from a partition in which every message is keyed by a defined value. Most of the time the computations can be done without knowing the state of other processors. If so, you have another Streams instance which calculates the result. Like in this picture:

Where exactly does Flink store its state? Can Flink also store state locally, or does it always publish it to all instances (tasks)? Is it possible to configure Flink so that it stores its state in a Kafka broker?

Upvotes: 4

Answers (2)

Kemal Atik

Reputation: 337

There is a distinction between Flink and Kafka Streams. Flink is a cluster framework: your code is deployed and runs as a job in a Flink cluster. Kafka Streams is an API that you embed in your standard Java application, so the stream processing logic runs inside your application's Java process. Both can sink results to Kafka, a key-value store, a database, or other external systems. Flink’s master node implements its own high-availability mechanism based on ZooKeeper and ensures that interim state is still available after a failure. If you are using Kafka Streams, once you have saved your interim state to the Kafka cluster you get the same HA features that the Kafka cluster provides.

Upvotes: 0

Matthias J. Sax

Reputation: 62285

Flink also uses local stores (that can be keyed), similar to Kafka Streams. However, it does not write state into Kafka topics.

For fault tolerance, it takes so-called "distributed snapshots", which are stored in a configurable state backend (e.g., HDFS).
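To make the idea concrete, here is a minimal toy sketch in plain Java. It is not Flink's API: `KeyedCounterTask`, `SnapshotDemo`, and the in-memory `durableStore` map (standing in for something like HDFS) are hypothetical names made up for illustration. It only shows the general pattern of keeping keyed state local to a task, copying it to durable storage at a checkpoint, and restoring from that copy after a failure.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a task with local keyed state. NOT Flink's API. */
class KeyedCounterTask {
    // Local, per-task state, keyed by the record key.
    private final Map<String, Long> localState = new HashMap<>();

    /** Processes one record by incrementing the counter for its key. */
    long process(String key) {
        return localState.merge(key, 1L, Long::sum);
    }

    /** Returns the current count for a key, or 0 if the key is unknown. */
    long get(String key) {
        return localState.getOrDefault(key, 0L);
    }

    /** Returns a copy of the current state, as a checkpoint would persist it. */
    Map<String, Long> snapshot() {
        return new HashMap<>(localState);
    }

    /** Replaces the local state with a previously taken snapshot. */
    void restore(Map<String, Long> snapshot) {
        localState.clear();
        localState.putAll(snapshot);
    }
}

class SnapshotDemo {
    public static void main(String[] args) {
        KeyedCounterTask task = new KeyedCounterTask();
        task.process("user-1");
        task.process("user-1");
        task.process("user-2");

        // Checkpoint: copy the state to "durable storage" (assumption: a
        // plain map stands in for a real backend such as HDFS).
        Map<String, Long> durableStore = task.snapshot();

        // This update happens after the checkpoint and is not in the snapshot.
        task.process("user-1");

        // Simulated failure: a fresh task starts from the last snapshot.
        KeyedCounterTask recovered = new KeyedCounterTask();
        recovered.restore(durableStore);

        System.out.println(recovered.get("user-1")); // prints 2
        System.out.println(recovered.get("user-2")); // prints 1
    }
}
```

In this toy model the update made after the checkpoint is simply gone. In a real Flink job, the sources are rewound to the positions recorded in the checkpoint and the records after it are replayed, so the state catches up again.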

Check out the docs for more details:

Upvotes: 5
