Weso

Reputation: 408

How to implement total count over multiple partitions in Kafka Streams

Let's say we have an inData topic with 5 partitions that contains contract data, with the contractId as the key. I have 3 instances of a Kafka Streams application that counts the number of contracts.

Now I want to implement a total count of contracts in my Kafka Streams application. I have read that each partition is assigned to only one instance of the application. Does that mean that each instance only has the count for its own partitions?

How can a total count of contracts be implemented? Do I need an intermediate topic with only one partition? Can it be achieved using a GlobalKTable?

Upvotes: 0

Views: 919

Answers (1)

Matthias J. Sax

Reputation: 62285

Using a GlobalKTable or a global state store wouldn't work (at least not directly), because both can only store unmodified data from a topic, whereas you want to do some processing (i.e., counting).

If you want to count all unique contractIds, you should first load the data into a KTable (via builder.table()) and then do a groupBy().count(). In the groupBy(), you map all records to the same new key. Because all records are mapped to the same key, they will be repartitioned to the same topic partition, and thus you get a global count.
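
A minimal sketch of that topology, in case it helps. It assumes String keys and values, the kafka-streams library on the classpath, and a hypothetical output topic; the class name, the constant key "all", and the method buildTopology are made up for illustration and are not part of the question.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class TotalContractCount {

    // Hypothetical constant: every record is re-keyed to this single key.
    static final String ALL = "all";

    // Builds the topology only; topic names are supplied by the caller.
    // Assumption: keys (contractId) and values are plain Strings.
    static Topology buildTopology(String inputTopic, String outputTopic) {
        StreamsBuilder builder = new StreamsBuilder();

        // Read the topic as a KTable so there is one row per contractId.
        KTable<String, String> contracts =
            builder.table(inputTopic, Consumed.with(Serdes.String(), Serdes.String()));

        // Map every row to the same key. This forces a repartition through
        // an internal topic, and all rows land in the same partition.
        KTable<String, Long> total = contracts
            .groupBy((contractId, contract) -> KeyValue.pair(ALL, contractId),
                     Grouped.with(Serdes.String(), Serdes.String()))
            .count();

        // Emit each update of the running total to the output topic.
        total.toStream().to(outputTopic, Produced.with(Serdes.String(), Serdes.Long()));

        return builder.build();
    }
}
```

The returned Topology would then be passed to new KafkaStreams(topology, props) as usual. Because the input is a KTable, an update to an existing contractId does not increase the count and a tombstone decreases it. Note that with a single key, one task on one instance ends up maintaining the total, so this step does not scale out.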

Upvotes: 1
