Reputation: 241
Say we have a topic with 2 partitions, and there are n producers writing data to this topic. Millions of message records are now spread over the 2 partitions.
Say we have 2 threads (i.e. 2 separate instances) running the Streams processor. In this setup, say Thread-1 (i.e. Streaming Task-1) was assigned partition P-1 and Thread-2 (i.e. Streaming Task-2) was assigned partition P-2 for processing.
The ask is: say we want to know how many message records have been processed by Streaming Task-1 so far, or, say, on 28th September, 2KK. How do I do that?
And the even bigger question is: Streaming Task-1 would never know the total count of message records being processed; it would only know the count it has processed itself.
Can it ever know the count processed by Streaming Task-2?
Upvotes: 0
Views: 949
Reputation: 226
There are several ways to accomplish what you are asking. If you are using the DSL, I suggest you take a look at the word count example (https://docs.confluent.io/current/streams/quickstart.html). With a map operation you can produce all the counts you want relatively simply.
If you are not using the DSL, you can still do the same with a couple of processors and state stores.
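To illustrate the idea behind both approaches, here is a minimal plain-Java sketch. It is a model only and uses no Kafka classes: the class `TaskCountSketch`, its methods, and the hash-based partitioner are all hypothetical names made up for this example. It shows that each task can only count the records in its own partition, while re-keying every record to one constant key routes them all to a single partition, so the task that owns it sees the total. In the Kafka Streams DSL the corresponding steps would roughly be a re-keying operation such as `groupBy` followed by `count()`.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the idea; no Kafka classes are used here.
class TaskCountSketch {

    // Hypothetical partitioner: maps a record key to one of numPartitions partitions.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Each task owns one partition and counts only the records it sees.
    // The result maps partition (task) id -> number of records processed.
    static Map<Integer, Long> countPerTask(List<String> keys, int numPartitions) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(partitionFor(key, numPartitions), 1L, Long::sum);
        }
        return counts;
    }

    // Re-key every record to the same constant key ("all") before counting.
    // All records then land in one partition, so a single task holds the total.
    static Map<Integer, Long> countAfterRekey(List<String> keys, int numPartitions) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (String ignored : keys) {
            counts.merge(partitionFor("all", numPartitions), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "c", "d", "e");
        System.out.println("per task:     " + countPerTask(keys, 2));
        System.out.println("after re-key: " + countAfterRekey(keys, 2));
    }
}
```

The trade-off of the single-key approach is that one task ends up doing all of the counting, so it gives a global total at the cost of parallelism. A per-day count would additionally need the record timestamp as part of the key or a time window, which this sketch leaves out.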
Upvotes: 1