kafka stream make a local aggregation

Question

I am trying to make a local aggregation.

The input topic has records containing multiple elements and I am using flatmap to split the record into multiple records with another key (here element_id). This triggers a re-partition as I am applying a grouping for aggregation later in the stream process. Problem: there are way too many records in this repartition topic and the app cannot handle them (lag is increasing).

Here is a example of the incoming data

key: another ID

value:

{
  "cat_1": {
    "element_1" : 0,
    "element_2" : 1,
    "element_3" : 0
  },
  "cat_2": {
    "element_1" : 0,
    "element_2" : 1,
    "element_3" : 1
  }
}

And an example of the wanted aggregation result: key : element_2 value:

{
  "cat_1": 1,
  "cat_2": 1
}

So I would like to make a first "local aggregation" and stop splitting incoming records, meaning that I want to aggregate all elements locally (no re-partition) for example in a 30 seconds window, then produce result per element in a topic. A stream consuming this topic later aggregates at a higher level.

I am using Stream DSL, but I am not sure it is enough. I tried to use the process() and transform() methods that allow me to benefit from the Processor API, but I don't known how to properly produce some records in a punctuation, or put records in a stream.

How could I achieve that ? Thank you

kafka stream make a local aggregation

Answers (1)

Related Questions