gmilan

Reputation: 27

Kafka streams - Joining streams when events could happen far apart in time

I would like to ask for your advice on how I could put together a solution for the following problem with Kafka Streams.

An application has subjects and lessons, and it fires the following events:

LessonCreated       SubjectCreated     LessonAddedToSubject     LessonRemovedFromSubject
 +----+-----+        +----+            +------+-------+         +------+-------+
 | Id |Hours|        | Id |            |Lesson|Subject|         |Lesson|Subject|
 +----+-----+        +----+            +------+-------+         +------+-------+
 | 25 | 20  |        | 1  |            |  25  |   1   |         |  25  |   1   |
 | 26 | 40  |        | 2  |            |  26  |   1   |         |  26  |   2   |
 | 27 | 10  |        | 3  |            |  26  |   2   |         +------+-------+
 +----+-----+        +----+            |  26  |   3   |
                                       |  27  |   3   |
                                       |  27  |   1   |
                                       +------+-------+

I would like to implement a stream topology that joins these streams into the following structure:

    LessonSubjectHours
 +------+-------+-----+
 |Lesson|Subject|Hours|
 +------+-------+-----+
 |  26  |   1   | 40  |
 |  26  |   3   | 40  |
 |  27  |   3   | 10  |
 |  27  |   1   | 10  |
 +------+-------+-----+

I thought about doing some logic with join operations, but I believe it might not help, since KStream-KStream joins appear to be forcibly time-windowed (if I understood correctly), and the LessonCreated, LessonAddedToSubject and LessonRemovedFromSubject events for the same key could happen arbitrarily far apart in time. So I'm afraid windowed joins would produce wrong results whenever one of those events arrives long after the last event with the same key was emitted.

Doing a full lookup for the join shouldn't be a performance issue, though, since these events shouldn't happen very often. Still, I have no idea how to proceed, assuming this problem can be handled properly in Kafka Streams, so any advice would be appreciated.

Thank you in advance.

PS: It is still possible to change the events and the data they contain, in case it helps.
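In plain Java, the non-windowed "lookup" join described above amounts to maintaining two materialized tables and re-deriving the output from them, no matter how far apart the events arrive. A minimal sketch (class and method names are illustrative, not from the post):

```java
import java.util.*;

// Illustrative sketch of the desired join semantics: two maps play the role of
// materialized tables, each event only updates state, and the output view is
// derived by a lookup — so event spacing in time is irrelevant.
class LessonSubjectJoin {
    private final Map<Integer, Integer> lessonHours = new HashMap<>();          // lesson id -> hours
    private final Map<Integer, Set<Integer>> lessonSubjects = new HashMap<>();  // lesson id -> subject ids

    void onLessonCreated(int lesson, int hours) {
        lessonHours.put(lesson, hours);
    }

    void onLessonAddedToSubject(int lesson, int subject) {
        lessonSubjects.computeIfAbsent(lesson, k -> new HashSet<>()).add(subject);
    }

    void onLessonRemovedFromSubject(int lesson, int subject) {
        Set<Integer> subjects = lessonSubjects.get(lesson);
        if (subjects != null) subjects.remove(subject);
    }

    // Current LessonSubjectHours view: one {lesson, subject, hours} row per
    // (lesson, subject) pair still present, with hours looked up per lesson.
    List<int[]> lessonSubjectHours() {
        List<int[]> rows = new ArrayList<>();
        lessonSubjects.forEach((lesson, subjects) -> {
            Integer hours = lessonHours.get(lesson);
            if (hours != null)
                for (int subject : subjects) rows.add(new int[]{lesson, subject, hours});
        });
        return rows;
    }
}
```

Replaying the events from the tables above (create 25/20, 26/40, 27/10; the six add events; remove 25 from 1 and 26 from 2) yields exactly the four LessonSubjectHours rows shown. The Kafka Streams analogue of this state is a KTable.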

Upvotes: 0

Views: 833

Answers (2)

Cường Lê
Cường Lê

Reputation: 33

A KTable backed by local state (RocksDB is fast) would be the right choice.

Upvotes: 1

Matthias J. Sax
Matthias J. Sax

Reputation: 62320

It seems your data is basically tabular. Hence, I am wondering if reading the topics as a KStream is actually the right approach and if you should process the data as a KTable instead? For this case you can just do table-table joins, which are not windowed.
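As a rough sketch of what that could look like (topic names, key formats, and value types below are assumptions for illustration, not from the question): read LessonCreated as a KTable keyed by lesson id, model add/remove membership as a single table keyed by the (lesson, subject) pair where a LessonRemovedFromSubject event becomes a tombstone on that key, and use a KTable-KTable foreign-key join (available since Kafka 2.4) to look up the hours.

```java
StreamsBuilder builder = new StreamsBuilder();

// Lessons as a table keyed by lesson id, value = hours.
KTable<Integer, Integer> lessonHours =
    builder.table("lesson-created", Consumed.with(Serdes.Integer(), Serdes.Integer()));

// Membership as a table keyed by "lessonId:subjectId"; the value carries the
// lesson id for the foreign-key lookup. A LessonRemovedFromSubject event is
// written as a tombstone (null value) on the same key, which deletes the row
// and retracts the corresponding join result.
KTable<String, Integer> membership =
    builder.table("lesson-subject-membership", Consumed.with(Serdes.String(), Serdes.Integer()));

// KTable-KTable foreign-key join: no time window involved, so it produces the
// correct result no matter how far apart the events arrive.
KTable<String, String> lessonSubjectHours = membership.join(
    lessonHours,
    lessonId -> lessonId,  // foreign-key extractor: membership value is the lesson id
    (lessonId, hours) -> "lesson=" + lessonId + ",hours=" + hours);

lessonSubjectHours.toStream()
    .to("lesson-subject-hours", Produced.with(Serdes.String(), Serdes.String()));
```

Because KTables are changelogs over materialized state, a late LessonCreated or LessonAddedToSubject event simply updates the table and re-emits the joined rows for the affected keys.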

Upvotes: 2
