Arnaud Le Blanc

Reputation: 99921

Kafka Streams to build materialised views

I'm trying to produce some kind of materialized view from a stream of database updates (provided by e.g. the DBMS's transaction log, with the help of e.g. maxwell-daemon). The view is materialized as a Kafka compacted topic.

The view is a simple join and could be expressed as a query like this:

SELECT u.email user_email, t.title todo_title, t.state todo_state
FROM   User u
JOIN   Todo t
ON     t.user_id = u.id

I want the view to be updated every time User or Todo changes (i.e., a message should be published on the view's Kafka topic for every change).

With Kafka Streams, it seems possible to achieve this by doing the following:
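In Kafka Streams terms, this would be a KTable–KTable join on the two changelog topics. As an illustration of the join semantics only (plain Java, not the Kafka Streams API; the User/Todo shapes are assumptions based on the query above): each changelog update upserts into a local table and re-emits the affected joined rows to the view topic.

```java
import java.util.*;

// Sketch of KTable-KTable join semantics (NOT the Kafka Streams API):
// every update to either table re-emits the joined rows, mirroring the
// SQL join in the question.
public class JoinSketch {
    record Todo(int userId, String title, String state) {}

    final Map<Integer, String> users = new HashMap<>(); // user id -> email
    final Map<Integer, Todo> todos = new HashMap<>();   // todo id -> latest row
    final List<String> viewTopic = new ArrayList<>();   // emitted join results

    // Changelog update for User: re-emit every Todo joined with the new email.
    void onUserUpdate(int userId, String email) {
        users.put(userId, email);
        todos.values().stream()
             .filter(t -> t.userId() == userId)
             .forEach(t -> emit(email, t));
    }

    // Changelog update for Todo: emit the (re)joined row if the user exists.
    void onTodoUpdate(int todoId, Todo t) {
        todos.put(todoId, t);
        String email = users.get(t.userId());
        if (email != null) emit(email, t);
    }

    void emit(String email, Todo t) {
        viewTopic.add(email + " | " + t.title() + " | " + t.state());
    }

    public static void main(String[] args) {
        JoinSketch s = new JoinSketch();
        s.onUserUpdate(1, "a@example.com");
        s.onTodoUpdate(10, new Todo(1, "write report", "open"));
        s.onUserUpdate(1, "a@new-domain.example"); // user change re-emits the row
        s.viewTopic.forEach(System.out::println);
    }
}
```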

However, I'm not sure of a few things:

Upvotes: 3

Views: 1842

Answers (1)

Matthias J. Sax

Reputation: 62350

  • Is that even possible?

Yes. The pattern you describe will compute what you want out-of-the-box.

  • Will this maintain the original ordering of events? E.g., if User is changed, then Todo is changed, am I guaranteed to see these changes in this order in the result of the join?

Streams processes data according to timestamps (i.e., records with smaller timestamps first). Thus, in general, this will work as expected. However, there is no strict guarantee, because in stream processing it's more important to make progress all the time (and not block). Thus, Streams applies only a "best effort" approach with regard to processing records in timestamp order. For example, if one changelog does not provide any data, Streams will just keep going, processing data only from the other changelog (and not block). This might lead to "out of order" processing with regard to timestamps from different partitions/topics.
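This "best effort" merge can be sketched in plain Java (an illustration, not Streams internals): pick the buffered record with the smallest timestamp, but if one input buffer is empty, keep consuming from the other instead of blocking.

```java
import java.util.*;

// Hypothetical illustration of best-effort timestamp ordering:
// take the lower-timestamped head of the two buffers, but never
// wait for an empty buffer to fill up.
public class MergeSketch {
    record Rec(long ts, String value) {}

    static Rec next(Deque<Rec> a, Deque<Rec> b) {
        if (a.isEmpty()) return b.poll();   // don't block on an empty input
        if (b.isEmpty()) return a.poll();
        return a.peek().ts() <= b.peek().ts() ? a.poll() : b.poll();
    }

    public static void main(String[] args) {
        Deque<Rec> users = new ArrayDeque<>(List.of(new Rec(5, "user-update")));
        Deque<Rec> todos = new ArrayDeque<>(); // todo changelog momentarily empty
        // We make progress on the user changelog (ts=5) ...
        System.out.println(next(users, todos).value());
        // ... and only then does an earlier todo record (ts=3) arrive:
        todos.add(new Rec(3, "todo-update"));
        System.out.println(next(users, todos).value()); // out of order by timestamp
    }
}
```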

  • How to handle transactions? E.g., multiple database changes might be part of the same transaction. How to make sure that both KTables are updated atomically, and that all join results show only fully-applied transactions?

That's not possible at the moment. Each update is processed individually, and you will see each intermediate (i.e., not-yet-committed) result. However, Kafka will introduce "transactional processing" in the future, which will make it possible to handle transactions. (See https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging and https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics)
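To make "you will see each intermediate result" concrete, here is a plain-Java illustration (not Kafka code, names hypothetical): two changelog records that formed one atomic transaction in the source database are processed one at a time, so a consumer of the view observes the half-applied state in between.

```java
import java.util.*;
import java.util.function.Consumer;

// Illustration: without transactional processing, each changelog record
// is applied individually, so downstream observers see every
// intermediate state of the view.
public class TxSketch {
    static List<String> run() {
        Map<String, String> view = new LinkedHashMap<>();
        List<String> observed = new ArrayList<>();
        Consumer<String[]> apply = kv -> {
            view.put(kv[0], kv[1]);
            observed.add(view.toString()); // emitted after EACH record
        };
        // Two updates that were one atomic transaction in the source DB:
        apply.accept(new String[]{"todo_state", "done"});
        apply.accept(new String[]{"todo_title", "write report"});
        return observed;
    }

    public static void main(String[] args) {
        // The first line shows a half-applied transaction.
        run().forEach(System.out::println);
    }
}
```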

Upvotes: 2
