Reputation: 20470
I have large volumes of simple event records coming into a system and being published to a Kafka topic.
I have a streaming application that responds to events; for each event it performs a PostgreSQL query, gets or creates new IDs, annotates the record, and publishes it to an output topic.
I suspect that doing PostgreSQL operations for every single incoming event record is going to be a performance problem.
What are the better or alternative designs for this scenario?
Upvotes: 0
Views: 824
Reputation: 3443
This is an ideal case for a Kafka Streams streaming join. If you capture the contents of the Postgres table you're querying in another Kafka topic, you can look up existing records and enrich the events without calling out to the database. If you need to insert new records into the table, you can publish them to a separate topic that is written back to the database.
Getting the data out of Postgres can be done with Kafka Connect's JDBC source connector, or better still, with change data capture (CDC) from the Debezium project.
Writing back to the table can be done with the Kafka Connect JDBC sink connector.
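A rough sketch of what that streaming join could look like using a GlobalKTable. The topic names, String serdes, key mapping, and joiner logic here are placeholders; adapt them to your actual record format:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class EnrichmentTopology {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Copy of the Postgres table, captured via Debezium or the JDBC source connector
        GlobalKTable<String, String> lookup = builder.globalTable("postgres-lookup-topic");

        // Incoming event records
        KStream<String, String> events = builder.stream("events-topic");

        // Enrich each event against the local copy of the table -- no per-event database call
        events.leftJoin(lookup,
                        (eventKey, eventValue) -> eventKey,     // derive the lookup key from the event
                        (eventValue, lookupValue) -> lookupValue == null
                                ? eventValue                    // no match: pass through (or route to a topic for new IDs)
                                : eventValue + "," + lookupValue)
              .to("enriched-events-topic");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "event-enricher");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        new KafkaStreams(builder.build(), props).start();
    }
}
```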
Upvotes: 1
Reputation: 16066
You could use a short window to accumulate records for n seconds and then batch-process the emitted records. This gives you larger sets of records to work with, and you can use JDBC batching to improve performance.
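For example, once a window has emitted its batch, all of the IDs can be written in a single round trip with a JDBC batch rather than one statement per record. The table, column, and connection details below are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchAnnotator {

    // Insert a whole window's worth of event IDs in one JDBC batch.
    // Table and column names are illustrative only.
    static void insertBatch(Connection conn, List<String> eventIds) throws SQLException {
        String sql = "INSERT INTO event_ids (event_id) VALUES (?) ON CONFLICT DO NOTHING";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (String id : eventIds) {
                ps.setString(1, id);
                ps.addBatch();       // queue the row client-side
            }
            ps.executeBatch();       // one round trip for the whole window
        }
    }

    public static void main(String[] args) throws SQLException {
        // Connection details are placeholders
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/events", "user", "password")) {
            insertBatch(conn, List.of("a1", "b2", "c3"));
        }
    }
}
```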
Upvotes: 2