clay

Reputation: 20470

Kafka Streaming Database Query Architecture?

I have large volumes of simple event records coming into a system and being published to a Kafka topic.

I have a streaming application that responds to these events; for each event it runs a PostgreSQL query, gets or creates new IDs, annotates the record, and publishes it to an output topic.

I suspect doing PostgreSQL operations for every single incoming event record is going to be a performance problem.

What are better or alternative designs for this scenario?

Upvotes: 0

Views: 824

Answers (2)

Chris Matta

Reputation: 3443

This is an ideal case for a Kafka Streams streaming join. If you capture the contents of the Postgres table you're querying in another Kafka topic, you can look up existing records and enrich the events without calling out to the database. If you need to insert new records into the table, you can publish them to another topic that is written back to the database. A minimal sketch of the enrichment join is shown below.
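A rough sketch of that join using the Kafka Streams DSL, assuming string keys/values and hypothetical topic names (events-topic, postgres-lookup-topic, enriched-events-topic); your serdes and join logic would differ:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.GlobalKTable;
    import org.apache.kafka.streams.kstream.KStream;

    import java.util.Properties;

    public class EnrichmentApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "event-enricher");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();

            // Reference data captured from the Postgres table (via JDBC source or Debezium)
            GlobalKTable<String, String> lookup =
                    builder.globalTable("postgres-lookup-topic");

            // Incoming event records
            KStream<String, String> events = builder.stream("events-topic");

            // Join each event to the lookup table by key and forward the enriched record
            events.join(lookup,
                            (eventKey, eventValue) -> eventKey,               // map event to lookup key
                            (eventValue, lookupValue) -> eventValue + "|" + lookupValue)
                  .to("enriched-events-topic");

            new KafkaStreams(builder.build(), props).start();
        }
    }

A GlobalKTable is used here so every application instance holds a full local copy of the lookup data; a regular KTable join would also work if the two topics are co-partitioned.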

Getting the data from Postgres can be done with the Kafka Connect JDBC source connector, or, even better, with CDC from the Debezium project.

Writing back to the table can be done with the Kafka Connect JDBC sink connector.

Upvotes: 1

Nicholas

Reputation: 16066

You could use a short window to accumulate records for n seconds and then batch-process the emitted records. This gives you larger sets of records to work with, and you can use JDBC batching to improve performance; see the sketch below.
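A minimal sketch of the batch-write step with plain JDBC, assuming a hypothetical events table with a single payload column and placeholder connection details:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.List;

    public class BatchWriter {

        // Insert one window's worth of accumulated events in a single JDBC batch
        static void writeBatch(List<String> events) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/mydb", "user", "password");
                 PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO events (payload) VALUES (?) ON CONFLICT DO NOTHING")) {

                conn.setAutoCommit(false);
                for (String event : events) {
                    ps.setString(1, event);
                    ps.addBatch();          // queue the row client-side
                }
                ps.executeBatch();          // one round trip for the whole window
                conn.commit();
            }
        }
    }

The accumulation itself could come from a Kafka Streams time window or simply from polling the consumer and flushing every n seconds or m records, whichever comes first.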

Upvotes: 2
