Reputation: 20470
I have large volumes of simple event records coming into a system and being published to a Kafka topic.
I have a streaming application that responds to events; for each event it performs a PostgreSQL query, gets or creates new IDs, annotates the record, and publishes it to an output topic.
I suspect that doing PostgreSQL operations for every single incoming event record is going to be a performance problem.
What are the better or alternative designs for this scenario?
Upvotes: 0
Views: 824
Reputation: 3443
This is an ideal case for a Kafka Streams streaming join. If you capture the contents of the Postgres table you're querying in another Kafka topic, you can look up existing records and enrich the events without calling out to the database. If you need to insert new records into the table, you can publish them to a separate topic that is written back to the database.
Getting the data out of Postgres can be done with Kafka Connect's JDBC source connector, or better still, with change data capture (CDC) from the Debezium project.
Writing back to the table can be done with the Kafka Connect JDBC sink connector.
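A rough sketch of what that streaming join could look like using a GlobalKTable. The topic names, String serdes, key mapping, and joiner logic here are placeholders; adapt them to your actual record format:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class EnrichmentTopology {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Copy of the Postgres table, captured via Debezium or the JDBC source connector
        GlobalKTable<String, String> lookup = builder.globalTable("postgres-lookup-topic");

        // Incoming event records
        KStream<String, String> events = builder.stream("events-topic");

        // Enrich each event against the local copy of the table -- no per-event database call
        events.leftJoin(lookup,
                        (eventKey, eventValue) -> eventKey,     // derive the lookup key from the event
                        (eventValue, lookupValue) -> lookupValue == null
                                ? eventValue                    // no match: pass through (or route to a topic for new IDs)
                                : eventValue + "," + lookupValue)
              .to("enriched-events-topic");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "event-enricher");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        new KafkaStreams(builder.build(), props).start();
    }
}
```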
Upvotes: 1
Reputation: 16066
You could use a short window to accumulate records for n seconds and then batch-process the emitted records. This gives you larger sets of records to work with, and you can use JDBC batching to improve performance.
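For example, once a window has emitted its batch, all of the IDs can be written in a single round trip with a JDBC batch rather than one statement per record. The table, column, and connection details below are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchAnnotator {

    // Insert a whole window's worth of event IDs in one JDBC batch.
    // Table and column names are illustrative only.
    static void insertBatch(Connection conn, List<String> eventIds) throws SQLException {
        String sql = "INSERT INTO event_ids (event_id) VALUES (?) ON CONFLICT DO NOTHING";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (String id : eventIds) {
                ps.setString(1, id);
                ps.addBatch();       // queue the row client-side
            }
            ps.executeBatch();       // one round trip for the whole window
        }
    }

    public static void main(String[] args) throws SQLException {
        // Connection details are placeholders
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/events", "user", "password")) {
            insertBatch(conn, List.of("a1", "b2", "c3"));
        }
    }
}
```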
Upvotes: 2