femibyte

Reputation: 3507

Producer-consumer processing pattern for Kafka processing

I'm implementing a streaming pipeline that resembles the illustration below:

*K-topic1* ---> processor1 ---> *K-topic2* ---> processor2 ---> *K-topic3* ---> processor3 ---> *K-topic4*

The K-topic components represent Kafka topics, and the processor components represent code (Python/Java).

For the processor component, the intention is to read/consume data from the topic, perform some processing/ETL on it, and persist the results to the next topic in the chain as well as to a persistent store such as S3.

I have a question regarding the design approach.

The way I see it, each processor component should encapsulate both consumer and producer functionality.

Would the best approach be to have a Processor module/class that contains both a KafkaConsumer and a KafkaProducer? Most examples I've seen so far keep the consumer and producer as separate components that are run independently, which would mean running twice as many components compared to encapsulating the producer and consumer within each Processor object. A rough sketch of what I have in mind follows below.
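Roughly, something like this (a minimal Java sketch using the plain kafka-clients API; class, topic, and method names are just for illustration, and serializers/deserializers would be set in the passed-in properties):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// One "processor" that owns both a consumer and a producer:
// consume from an input topic, transform, publish to the next topic.
public class Processor {

    private final KafkaConsumer<String, String> consumer;
    private final KafkaProducer<String, String> producer;
    private final String outputTopic;

    public Processor(Properties consumerProps, Properties producerProps,
                     String inputTopic, String outputTopic) {
        this.consumer = new KafkaConsumer<>(consumerProps);
        this.producer = new KafkaProducer<>(producerProps);
        this.outputTopic = outputTopic;
        consumer.subscribe(Collections.singletonList(inputTopic));
    }

    // Placeholder ETL step; the real transformation would go here.
    protected String transform(String value) {
        return value.toUpperCase();
    }

    public void run() {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                String transformed = transform(record.value());
                producer.send(new ProducerRecord<>(outputTopic, record.key(), transformed));
            }
        }
    }
}
```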

Any suggestions/references are welcome.

This question is different from

Designing a component both producer and consumer in Kafka

as that question specifically mentions using Samza, which is not the case here.

Upvotes: 0

Views: 504

Answers (1)

OneCricketeer

Reputation: 192023

the intention is to read/consume data from the topic, perform some processing/ETL on it, and persist the results to the next topic in the chain

This is exactly the strength of Kafka Streams and/or KSQL. You could use the Processor API, but from what you describe, I think you will only need the Streams DSL API.
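As a minimal sketch, the first hop of your chain (K-topic1 -> processor1 -> K-topic2) could look like this with the Streams DSL; the topic names and the transformation are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class Processor1 {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "processor1");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("K-topic1");

        // ETL step: replace with the real transformation logic.
        input.mapValues(value -> value.toUpperCase())
             .to("K-topic2");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Each processor in the chain would be a small Streams application like this, so the consumer/producer plumbing, offset management, and scaling are handled by the framework rather than by your own wrapper class.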

persist the results to the next topic in the chain as well as to a persistent store such as S3.

From the output topic, you can use a Kafka Connect sink to get the topic data into these other external systems. There is no need to write your own consumer to do this.
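For example, a rough sketch of a Connect configuration, assuming the Confluent S3 sink connector (the bucket, region, topic, and flush size below are placeholders):

```properties
name=s3-sink
connector.class=io.confluent.connect.s3.S3SinkConnector
tasks.max=1
topics=K-topic4
s3.bucket.name=my-bucket
s3.region=us-east-1
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.json.JsonFormat
flush.size=1000
```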

Upvotes: 1
