Reputation: 211
Which one is recommended: 1. A single Kafka stream consuming from multiple topics, or 2. Different Kafka streams consuming from different topics? (I've already used the second one with no issues.)
Is it possible to achieve #1? If yes, what are the implications? And if I use the 'EXACTLY_ONCE' setting, what kind of complexities will it bring?
Kafka version: 2.2.0-cp2
Upvotes: 3
Views: 12996
Reputation: 9357
Is it possible to achieve #1 (a single Kafka stream consuming from multiple topics)?
Yes, you can use StreamsBuilder#stream(Collection<String> topics).
If the data you want to process is spread across multiple topics, and those topics together constitute one single source, then you can use this. It is not the right choice if you want to process those topics in parallel.
It is like one consumer subscribing to all these topics, which also means one thread consumes all of them. When you call poll(), it returns ConsumerRecords from all the subscribed topics, not just one topic.
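As a rough illustration of the multi-topic source described above, here is a minimal sketch. The topic names (`orders-eu`, `orders-us`, `orders-all`) and the class name are placeholders I made up, not something from the question. It only builds and prints the topology, so no broker is needed to run it.

```java
import java.util.Arrays;

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;

class MultiTopicTopology {

    // Builds a topology with ONE source node that reads from two topics.
    // All topic names here are hypothetical placeholders.
    static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders =
                builder.stream(Arrays.asList("orders-eu", "orders-us"));
        // Records from both input topics flow through the same stream.
        orders.to("orders-all");
        return builder.build();
    }

    public static void main(String[] args) {
        // describe() only inspects the graph; it does not connect to Kafka.
        System.out.println(build().describe());
    }
}
```

The printed description should list a single source node with both topics attached to it, which matches the "one consumer subscribed to several topics" picture.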
In Kafka Streams, there is a concept called a topology, which is basically an acyclic graph of sources, processors, and sinks. A topology can contain sub-topologies.
Sub-topologies can then be executed as independent stream tasks through parallel threads (Reference).
Since each sub-topology has its own source, which can be a topic, you have to break up your graph into sub-topologies if you want these topics to be processed in parallel.
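One way to see the difference is to compare the sub-topology count of the two designs from the question. This is a sketch under my own assumptions: the topic names (`topic-a`, `topic-b`, `out-*`) and the class name are invented for illustration, and it relies on unconnected source-to-sink pipelines ending up in separate sub-topologies.

```java
import java.util.Arrays;

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;

class SubTopologyCount {

    // Option 1: one stream over both topics, so one connected graph.
    static Topology combined() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream(Arrays.asList("topic-a", "topic-b")).to("out-combined");
        return builder.build();
    }

    // Option 2: one stream per topic; the two pipelines share no nodes.
    static Topology separate() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("topic-a").to("out-a");
        builder.stream("topic-b").to("out-b");
        return builder.build();
    }

    static int count(Topology topology) {
        return topology.describe().subtopologies().size();
    }

    public static void main(String[] args) {
        System.out.println("combined: " + count(combined()));
        System.out.println("separate: " + count(separate()));
    }
}
```

In this sketch the combined variant reports one sub-topology and the separate variant reports two, and it is the second shape that gives the runtime independent units to schedule.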
If I use the 'EXACTLY_ONCE' setting, what kind of complexities will it bring?
When messages reach a sink processor in a topology, the offsets of its source must be committed, where the source can be a single topic or a collection of topics.
Whether there are multiple topics or one topic, we need to send offsets to the transaction from the producer. These offsets are basically a Map<TopicPartition, OffsetAndMetadata> that should be committed when the messages are produced.
So I think it should not introduce any additional complexity, whether it is a single topic with 10 partitions or 10 topics with 1 partition each, because offsets are tracked at the TopicPartition level and not at the topic level.
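The point about offsets being per partition can be modelled in plain Java. To keep this runnable without a Kafka dependency, the sketch below uses a hypothetical (topic, partition) map entry as a stand-in for TopicPartition and a bare `Long` as a stand-in for the offset metadata. None of these names come from the Kafka API.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.Map;

class OffsetsPerPartition {

    // Stand-in for the offsets map sent with a transaction:
    // key = (topic, partition), value = next offset to commit.
    static Map<Map.Entry<String, Integer>, Long> offsets(
            int topics, int partitionsPerTopic, long nextOffset) {
        Map<Map.Entry<String, Integer>, Long> result = new HashMap<>();
        for (int t = 0; t < topics; t++) {
            for (int p = 0; p < partitionsPerTopic; p++) {
                Map.Entry<String, Integer> key =
                        new SimpleImmutableEntry<>("topic-" + t, p);
                result.put(key, nextOffset);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 1 topic x 10 partitions and 10 topics x 1 partition
        // both produce a map with 10 entries.
        System.out.println(offsets(1, 10, 42L).size());
        System.out.println(offsets(10, 1, 42L).size());
    }
}
```

Either layout yields the same kind of map with the same number of entries, which is why the number of topics by itself does not change what has to be committed.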
Upvotes: 4