I am trying to understand how Kafka Streams works under the hood (to know it a little better), and came across a Confluent link, which is really wonderful.
It mentions two terms: StreamThreads and StreamTasks.
I am not able to understand what exactly a StreamTask is. And what is a StreamThread? A StreamThread can have multiple StreamTasks, so won't there be data sharing between them, and won't this thread run slower? How does a StreamThread "run" multiple StreamTasks? Any explanation in simple words would be of great help.
"Tasks" are a logical abstraction of work that can be done in parallel (i.e., work that can be processed independently of other work). Kafka Streams basically creates a task for each input topic partition, because data in different partitions can be processed independently of each other (this is a simplification, but it holds if you have a single input topic; for joins it's a little bit different).
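To make the "one task per input partition" idea concrete, here is a minimal sketch for the single-input-topic case. It is a toy model, not Kafka Streams code: the class `TaskPerPartition`, the record `Task`, and the method `createTasks` are hypothetical. The `<subtopology>_<partition>` naming mirrors the task IDs you may see in Kafka Streams logs, but treat that as an illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model: one task per partition of a single input topic.
// Not the Kafka Streams implementation; joins and multiple topics are ignored.
class TaskPerPartition {
    // A task is identified by its subtopology and the input partition it owns.
    record Task(int subtopology, int partition) {
        @Override
        public String toString() {
            return subtopology + "_" + partition;
        }
    }

    // Create one independent task for each partition of the input topic.
    static List<Task> createTasks(int subtopology, int numPartitions) {
        List<Task> tasks = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            tasks.add(new Task(subtopology, p));
        }
        return tasks;
    }

    public static void main(String[] args) {
        // An input topic with 4 partitions yields 4 tasks.
        System.out.println(createTasks(0, 4)); // [0_0, 0_1, 0_2, 0_3]
    }
}
```

Because each task only ever sees records from its own partition, two tasks never need to coordinate with each other.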
A StreamThread is basically a JVM thread. Tasks are assigned to StreamThreads for execution. In the current implementation, a StreamThread basically loops over all its tasks and processes some amount of input data for each task. In between, the StreamThread (which uses a KafkaConsumer) polls the broker for new data for all of its assigned tasks.
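The loop described above can be sketched roughly as follows. This is a heavily simplified, single-threaded model under stated assumptions: `StreamThreadSketch`, `Task`, `poll`, `runOnce`, and `maxPerTask` are all hypothetical names, and "polling the broker" is simulated with an in-memory map instead of a real `KafkaConsumer`. It shows why one thread can run several tasks without data sharing: each task owns its own buffer and state, and the thread simply gives each task a turn.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified model of a StreamThread's run loop.
// Not Kafka Streams code: "polling" is simulated with an in-memory map of records.
class StreamThreadSketch {
    // Each task owns its own input buffer and state; nothing is shared between tasks.
    static class Task {
        final int partition;
        final Deque<String> buffer = new ArrayDeque<>();
        final List<String> processed = new ArrayList<>();

        Task(int partition) {
            this.partition = partition;
        }

        // Process at most maxRecords buffered records; returns how many were processed.
        int process(int maxRecords) {
            int n = 0;
            while (n < maxRecords && !buffer.isEmpty()) {
                processed.add(buffer.removeFirst());
                n++;
            }
            return n;
        }
    }

    // Tasks assigned to this thread, keyed by input partition (insertion order kept).
    final Map<Integer, Task> tasks = new LinkedHashMap<>();

    StreamThreadSketch(List<Integer> assignedPartitions) {
        for (int p : assignedPartitions) {
            tasks.put(p, new Task(p));
        }
    }

    // Stand-in for the consumer poll: route each fetched record to its partition's task.
    void poll(Map<Integer, List<String>> fetched) {
        fetched.forEach((p, records) -> tasks.get(p).buffer.addAll(records));
    }

    // One loop iteration: poll once, then give every task a turn of up to maxPerTask records.
    int runOnce(Map<Integer, List<String>> fetched, int maxPerTask) {
        poll(fetched);
        int total = 0;
        for (Task t : tasks.values()) {
            total += t.process(maxPerTask);
        }
        return total;
    }

    public static void main(String[] args) {
        StreamThreadSketch thread = new StreamThreadSketch(List.of(0, 1));
        thread.runOnce(Map.of(0, List.of("a", "b", "c"), 1, List.of("x")), 2);
        thread.runOnce(Map.of(), 2); // nothing new fetched; task 0 finishes its backlog
        System.out.println(thread.tasks.get(0).processed); // [a, b, c]
        System.out.println(thread.tasks.get(1).processed); // [x]
    }
}
```

Capping the work per task in each iteration keeps one busy partition from starving the others on the same thread; whether the thread is "slower" then mostly depends on how much total data its tasks receive, not on the number of tasks as such.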
Because tasks are independent of each other, you can run as many threads as there are tasks. In that case, each thread would execute only a single task.
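A rough illustration of how the thread count relates to the task count, assuming a plain round-robin spread. This is a simplification I am assuming for the sake of the example: the real assignment in Kafka Streams is done by its partition assignor and is more involved. The class `TaskAssignmentSketch` and method `assign` are hypothetical. In a real application the thread count is set with the `num.stream.threads` config (`StreamsConfig.NUM_STREAM_THREADS_CONFIG`).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical round-robin spread of tasks over threads, for illustration only.
// Kafka Streams' actual assignor is more sophisticated than this.
class TaskAssignmentSketch {
    static List<List<String>> assign(List<String> taskIds, int numThreads) {
        List<List<String>> perThread = new ArrayList<>();
        for (int i = 0; i < numThreads; i++) {
            perThread.add(new ArrayList<>());
        }
        // Task i goes to thread (i mod numThreads).
        for (int i = 0; i < taskIds.size(); i++) {
            perThread.get(i % numThreads).add(taskIds.get(i));
        }
        return perThread;
    }

    public static void main(String[] args) {
        List<String> tasks = List.of("0_0", "0_1", "0_2", "0_3");
        System.out.println(assign(tasks, 2)); // [[0_0, 0_2], [0_1, 0_3]]
        System.out.println(assign(tasks, 4)); // [[0_0], [0_1], [0_2], [0_3]]
        System.out.println(assign(tasks, 5)); // [[0_0], [0_1], [0_2], [0_3], []]
    }
}
```

With as many threads as tasks, each thread gets exactly one task; any threads beyond that have nothing to run, which is why the task count is the upper bound on useful parallelism.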