Reputation: 4119
Here we go: I have a fairly complicated topology of various joins, aggregations, filters, maps, etc. By default, the NUM_STREAM_THREADS_CONFIG parameter equals 1, which is completely deterministic by definition; thus, each partition's total ordering (which is guaranteed by Kafka itself) is preserved.
Will total ordering be preserved once I set NUM_STREAM_THREADS_CONFIG to 2 or more?
Does it depend on the specific topology? I've checked the docs and went through the threading model section, yet did not find an answer.
Upvotes: 1
Views: 285
Reputation: 62330
Data is always processed in per-partition offset order, even if you set num.stream.threads to a larger value.
In Kafka Streams, sub-topologies are translated into tasks (based on input topic partitions) and tasks process records of their partitions in offset order. The number of tasks limits the number of threads you can keep busy (similar to the maximum number of consumers in a consumer group). If you configure more threads than available tasks, some threads just stay idle.
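To make the relationship between tasks and threads concrete, here is a minimal sketch in plain Java. It is not Kafka Streams code: TaskAssignmentSketch and assign are hypothetical names, and the round-robin distribution is a simplifying assumption (the real assignor is more involved). It only illustrates that each task lives on exactly one thread, and that threads beyond the task count get nothing to do.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: models how a fixed number of tasks could be spread
// over stream threads. Names and the round-robin rule are assumptions.
class TaskAssignmentSketch {

    // Returns, for each thread index, the list of task ids it owns.
    // Every task id appears in exactly one list, so a partition's records
    // are only ever handled by a single thread.
    static List<List<Integer>> assign(int numTasks, int numThreads) {
        List<List<Integer>> threads = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) {
            threads.add(new ArrayList<>());
        }
        for (int task = 0; task < numTasks; task++) {
            threads.get(task % numThreads).add(task);
        }
        return threads;
    }

    public static void main(String[] args) {
        // 2 tasks (e.g. 2 input partitions, one sub-topology), 4 threads.
        List<List<Integer>> assignment = assign(2, 4);
        for (int t = 0; t < assignment.size(); t++) {
            List<Integer> tasks = assignment.get(t);
            System.out.println("thread " + t + " -> " + (tasks.isEmpty() ? "idle" : tasks));
        }
    }
}
```

With 2 tasks and 4 threads, two of the threads end up with an empty list, which corresponds to the idle threads mentioned above.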
If a task processes data from multiple topics/partitions, there is no strict ordering guarantee for data of different partitions. Kafka Streams will take the record timestamps into account, though, and process records with smaller timestamps first.
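The timestamp-based choice can be pictured with a small simulation. This is a simplified sketch in plain Java, not the actual Kafka Streams implementation: TimestampMergeSketch, Rec, and process are hypothetical names, and it assumes every partition buffer is fully populated up front, whereas a real task only sees what has been fetched so far. Each buffer is kept in offset order, and the task always takes the head record with the smallest timestamp.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustration only: one task reading from several partition buffers.
class TimestampMergeSketch {

    // Hypothetical record holder: source partition, offset, timestamp.
    static final class Rec {
        final int partition;
        final long offset;
        final long timestamp;

        Rec(int partition, long offset, long timestamp) {
            this.partition = partition;
            this.offset = offset;
            this.timestamp = timestamp;
        }
    }

    // Repeatedly picks the buffer whose head record has the smallest
    // timestamp and consumes that head. Records are only ever taken from
    // the front of a buffer, so per-partition offset order is preserved,
    // while the interleaving across partitions depends on timestamps.
    static List<Rec> process(List<Deque<Rec>> buffers) {
        List<Rec> out = new ArrayList<>();
        while (true) {
            Deque<Rec> next = null;
            for (Deque<Rec> q : buffers) {
                if (!q.isEmpty()
                        && (next == null || q.peek().timestamp < next.peek().timestamp)) {
                    next = q;
                }
            }
            if (next == null) {
                return out; // all buffers drained
            }
            out.add(next.poll());
        }
    }
}
```

In this model, records from the same partition never overtake each other, but the relative order of records from different partitions is decided by their timestamps rather than by any cross-partition guarantee.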
Upvotes: 3