Reputation: 1352
Given a Kafka streams topology which publishes messages to two different topics, are there any guarantees in which order various steps will be executed in these two branches or are those branches separated completely and executed in parallel?
KStream<..., ...> filteredStream = builder.stream("input-topic", ...).filter(...)...;
filteredStream.mapValues(this::mapOne).to("output-topic-one", ...);
filteredStream.flatMap(this::mapTwo).to("output-topic-two", ...);
In this example, will mapOne
executed and publishing to output-topic-one be done before mapTwo
is even getting called or messages are published to output-topic-two? In other words, is there a guarantee that mapOne
will be finished before messages are published to output-topic-two?
When looking at the visualization of the topology description (see at the bottom; made with https://zz85.github.io/kafka-streams-viz/) you can see the two branches. But you can also see these numbers in each bubble which might also indicate that there is an order of execution (1-4, then 5-6-7, then 8-9).
Topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000 (topics: [input-topic])
--> KSTREAM-FILTER-0000000001
Processor: KSTREAM-FILTER-0000000001 (stores: [])
--> KSTREAM-FILTER-0000000002
<-- KSTREAM-SOURCE-0000000000
Processor: KSTREAM-FILTER-0000000002 (stores: [])
--> KSTREAM-MAP-0000000003
<-- KSTREAM-FILTER-0000000001
Processor: KSTREAM-MAP-0000000003 (stores: [])
--> KSTREAM-FILTER-0000000004
<-- KSTREAM-FILTER-0000000002
Processor: KSTREAM-FILTER-0000000004 (stores: [])
--> KSTREAM-MAPVALUES-0000000005, KSTREAM-FLATMAP-0000000008
<-- KSTREAM-MAP-0000000003
Processor: KSTREAM-MAPVALUES-0000000005 (stores: [])
--> KSTREAM-FILTER-0000000006
<-- KSTREAM-FILTER-0000000004
Processor: KSTREAM-FILTER-0000000006 (stores: [])
--> KSTREAM-SINK-0000000007
<-- KSTREAM-MAPVALUES-0000000005
Processor: KSTREAM-FLATMAP-0000000008 (stores: [])
--> KSTREAM-SINK-0000000009
<-- KSTREAM-FILTER-0000000004
Sink: KSTREAM-SINK-0000000007 (topic: output-topic-one)
<-- KSTREAM-FILTER-0000000006
Sink: KSTREAM-SINK-0000000009 (topic: output-topic-two)
<-- KSTREAM-FLATMAP-0000000008
Upvotes: 5
Views: 1002
Reputation: 4125
Kafka streams always guarantee the Topology order. It is always passing the message in a topology, that topology has edges and nodes. Those edges and nodes added to the topology as you define it in the application.
In your case filtered stream
go through the map values branch
in the topology until that path is end (in your case sink -> topic one).
Then it continue with flat map branch
. until the sink to topic two.
It is ordered correctly with that IDs.
0000000004
-> 0000000005
-> 0000000006
-> 0000000007
0000000004
-> 0000000008
-> 0000000009
For more information go through the Kafka source code internal topology builder
And refer this
Upvotes: 1