Markus Ratzer
Markus Ratzer

Reputation: 1352

Concurrency of Kafka streams topology with multiple output topics

Given a Kafka streams topology which publishes messages to two different topics, are there any guarantees in which order various steps will be executed in these two branches or are those branches separated completely and executed in parallel?

Example

    KStream<..., ...> filteredStream = builder.stream("input-topic", ...).filter(...)...;

    filteredStream.mapValues(this::mapOne).to("output-topic-one", ...);
    filteredStream.flatMap(this::mapTwo).to("output-topic-two", ...);

In this example, will mapOne executed and publishing to output-topic-one be done before mapTwo is even getting called or messages are published to output-topic-two? In other words, is there a guarantee that mapOne will be finished before messages are published to output-topic-two?

Topology Visualization

When looking at the visualization of the topology description (see at the bottom; made with https://zz85.github.io/kafka-streams-viz/) you can see the two branches. But you can also see these numbers in each bubble which might also indicate that there is an order of execution (1-4, then 5-6-7, then 8-9).

kafka streams topology

Topology Description

Topologies:
   Sub-topology: 0
    Source: KSTREAM-SOURCE-0000000000 (topics: [input-topic])
      --> KSTREAM-FILTER-0000000001
    Processor: KSTREAM-FILTER-0000000001 (stores: [])
      --> KSTREAM-FILTER-0000000002
      <-- KSTREAM-SOURCE-0000000000
    Processor: KSTREAM-FILTER-0000000002 (stores: [])
      --> KSTREAM-MAP-0000000003
      <-- KSTREAM-FILTER-0000000001
    Processor: KSTREAM-MAP-0000000003 (stores: [])
      --> KSTREAM-FILTER-0000000004
      <-- KSTREAM-FILTER-0000000002
    Processor: KSTREAM-FILTER-0000000004 (stores: [])
      --> KSTREAM-MAPVALUES-0000000005, KSTREAM-FLATMAP-0000000008
      <-- KSTREAM-MAP-0000000003
    Processor: KSTREAM-MAPVALUES-0000000005 (stores: [])
      --> KSTREAM-FILTER-0000000006
      <-- KSTREAM-FILTER-0000000004
    Processor: KSTREAM-FILTER-0000000006 (stores: [])
      --> KSTREAM-SINK-0000000007
      <-- KSTREAM-MAPVALUES-0000000005
    Processor: KSTREAM-FLATMAP-0000000008 (stores: [])
      --> KSTREAM-SINK-0000000009
      <-- KSTREAM-FILTER-0000000004
    Sink: KSTREAM-SINK-0000000007 (topic: output-topic-one)
      <-- KSTREAM-FILTER-0000000006
    Sink: KSTREAM-SINK-0000000009 (topic: output-topic-two)
      <-- KSTREAM-FLATMAP-0000000008

Upvotes: 5

Views: 1002

Answers (1)

nipuna
nipuna

Reputation: 4125

Kafka streams always guarantee the Topology order. It is always passing the message in a topology, that topology has edges and nodes. Those edges and nodes added to the topology as you define it in the application.

In your case filtered stream go through the map values branch in the topology until that path is end (in your case sink -> topic one).

Then it continue with flat map branch. until the sink to topic two.

It is ordered correctly with that IDs.

0000000004 -> 0000000005 -> 0000000006 -> 0000000007

0000000004 -> 0000000008 -> 0000000009

For more information go through the Kafka source code internal topology builder

And refer this

Upvotes: 1

Related Questions