Reputation: 73
I have a topology using the processor api which updates a state store, configured with replication factor of 3, acks=ALL
Topologies:
Sub-topology: 0
Source: products-source (topics: [products])
--> products-processor
Processor: products-processor (stores: [products-store])
--> enriched-products-sink
<-- products-source
Sink: enriched-products-sink (topic: enriched.products)
<-- products-processor
My monitoring shows me very little lag for the source topic (< 100 records), however there is significant lag on the changelog topic backing the store, to the order of millions of records.
I'm trying to figure out the root cause of the lag on this changelog topic, as I'm not making any external requests in this processor. There are calls to rocksdb state stores, but these data stores are all local and should be fast in retrieving.
My question is what exactly is the consumer of this change log topic?
Upvotes: 3
Views: 1417
Reputation: 1418
The consumer of the changelog topics is the restore consumer. The restore consumer is a Kafka consumer that is build into Kafka Streams. In contrast to the main consumer that reads records from the source topic, the restore consumer is responsible for restoring the local state stores from the changelog topics in case the local state is not existent or out-of-date. Basically, it ensures that the local state stores recover after a failure. The second purpose of restore consumers is to keep stand-by tasks up-to-date.
Each stream thread in a Kafka Streams client has one restore consumer. The restore consumer is not a member of a consumer group and Kafka Streams assigns changelog topics manually to restore consumer. The offsets of restore consumers are not managed in the consumer offset topic __consumer_offsets
as the offsets of the main consumer but in a file in the state store directory of a Kafka Streams client.
Upvotes: 2