isbee

Reputation: 161

Kafka Streams KTable-KTable non-key join performance on skewed tables

We have two tables, Merchant and Product, which have a one-to-many relationship. The data is heavily skewed, so a single merchant can have many products. We join these two KTables by merchantId using a non-key (foreign-key) join.

When a Merchant is updated, many events arrive on KTABLE-FK-JOIN-SUBSCRIPTION-RESPONSE, which is expected behavior. The problem is that consuming the KTABLE-FK-JOIN-SUBSCRIPTION-RESPONSE events is very slow and seems to hit an upper bound. More specifically, the rate at which responses are emitted is proportional to the number of merchant changes, while the rate at which the non-key join results are materialized into the changelog is limited.
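To illustrate why a single Merchant update produces so many response events, here is a minimal in-memory sketch of foreign-key join fan-out. It is a simplified model, not the Kafka Streams implementation: each product "subscribes" to the merchant it references, and a merchant update yields one join result per subscribed product. The class, records, and method names (`FkJoinFanOut`, `upsertProduct`, `updateMerchant`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of foreign-key join fan-out.
// NOT the actual Kafka Streams internals; it only shows how work scales with skew.
public class FkJoinFanOut {
    record Product(String productId, String merchantId, String name) {}
    record Merchant(String merchantId, String name) {}
    record Joined(String productId, String productName, String merchantName) {}

    // merchantId -> productIds that reference it (plays the role of a subscription store)
    private final Map<String, List<String>> subscriptions = new HashMap<>();
    private final Map<String, Product> products = new HashMap<>();

    // Insert or update a product, moving its subscription if the merchant changed.
    void upsertProduct(Product p) {
        Product old = products.put(p.productId(), p);
        if (old != null) {
            subscriptions.get(old.merchantId()).remove(old.productId());
        }
        subscriptions.computeIfAbsent(p.merchantId(), k -> new ArrayList<>()).add(p.productId());
    }

    // A merchant update produces one join result per product subscribed to that merchant.
    List<Joined> updateMerchant(Merchant m) {
        List<Joined> out = new ArrayList<>();
        for (String productId : subscriptions.getOrDefault(m.merchantId(), List.of())) {
            Product p = products.get(productId);
            out.add(new Joined(p.productId(), p.name(), m.name()));
        }
        return out;
    }

    public static void main(String[] args) {
        FkJoinFanOut join = new FkJoinFanOut();
        // Skewed data: merchant "m1" has 10,000 products, merchant "m2" has one.
        for (int i = 0; i < 10_000; i++) {
            join.upsertProduct(new Product("p" + i, "m1", "product-" + i));
        }
        join.upsertProduct(new Product("q0", "m2", "single"));
        System.out.println(join.updateMerchant(new Merchant("m1", "Big Shop")).size());  // 10000
        System.out.println(join.updateMerchant(new Merchant("m2", "Tiny Shop")).size()); // 1
    }
}
```

In this model, even a trivial joiner cannot avoid the fact that one update to a heavily skewed key turns into as many downstream records as that merchant has products, and each of those records still has to be processed and written out individually.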

Since the ValueJoiner only performs very lightweight object allocation, I can't understand why consuming KTABLE-FK-JOIN-SUBSCRIPTION-RESPONSE is slow and capped at some bound. How can we increase throughput in this situation?

Upvotes: 1

Views: 421

Answers (1)

Andras Hatvani

Reputation: 4491

My solution was to strictly separate the deployment and the operation of the software, so that different runtime properties, including Kafka settings, could be used for each phase. The most important setting, however, was acks: 1 instead of the default acks: all during deployment, which gave an order-of-magnitude improvement. Although the cluster has only three brokers, this was a game-changer; none of my other tuning attempts came close. As a result, deployments now take only 10 minutes instead of several hours.
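A minimal sketch of the "separate deployment and operation settings" idea, assuming both phases run the same Kafka Streams application with different properties. Kafka Streams forwards settings with the `producer.` prefix to its internal producers (`StreamsConfig.PRODUCER_PREFIX`); the prefix is written as a plain string here so the sketch needs no Kafka dependency. The helper name `streamsProperties`, the application id, and the bootstrap address are hypothetical.

```java
import java.util.Properties;

public class StreamsProfiles {
    // Hypothetical helper: builds Kafka Streams properties for either the
    // deployment (bulk reload) phase or normal operation.
    static Properties streamsProperties(boolean deploymentPhase) {
        Properties props = new Properties();
        props.setProperty("application.id", "merchant-product-join"); // hypothetical
        props.setProperty("bootstrap.servers", "localhost:9092");      // hypothetical
        // acks=1: only the partition leader acknowledges a write (faster, but records
        // can be lost if the leader fails before followers replicate them).
        // acks=all: all in-sync replicas must acknowledge (safer, slower).
        props.setProperty("producer.acks", deploymentPhase ? "1" : "all");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(streamsProperties(true).getProperty("producer.acks"));  // 1
        System.out.println(streamsProperties(false).getProperty("producer.acks")); // all
    }
}
```

Note that an idempotent producer requires acks=all, so this trade-off assumes the application runs with the at-least-once processing guarantee rather than exactly-once, and that losing some writes during the deployment phase is acceptable.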

Upvotes: 0
