Utilitaire CCV

Reputation: 143

Kafka: consume an undefined number of topics

The topics are created dynamically, and there could be thousands of them. I need a way to detect when messages are produced so I can consume them. Moreover, I need to consume each topic independently so that I can bulk-insert a large number of messages into a database, each topic corresponding to a different table. So, say I start consuming a topic: I would consume 1000 messages, bulk-insert them into the database in one operation, then commit the read in Kafka. If I have 10 topics, I could use 10 consumers in parallel. The problem is that if I end up with a large number of topics, most of them idle (empty), I need a way to be notified when some topics suddenly become active, so that I don't have to launch thousands of idle consumers that do nothing most of the time.
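For reference, the consume-1000/bulk-insert/commit loop described above can be sketched independently of any particular Kafka client. This is a minimal sketch, not a definitive implementation: the message source, the `bulk_insert` database call, and the `commit` callback are all assumptions standing in for a real consumer's `poll()` loop, DB driver, and offset commit.

```python
from typing import Any, Callable, Iterable, List

def consume_in_batches(
    messages: Iterable[Any],
    batch_size: int,
    bulk_insert: Callable[[List[Any]], None],
    commit: Callable[[], None],
) -> None:
    """Accumulate messages into fixed-size batches. After each bulk
    insert succeeds, commit the consumer offsets, so the database write
    always happens before the read is acknowledged in Kafka."""
    batch: List[Any] = []
    for msg in messages:
        batch.append(msg)
        if len(batch) >= batch_size:
            bulk_insert(batch)   # one DB operation for the whole batch
            commit()             # then commit the read in Kafka
            batch = []
    if batch:                    # flush a final partial batch
        bulk_insert(batch)
        commit()
```

In a real consumer, `messages` would be the records yielded by `poll()`, and committing only after the insert gives at-least-once delivery: a crash between insert and commit re-delivers the batch rather than losing it.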

The only solution I have thought of so far is to use a single signal topic in addition to the real topics, to which the producers would write as well as to the real topic. But I was wondering whether there is another solution, like polling Kafka's metadata. From what I've seen, I would have to iterate through all the topics matching a regex, then check the offsets of each one's partitions. I don't think that can be done efficiently, but maybe I'm wrong.
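The metadata-polling idea is less hopeless than it sounds: a client such as kafka-python's `KafkaConsumer.end_offsets()` can fetch end offsets for many partitions at once, and the expensive part is only the comparison between polls. Here is a sketch of that diffing step, assuming you snapshot per-topic end offsets (summed across partitions) on each poll; the snapshot-fetching code and topic names are left out and assumed.

```python
from typing import Dict, Set

def newly_active(
    prev: Dict[str, int],
    curr: Dict[str, int],
) -> Set[str]:
    """Compare two snapshots mapping topic name -> end offset (summed
    across the topic's partitions). A topic whose end offset advanced
    received messages between the two polls, so it needs a consumer."""
    return {
        topic for topic, end in curr.items()
        if end > prev.get(topic, 0)
    }
```

Each poll costs one round of offset lookups over the matching topics, which is far cheaper than keeping thousands of idle consumers connected, though it still scales with the total partition count.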

Upvotes: 1

Views: 143

Answers (1)

OneCricketeer

Reputation: 191681

You could track JMX metrics from the broker for incoming bytes per topic using the Prometheus JMX Exporter, for example, then combine that with Alertmanager to send an event/webhook to a consuming REST service once some data threshold is crossed, which would then start some consumers (maybe Kafka Connect tasks for a database?).
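As a rough illustration, a Prometheus alert rule for this could look like the following. Note the metric name is an assumption: it depends entirely on the rules in your JMX Exporter config, and the name below follows the common mapping of the broker's `kafka.server` `BrokerTopicMetrics` `BytesInPerSec` bean.

```yaml
groups:
  - name: kafka-topic-activity
    rules:
      - alert: TopicBecameActive
        # Metric name is hypothetical; it must match your JMX Exporter rules.
        expr: rate(kafka_server_brokertopicmetrics_bytesinpersec_count{topic!=""}[1m]) > 0
        for: 1m
        labels:
          severity: info
        annotations:
          summary: "Topic {{ $labels.topic }} is receiving data"
```

Alertmanager would then route `TopicBecameActive` firings to a `webhook_config` pointing at the REST service that launches the consumers.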

Or, like you said, use a signal topic since producer requests can be made to multiple topics at once.
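The signal-topic variant is simple on the producer side: every send to a real topic is paired with a tiny marker record on the shared signal topic. A sketch, where the `SIGNAL_TOPIC` name is hypothetical and `producer.send(topic, key=..., value=...)` follows kafka-python's `KafkaProducer.send` signature:

```python
from typing import Any

SIGNAL_TOPIC = "topic-activity"  # hypothetical name for the shared signal topic

def send_with_signal(producer: Any, topic: str, key: bytes, value: bytes) -> None:
    """Send the record to its real topic, plus a lightweight marker to
    the signal topic. Keying the marker by topic name keeps all markers
    for one topic in the same partition, and the watcher consuming the
    signal topic reads the key to decide which consumer to start."""
    producer.send(topic, key=key, value=value)
    producer.send(SIGNAL_TOPIC, key=topic.encode(), value=b"")
```

The watcher then only needs one consumer on the signal topic, instead of thousands on the real ones.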

If I have 10 topics, I could use 10 consumers in parallel

You can have more parallel consumers if any of those topics have multiple partitions, since a consumer group assigns at most one consumer per partition.

could be thousands of them

There are practical limits on the number of topics a Kafka cluster can support, by the way, but as of the latest releases it's upwards of hundreds of thousands. Something to keep in mind, though.

launch thousands of idle consumers that do nothing most of the time.

You could also use solutions like AWS Lambda or Kubernetes KEDA to auto-scale consumers up/down based on topic data (consumer lag).
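With KEDA, each topic's consumer Deployment can scale to zero while the topic is idle and wake up when lag appears. A sketch of a `ScaledObject` using KEDA's `kafka` trigger; the names (`orders-consumer`, `orders-loader`, broker address) are placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-consumer-scaler    # hypothetical names throughout
spec:
  scaleTargetRef:
    name: orders-consumer         # the Deployment running the consumer
  minReplicaCount: 0              # scale to zero while the topic is idle
  maxReplicaCount: 10             # at most one consumer per partition
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: orders-loader
        topic: orders
        lagThreshold: "1000"      # wake up once about one bulk batch is waiting
```

With thousands of topics you would generate one such object per topic/table, but the idle ones cost nothing beyond the Kubernetes objects themselves.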

Upvotes: 1
