Reputation: 2699
We are using Prometheus and Grafana for monitoring our Kafka cluster.
In our application we use Kafka Streams, and a stream can stop because of an exception. We log that event in the handler registered with setUncaughtExceptionHandler, but we also need some kind of alert when the stream stops.
Currently, jmx_exporter runs as a Java agent and exposes Kafka metrics through an HTTP endpoint, which Prometheus scrapes.
However, we don't see any metric that gives the number of active consumers per topic. Are we missing something? Any suggestions on how to get the number of active consumers and send alerts when a consumer stops?
Upvotes: 3
Views: 2814
Reputation: 3955
We had similar needs. We added Kafka consumer lag per partition to Grafana, plus alerts when the lag exceeds a specified threshold. The threshold should differ per topic depending on load: for some topics it could be 10, for heavily loaded ones 100000. So if, for example, you have more than 1000 unprocessed messages, you get an alert.
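The per-topic threshold idea can be sketched as follows. This is a minimal, framework-free illustration, assuming you already obtain each partition's log-end offset and the consumer group's committed offset (for example from Kafka's AdminClient or a lag exporter). LagAlertChecker, PartitionOffsets and shouldAlert are hypothetical names, and in practice the same comparison usually lives in a Grafana or Prometheus alert rule on a lag metric rather than in application code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: decides whether consumer lag on a topic warrants an alert.
class LagAlertChecker {

    // Offsets for one partition: where the log ends and where the group has committed.
    static final class PartitionOffsets {
        final long logEndOffset;
        final long committedOffset;

        PartitionOffsets(long logEndOffset, long committedOffset) {
            this.logEndOffset = logEndOffset;
            this.committedOffset = committedOffset;
        }

        // Lag = unprocessed messages; clamp at 0 in case the offsets were read at different times.
        long lag() {
            return Math.max(0L, logEndOffset - committedOffset);
        }
    }

    private final Map<String, Long> thresholdByTopic = new HashMap<>();
    private final long defaultThreshold;

    LagAlertChecker(long defaultThreshold) {
        this.defaultThreshold = defaultThreshold;
    }

    // Per-topic override, e.g. 10 for a quiet topic and 100000 for a heavily loaded one.
    LagAlertChecker withThreshold(String topic, long threshold) {
        thresholdByTopic.put(topic, threshold);
        return this;
    }

    // True if any partition of the topic has more unprocessed messages than the topic's threshold.
    boolean shouldAlert(String topic, List<PartitionOffsets> partitions) {
        long threshold = thresholdByTopic.getOrDefault(topic, defaultThreshold);
        for (PartitionOffsets p : partitions) {
            if (p.lag() > threshold) {
                return true;
            }
        }
        return false;
    }
}
```

Checking per partition (rather than summing across the topic) matches the per-partition lag panels described above; summing would be an equally valid design if you only care about total backlog.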
You could also add a state listener to each Kafka Streams instance and, if the stream enters an error state, log an error or send an email:
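```java
// Register before kafkaStream.start(); the listener can only be set while the instance is still in CREATED state.
kafkaStream.setStateListener((newState, oldState) -> {
    log.info("Kafka stream state changed [{}] >>>>> [{}]", oldState, newState);
    // ERROR (or an unexpected PENDING_SHUTDOWN) means the stream is no longer processing records.
    if (newState == KafkaStreams.State.ERROR || newState == KafkaStreams.State.PENDING_SHUTDOWN) {
        log.error("Kafka Stream is in [{}] state. Application should be restarted", newState);
    }
});
```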
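Logging alone does not trigger an alert. As a minimal sketch (an assumption, not part of the answer above), the same listener could update numbers that your monitoring exports, so Prometheus can alert when the stream is no longer up. To stay self-contained it uses a stand-in enum StreamState instead of KafkaStreams.State; StreamStateMetrics, up() and errorCount() are hypothetical names.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for org.apache.kafka.streams.KafkaStreams.State so this sketch compiles without Kafka.
enum StreamState { CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, ERROR }

// Hypothetical holder for two values a metrics library could export for alerting.
class StreamStateMetrics {
    // States in which the stream is processing, or about to resume after a rebalance.
    private static final Set<StreamState> ALIVE = EnumSet.of(StreamState.RUNNING, StreamState.REBALANCING);

    // Atomics because the listener runs on a stream thread while a scrape reads from another thread.
    private final AtomicInteger up = new AtomicInteger(0);   // 1 while alive, 0 otherwise
    private final AtomicLong errorCount = new AtomicLong(0); // number of transitions into ERROR

    // Same parameter order as KafkaStreams.StateListener#onChange(newState, oldState).
    void onChange(StreamState newState, StreamState oldState) {
        up.set(ALIVE.contains(newState) ? 1 : 0);
        if (newState == StreamState.ERROR) {
            errorCount.incrementAndGet();
        }
    }

    int up() {
        return up.get();
    }

    long errorCount() {
        return errorCount.get();
    }
}
```

With the real KafkaStreams.State in place of the stand-in, the method could be registered via `kafkaStream.setStateListener(metrics::onChange)` before `start()`, and the values exposed through a gauge in your metrics library or through a custom MBean that jmx_exporter picks up (depending on its rules configuration). An alert on "up is 0 for more than a few minutes" then covers the stopped-stream case.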
You could also add a health check (e.g. a REST endpoint or a Spring Boot HealthIndicator) that reports whether the stream is running:
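```java
KafkaStreams.State streamState = kafkaStream.state();
// true for RUNNING and REBALANCING; newer Kafka versions name this method isRunningOrRebalancing()
boolean running = streamState.isRunning();
```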
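A framework-free sketch of such a health check follows, assuming the state is read through a supplier such as `() -> kafkaStream.state().name()`. StreamHealthCheck and Result are hypothetical names; treating RUNNING and REBALANCING as healthy is an assumption, and the result is mapped to the HTTP status a REST endpoint could return. In Spring Boot the same decision would go into a HealthIndicator's health() method.

```java
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical health check that a REST endpoint (or a Spring Boot HealthIndicator) could delegate to.
class StreamHealthCheck {
    // Assumption: these states count as healthy; everything else (ERROR, NOT_RUNNING, ...) is DOWN.
    private static final Set<String> HEALTHY_STATES = Set.of("RUNNING", "REBALANCING");

    // Snapshot of one check, so status and detail come from the same state read.
    static final class Result {
        final boolean up;
        final String state;

        Result(boolean up, String state) {
            this.up = up;
            this.state = state;
        }

        // 200 when healthy, 503 (Service Unavailable) otherwise.
        int httpStatus() {
            return up ? 200 : 503;
        }

        @Override
        public String toString() {
            return (up ? "UP" : "DOWN") + " (state=" + state + ")";
        }
    }

    private final Supplier<String> stateName; // e.g. () -> kafkaStream.state().name()

    StreamHealthCheck(Supplier<String> stateName) {
        this.stateName = stateName;
    }

    Result check() {
        String state = stateName.get();
        return new Result(HEALTHY_STATES.contains(state), state);
    }
}
```

Returning 503 for a stopped stream lets an external probe (a load balancer, Kubernetes liveness check, or Prometheus blackbox-style check) notice the failure without parsing the response body.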
I also haven't found any Kafka Streams metrics that report active consumers or assigned partitions. It would be useful if Kafka Streams exposed such data, and I hope it will in future releases.
Upvotes: 3