Chandan Bhattad

Reputation: 371

Kafka Number of messages in a topic

I need the number of messages currently stored in a Kafka topic, regardless of whether any consumer has consumed them.

kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092,localhost:9093,localhost:9094 --topic test-topic

The above gives the offset number for the topic?

Is the above equal to the number of messages currently stored in the kafka topic?

Upvotes: 0

Views: 11769

Answers (3)

spats

Reputation: 853

"The above gives the offset number for the topic?" Yes, it gives the current max offset of each partition.

"Is the above equal to the number of messages currently stored in the kafka topic?" No. It is not the number of messages in the topic, because messages are deleted once the retention period expires, so the max offset != the count of messages.

To get the number of messages in a Kafka topic:

    brokers="<broker1:port>"
    topic="<topic-name>"
    # Sum of the latest (end) offsets across all partitions
    sum_1=$(/usr/hdp/current/kafka-broker/bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list "$brokers" --topic "$topic" --time -1 | grep -e ':[[:digit:]]*:' | awk -F ":" '{sum += $3} END {print sum}')
    # Sum of the earliest (beginning) offsets across all partitions
    sum_2=$(/usr/hdp/current/kafka-broker/bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list "$brokers" --topic "$topic" --time -2 | grep -e ':[[:digit:]]*:' | awk -F ":" '{sum += $3} END {print sum}')
    echo "Number of records in topic ${topic}: $((sum_1 - sum_2))"

Here --time -1 returns the current max (latest) offset and --time -2 returns the current min (earliest) offset of each partition.

Upvotes: 0

Vaibhav Gupta

Reputation: 638

Yes, this is equal to the number of messages if the earliest offset is zero. If the earliest offset is not zero, you need to calculate the difference for each partition and then sum the results.
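As a small worked example of that arithmetic, assume a two-partition topic with the made-up offsets below (the values are illustrative only, not taken from a real cluster). It shows how far the sum of the latest offsets can drift from the per-partition difference once old messages have been deleted:

```shell
# Hypothetical offsets for a two-partition topic (illustrative values only).
earliest_p0=100; latest_p0=250   # partition 0 has already lost offsets 0..99
earliest_p1=0;   latest_p1=40    # partition 1 still starts at offset 0

# Difference per partition, then summed.
total=$(( (latest_p0 - earliest_p0) + (latest_p1 - earliest_p1) ))

# What you would get by looking at the latest offsets alone.
latest_sum=$(( latest_p0 + latest_p1 ))

echo "per-partition difference, summed: $total"
echo "sum of latest offsets only:       $latest_sum"
```

With these numbers the first figure is 190 and the second is 290, so the latest offsets alone would overcount by the 100 messages already removed from partition 0.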

Upvotes: 0

amethystic

Reputation: 7091

Not exactly. The numbers you got only refer to the current max offsets of all the topic partitions. The message count also depends on the partitions' beginning offsets for that topic.

You could run

kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092,localhost:9093,localhost:9094 --topic test-topic --time -1

and

kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092,localhost:9093,localhost:9094 --topic test-topic --time -2

respectively, then calculate the message count for each partition by subtracting its beginning offset (--time -2) from its end offset (--time -1), and sum the results to get the total record count for that topic.
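The subtraction can be scripted. The following is a minimal sketch, assuming the output of each command has been saved to a file and that every line has the form topic:partition:offset. The function name count_messages and the file names latest.txt and earliest.txt are hypothetical, chosen only for this example.

```shell
# count_messages LATEST_FILE EARLIEST_FILE   (hypothetical helper)
# Each file holds saved GetOffsetShell output, one "topic:partition:offset"
# line per partition. Prints the count per partition and the total.
count_messages() {
  awk -F ':' '
    NF < 3    { next }                               # skip blank or malformed lines
    NR == FNR { earliest[$1 ":" $2] = $3; next }     # first file: earliest offsets
    ($1 ":" $2) in earliest {                        # second file: latest offsets
      n = $3 - earliest[$1 ":" $2]
      printf "%s partition %s: %d\n", $1, $2, n
      total += n
    }
    END { printf "total: %d\n", total }
  ' "$2" "$1"
}

# Usage, after saving the two outputs (file names are assumptions):
#   kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic test-topic --time -1 > latest.txt
#   kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic test-topic --time -2 > earliest.txt
#   count_messages latest.txt earliest.txt
```

Matching on topic and partition, rather than summing each file separately, keeps the result correct even if the two outputs list partitions in a different order. On compacted topics, or topics written with transactions, offsets can have gaps, so the result is best read as an upper bound rather than an exact count.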

Upvotes: 1
