Reputation: 371
I need the number of messages stored in a Kafka topic, regardless of whether any consumer has consumed them.
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092,localhost:9093,localhost:9094 --topic test-topic
Does the above give the offset number for the topic?
Is that equal to the number of messages currently stored in the Kafka topic?
Upvotes: 0
Views: 11769
Reputation: 853
Does the above give the offset number for the topic? Yes, it gives the current max offset of each partition.
Is that equal to the number of messages currently stored in the Kafka topic? No. Messages are deleted from the topic once the retention period expires, so the max offset is not the same as the count of stored messages.
To get the number of messages in the topic:
brokers="<broker1:port>"
topic="<topic-name>"
# Sum of the latest (max) offsets across all partitions
sum_1=$(/usr/hdp/current/kafka-broker/bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list "$brokers" --topic "$topic" --time -1 | grep -e ':[[:digit:]]*:' | awk -F ":" '{sum += $3} END {print sum}')
# Sum of the earliest (min) offsets across all partitions
sum_2=$(/usr/hdp/current/kafka-broker/bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list "$brokers" --topic "$topic" --time -2 | grep -e ':[[:digit:]]*:' | awk -F ":" '{sum += $3} END {print sum}')
echo "Number of records in topic ${topic}: $((sum_1 - sum_2))"
Here --time -1 returns the current max offset and --time -2 returns the current min offset of each partition.
Upvotes: 0
Reputation: 638
Yes, this is equal to the number of messages if the earliest offset is zero. If the earliest offset is not zero, you need to calculate the difference for each partition and then sum the results.
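As a small worked example of that difference, here is a sketch for a single partition. The function name stored_in_partition and the offsets 40 and 100 are made up for illustration and do not come from a real cluster. Note that on compacted topics, or topics written with transactions, the difference can be larger than the true record count, so treat it as an estimate there.

```shell
#!/usr/bin/env bash
# stored_in_partition EARLIEST LATEST
# Hypothetical helper: prints the offset difference for one partition.
# EARLIEST is the value reported with --time -2, LATEST the value with --time -1.
stored_in_partition() {
  echo $(( $2 - $1 ))
}

# Illustrative numbers only: earliest offset 40, latest offset 100.
stored_in_partition 40 100
```

If the earliest offset is 0, the result is simply the latest offset, which is the case described above.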
Upvotes: 0
Reputation: 7091
Not exactly. The numbers you got only refer to the current max offsets of the topic's partitions. The message count also depends on each partition's beginning offset.
You could run
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092,localhost:9093,localhost:9094 --topic test-topic --time -1
and
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092,localhost:9093,localhost:9094 --topic test-topic --time -2
respectively, then calculate the message count for each partition by subtracting its beginning offset from its end offset, and sum the results to get the total record count for the topic.
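The per-partition subtraction can be sketched in shell roughly as follows. This assumes the output of each command was saved to a file and that every line has the form topic:partition:offset. The function name count_messages and the file names earliest.txt and latest.txt are made up for this example.

```shell
#!/usr/bin/env bash
# count_messages EARLIEST_FILE LATEST_FILE
# Hypothetical helper. Both files are assumed to contain lines of the form
# topic:partition:offset, saved from the --time -2 and --time -1 runs.
count_messages() {
  awk -F ':' '
    NR == FNR { begin[$2] = $3; next }   # first file: beginning offset per partition
    ($2 in begin) {                      # second file: end offset per partition
      n = $3 - begin[$2]
      printf "partition %s: %d\n", $2, n
      total += n
    }
    END { printf "total: %d\n", total }
  ' "$1" "$2"
}

# Possible usage (not run here):
#   kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic test-topic --time -2 > earliest.txt
#   kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic test-topic --time -1 > latest.txt
#   count_messages earliest.txt latest.txt
```

Matching on the partition number rather than on line order avoids wrong results if the two commands list partitions in a different order.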
Upvotes: 1