Nathan English

Reputation: 784

See size of Kafka Topics in Bytes

For metrics, we need to see the total size of a Kafka topic in bytes across all partitions and brokers.

I have been searching for quite a while, and I haven't worked out whether this is possible or how to do it.

We are on Kafka 0.8.2.

Upvotes: 36

Views: 81533

Answers (9)

SHUBHAM VERMA

Reputation: 1

bin/kafka-log-dirs.sh \
 --bootstrap-server localhost:9092 \
 --topic-list event \
 --describe \
  | grep -oP '(?<=size":)\d+'  \
  | awk '{ sum += $1 } END { printf "%.2f\n", sum / (1024^3) }'

This solution by MSE worked; I just tweaked it a little to show the size in GB (it divides by 1024^3, so strictly speaking GiB).

Upvotes: 0

kev

Reputation: 161864

With this command, you will get a list of topic details:

kafka-log-dirs.sh --bootstrap-server 127.0.0.1:9092 --describe |
  grep '^{' |
    jq -c '.brokers[].logDirs[].partitions | map(.topic=(.partition|sub("-\\d+$";""))) | group_by(.topic)[] | {topic:.[0].topic, partitions:length, size:map(.size)|add}'
{"topic":"topic1","partitions":1,"size":1234}
{"topic":"topic2","partitions":2,"size":5678}
{"topic":"topic3","partitions":3,"size":0}

Upvotes: 6

Sourabh Mokhasi

Reputation: 159

If you want the size per broker instead of the total topic size, I created this query to help with that:

kafka-log-dirs.sh --describe --bootstrap-server localhost:9092 --topic-list "${topic_name}" | grep '^{' | jq '[.brokers[] | {broker:.broker, size:[.logDirs[].partitions[].size] | add}]' | less

It returns the topic size per broker by summing the individual partition sizes. This is useful when debugging uneven partition distribution or hot partitions.

Sample output:

  {
    "broker": 7031,
    "size": 182197855891
  },
  {
    "broker": 6066,
    "size": 182357034551
  },
  {
    "broker": 6052,
    "size": 184447693788
  },
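If jq is not available, the same per-broker sum can be sketched with grep and awk. This is only a rough sketch: `per_broker_bytes` is a hypothetical helper name, the sample JSON values are made up, and it assumes that each `"broker"` field appears before that broker's `"size"` fields in the compact JSON line, which you should check against your own output.

```shell
# Hypothetical helper: read kafka-log-dirs JSON on stdin and print
# "<broker id> <total bytes>" per broker, sorted by broker id.
# Assumption: "broker":N precedes the "size" fields that belong to it.
per_broker_bytes() {
  grep -oE '"(broker|size)":[0-9]+' |
    awk -F: '
      $1 == "\"broker\"" { b = $2 }          # remember the current broker id
      $1 == "\"size\""   { sum[b] += $2 }    # add the partition size to it
      END { for (k in sum) printf "%s %.0f\n", k, sum[k] }
    ' | sort -n
}

# Made-up sample shaped like the JSON line printed by kafka-log-dirs.sh.
sample='{"version":1,"brokers":[{"broker":1,"logDirs":[{"logDir":"/data","error":null,"partitions":[{"partition":"test-0","size":1000},{"partition":"test-1","size":500}]}]},{"broker":2,"logDirs":[{"logDir":"/data","error":null,"partitions":[{"partition":"test-0","size":1000}]}]}]}'

printf '%s\n' "$sample" | per_broker_bytes
```

Against a live cluster you would pipe the `grep '^{'` output of kafka-log-dirs.sh into `per_broker_bytes` instead of the sample.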

Upvotes: 2

Renato Mefi

Reputation: 2171

For people who want human-readable output listed for all topics, here it is:

bin/kafka-topics.sh --bootstrap-server 127.0.0.1:9092 --list \
  | xargs -I{} sh -c \
  "echo -n '{} -> ' && bin/kafka-log-dirs.sh --bootstrap-server 127.0.0.1:9092 --topic-list {} --describe | grep '^{'   | jq '[ ..|.size? | numbers ] | add' | numfmt --to iec --format '%8.4f'" \
  | tee /tmp/topics-by-size.list

This will:

List all topics in Kafka
Pass the list to `xargs`, which runs a command per topic
Get all log sizes for the topic
  Sum the log sizes
  Pass the sum through `numfmt` to make it human readable
Save the result to a file while printing to stdout

I hope this helps people who wanted a copy and paste command.
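For anyone who prefers a script over a one-liner, the same steps can be sketched as two small shell functions. This is a sketch under assumptions, not a tested production script: `KAFKA_BIN`, `BOOTSTRAP`, `sum_sizes`, and `topic_sizes` are hypothetical names, and it uses grep and awk instead of jq to sum the `size` fields.

```shell
# Assumed defaults; KAFKA_BIN and BOOTSTRAP are hypothetical variable names.
KAFKA_BIN="${KAFKA_BIN:-bin}"
BOOTSTRAP="${BOOTSTRAP:-127.0.0.1:9092}"

# Sum every "size" field in the JSON line of kafka-log-dirs output on stdin.
sum_sizes() {
  grep '^{' | grep -o '"size":[0-9]*' |
    awk -F: '{ s += $2 } END { printf "%.0f\n", s }'
}

# Print "<topic> -> <human-readable size>" for every topic in the cluster.
topic_sizes() {
  "$KAFKA_BIN/kafka-topics.sh" --bootstrap-server "$BOOTSTRAP" --list |
    while read -r topic; do
      # </dev/null keeps the tool from consuming the topic list on stdin.
      bytes=$("$KAFKA_BIN/kafka-log-dirs.sh" --bootstrap-server "$BOOTSTRAP" \
        --topic-list "$topic" --describe </dev/null | sum_sizes)
      printf '%s -> %s\n' "$topic" "$(numfmt --to=iec "$bytes")"
    done
}
```

Calling `topic_sizes | tee /tmp/topics-by-size.list` then saves the list to a file while printing it, as in the one-liner above.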

Upvotes: 5

Scott

Reputation: 1688

If you are running Kafka in a Docker container (wurstmeister/kafka) and you get this error:

 Error: JMX connector server communication error: service:jmx:rmi ...
 sun.management.AgentConfigurationError: java.rmi.server.ExportException: Port already in use: 6099; nested exception is:
   java.net.BindException: Address in use (Bind failed)

You need to unset JMX_PORT before you run the shell script:

(unset JMX_PORT; ./kafka-log-dirs.sh \
      --bootstrap-server 127.0.0.1:9092 --topic-list test --describe)

Upvotes: 1

Martbob

Reputation: 541

You can see the partition sizes using the script /bin/kafka-log-dirs.sh:

/bin/kafka-log-dirs.sh --describe --bootstrap-server <KafkaBrokerHost>:<KafkaBrokerPort> --topic-list <YourTopic>

Upvotes: 52

MSE

Reputation: 695

Another way of doing the same with a regular expression and awk (in case you don't have jq installed) is:

$ bin/kafka-log-dirs.sh \
  --bootstrap-server 127.0.0.1:9092 \
  --topic-list test \
  --describe \
  | grep -oP '(?<=size":)\d+'  \
  | awk '{ sum += $1 } END { print sum }'

This returns the size (in bytes) of the topic test, including its replicas. If you have a replication factor greater than 1 and you want the size of the unique topic data, divide the value you get by the replication factor.
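The division by the replication factor can be folded into the same pipeline. The following is a minimal sketch, assuming every partition of the topic has the same replication factor and that replicas are roughly the same size, so the result is an approximation. `unique_topic_bytes` is a hypothetical helper name and the sample JSON values are made up.

```shell
# Hypothetical helper: sum every "size" field in kafka-log-dirs JSON read from
# stdin, then divide by the replication factor passed as the first argument.
unique_topic_bytes() {
  rf="$1"
  grep -o '"size":[0-9]*' |
    awk -F: -v rf="$rf" '{ sum += $2 } END { printf "%.0f\n", sum / rf }'
}

# Made-up sample: one partition of 1000 bytes replicated on two brokers.
sample='{"version":1,"brokers":[{"broker":1,"logDirs":[{"logDir":"/data","error":null,"partitions":[{"partition":"test-0","size":1000}]}]},{"broker":2,"logDirs":[{"logDir":"/data","error":null,"partitions":[{"partition":"test-0","size":1000}]}]}]}'

# Total on disk is 2000 bytes; with replication factor 2 this prints 1000.
printf '%s\n' "$sample" | unique_topic_bytes 2
```

With a real cluster, pipe the kafka-log-dirs.sh output into `unique_topic_bytes <replication-factor>` in place of the sample.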

Upvotes: 11

Cameron Kerr

Reputation: 1875

As Martbob very helpfully mentioned, you can do this using kafka-log-dirs. It produces JSON output (on one of the lines), so I can use the ever-so-useful jq tool to pull out the 'size' fields (some are null), select only the ones that are numbers, group them into an array, and then add them together.

kafka-log-dirs \
    --bootstrap-server 127.0.0.1:9092 \
    --topic-list 'topic_of_interest' \
    --describe \
  | grep '^{' \
  | jq '[ ..|.size? | numbers ] | add'

Example output: 67704

I haven't verified if the output makes sense, so you should check that yourself.

Upvotes: 27

Krystian

Reputation: 2290

It may require some extra work to install and configure, but KafkaHQ shows topic and partition sizes in both messages and bytes.

It is still the simplest option I have found for getting that information.

Upvotes: 2
