ankit patel
ankit patel

Reputation: 1469

Kafka consumer test and reported metrics

I want to understand how kafka consumer test works and how to interpret some of numbers reported,

below is the test i ran and the output i got. My questions are

  1. values reported for rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec are 1593109326098, -1593108732333, -0.0003, -0.2800; can you explain how it can report such a high and negative numbers ? they dont make sense to me.
  2. Everything reported from Metric Name Value line is reported due to --print-metrics flag. What is difference between metrics reported by default and with this flag? how they are calculated and where can i read about what do they mean?
  3. No matter i scale total consumer running in parallel or scale network and io threads at broker, consumer-fetch-manager-metrics:fetch-latency-avg metrics remains almost same. can you explain this? with more consumers pulling data fetch latency should go higher; similarly for given consuming rate if i reduce io and network thread parameters at broker shouldnt latency scale higher?

here is the command i ran

[root@oak-clx17 kafka_2.12-2.5.0]# bin/kafka-consumer-perf-test.sh --topic topic_test8_cons_test1 --threads 1  --broker-list clx20:9092 --messages 500000000 --consumer.config config/consumer.properties --print-metrics

and results

    start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms,fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
    WARNING: Exiting before consuming the expected number of messages: timeout (10000 ms) exceeded. You can use the --timeout option to increase the timeout.
    2020-06-25 11:22:05:814, 2020-06-25 11:31:59:579, 435640.7686, 733.6922, 446096147, 751300.8463, 1593109326098, -1593108732333, -0.0003, -0.2800
     
    Metric Name                                                                                                                                    Value
    consumer-coordinator-metrics:assigned-partitions:{client-id=consumer-perf-consumer-25533-1}                                                  : 0.000
    consumer-coordinator-metrics:commit-latency-avg:{client-id=consumer-perf-consumer-25533-1}                                                   : 2.700
    consumer-coordinator-metrics:commit-latency-max:{client-id=consumer-perf-consumer-25533-1}                                                   : 4.000
    consumer-coordinator-metrics:commit-rate:{client-id=consumer-perf-consumer-25533-1}                                                          : 0.230
    consumer-coordinator-metrics:commit-total:{client-id=consumer-perf-consumer-25533-1}                                                         : 119.000
    consumer-coordinator-metrics:failed-rebalance-rate-per-hour:{client-id=consumer-perf-consumer-25533-1}                                       : 0.000
    consumer-coordinator-metrics:failed-rebalance-total:{client-id=consumer-perf-consumer-25533-1}                                               : 1.000
    consumer-coordinator-metrics:heartbeat-rate:{client-id=consumer-perf-consumer-25533-1}                                                       : 0.337
    consumer-coordinator-metrics:heartbeat-response-time-max:{client-id=consumer-perf-consumer-25533-1}                                          : 6.000
    consumer-coordinator-metrics:heartbeat-total:{client-id=consumer-perf-consumer-25533-1}                                                      : 197.000
    consumer-coordinator-metrics:join-rate:{client-id=consumer-perf-consumer-25533-1}                                                            : 0.000
    consumer-coordinator-metrics:join-time-avg:{client-id=consumer-perf-consumer-25533-1}                                                        : NaN
    consumer-coordinator-metrics:join-time-max:{client-id=consumer-perf-consumer-25533-1}                                                        : NaN
    consumer-coordinator-metrics:join-total:{client-id=consumer-perf-consumer-25533-1}                                                           : 1.000
    consumer-coordinator-metrics:last-heartbeat-seconds-ago:{client-id=consumer-perf-consumer-25533-1}                                           : 2.000
    consumer-coordinator-metrics:last-rebalance-seconds-ago:{client-id=consumer-perf-consumer-25533-1}                                           : 593.000
    consumer-coordinator-metrics:partition-assigned-latency-avg:{client-id=consumer-perf-consumer-25533-1}                                       : NaN
    consumer-coordinator-metrics:partition-assigned-latency-max:{client-id=consumer-perf-consumer-25533-1}                                       : NaN
    consumer-coordinator-metrics:partition-lost-latency-avg:{client-id=consumer-perf-consumer-25533-1}                                           : NaN
    consumer-coordinator-metrics:partition-lost-latency-max:{client-id=consumer-perf-consumer-25533-1}                                           : NaN
    consumer-coordinator-metrics:partition-revoked-latency-avg:{client-id=consumer-perf-consumer-25533-1}                                        : 0.000
    consumer-coordinator-metrics:partition-revoked-latency-max:{client-id=consumer-perf-consumer-25533-1}                                        : 0.000
    consumer-coordinator-metrics:rebalance-latency-avg:{client-id=consumer-perf-consumer-25533-1}                                                : NaN
    consumer-coordinator-metrics:rebalance-latency-max:{client-id=consumer-perf-consumer-25533-1}                                                : NaN
    consumer-coordinator-metrics:rebalance-latency-total:{client-id=consumer-perf-consumer-25533-1}                                              : 83.000
    consumer-coordinator-metrics:rebalance-rate-per-hour:{client-id=consumer-perf-consumer-25533-1}                                              : 0.000
    consumer-coordinator-metrics:rebalance-total:{client-id=consumer-perf-consumer-25533-1}                                                      : 1.000
    consumer-coordinator-metrics:sync-rate:{client-id=consumer-perf-consumer-25533-1}                                                            : 0.000
    consumer-coordinator-metrics:sync-time-avg:{client-id=consumer-perf-consumer-25533-1}                                                        : NaN
    consumer-coordinator-metrics:sync-time-max:{client-id=consumer-perf-consumer-25533-1}                                                        : NaN
    consumer-coordinator-metrics:sync-total:{client-id=consumer-perf-consumer-25533-1}                                                           : 1.000
    consumer-fetch-manager-metrics:bytes-consumed-rate:{client-id=consumer-perf-consumer-25533-1, topic=topic_test8_cons_test1}                  : 434828205.989
    consumer-fetch-manager-metrics:bytes-consumed-rate:{client-id=consumer-perf-consumer-25533-1}                                                : 434828205.989
    consumer-fetch-manager-metrics:bytes-consumed-total:{client-id=consumer-perf-consumer-25533-1, topic=topic_test8_cons_test1}                 : 460817319851.000
    consumer-fetch-manager-metrics:bytes-consumed-total:{client-id=consumer-perf-consumer-25533-1}                                               : 460817319851.000
    consumer-fetch-manager-metrics:fetch-latency-avg:{client-id=consumer-perf-consumer-25533-1}                                                  : 58.870
    consumer-fetch-manager-metrics:fetch-latency-max:{client-id=consumer-perf-consumer-25533-1}                                                  : 503.000
    consumer-fetch-manager-metrics:fetch-rate:{client-id=consumer-perf-consumer-25533-1}                                                         : 48.670
    consumer-fetch-manager-metrics:fetch-size-avg:{client-id=consumer-perf-consumer-25533-1, topic=topic_test8_cons_test1}                       : 9543108.526
    consumer-fetch-manager-metrics:fetch-size-avg:{client-id=consumer-perf-consumer-25533-1}                                                     : 9543108.526
    consumer-fetch-manager-metrics:fetch-size-max:{client-id=consumer-perf-consumer-25533-1, topic=topic_test8_cons_test1}                       : 11412584.000
    consumer-fetch-manager-metrics:fetch-size-max:{client-id=consumer-perf-consumer-25533-1}                                                     : 11412584.000
    consumer-fetch-manager-metrics:fetch-throttle-time-avg:{client-id=consumer-perf-consumer-25533-1}                                            : 0.000
    consumer-fetch-manager-metrics:fetch-throttle-time-max:{client-id=consumer-perf-consumer-25533-1}                                            : 0.000
    consumer-fetch-manager-metrics:fetch-total:{client-id=consumer-perf-consumer-25533-1}                                                        : 44889.000
    Exception in thread "main" java.util.IllegalFormatConversionException: f != java.lang.Integer
    at java.base/java.util.Formatter$FormatSpecifier.failConversion(Formatter.java:4426)
            at java.base/java.util.Formatter$FormatSpecifier.printFloat(Formatter.java:2951)
            at java.base/java.util.Formatter$FormatSpecifier.print(Formatter.java:2898)
            at java.base/java.util.Formatter.format(Formatter.java:2673)
            at java.base/java.util.Formatter.format(Formatter.java:2609)
            at java.base/java.lang.String.format(String.java:2897)
            at scala.collection.immutable.StringLike.format(StringLike.scala:354)
            at scala.collection.immutable.StringLike.format$(StringLike.scala:353)
            at scala.collection.immutable.StringOps.format(StringOps.scala:33)
            at kafka.utils.ToolsUtils$.$anonfun$printMetrics$3(ToolsUtils.scala:60)
            at kafka.utils.ToolsUtils$.$anonfun$printMetrics$3$adapted(ToolsUtils.scala:58)
            at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
            at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
            at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
            at kafka.utils.ToolsUtils$.printMetrics(ToolsUtils.scala:58)
            at kafka.tools.ConsumerPerformance$.main(ConsumerPerformance.scala:82)
            at kafka.tools.ConsumerPerformance.main(ConsumerPerformance.scala)

Upvotes: 3

Views: 2372

Answers (1)

Paul Robu
Paul Robu

Reputation: 122

  1. https://medium.com/metrosystemsro/apache-kafka-how-to-test-performance-for-clients-configured-with-ssl-encryption-3356d3a0d52b it's a post I've written on this.

Expectation according to this KIP is for rebalance.time.ms and fetch.time.ms to show total rebalance time for the consumer group and the total fetching time excluding the rebalance time. As far I can tell, as of Apache Kafka version 2.6.0 this is still a work in progress and currently the output are timestamps in Unix epoch time. fetch.MB.sec and fetch.nMsg.sec are intended to show the average quantity of messages consumed per second (in MB and as a count)

  1. See https://kafka.apache.org/documentation/#consumer_group_monitoring for the consumer group metrics listed with --print-metrics flag

  2. fetch-latency-avg (the average time taken for a fetch request) will vary but this depends a lot on the test setup.

Upvotes: 1

Related Questions