Reputation: 1469
I want to understand how kafka consumer test works and how to interpret some of numbers reported,
below is the test i ran and the output i got. My questions are
rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
are 1593109326098, -1593108732333, -0.0003, -0.2800
; can you explain how it can report such a high and negative numbers ? they dont make sense to me.Metric Name Value
line is reported due to --print-metrics
flag. What is difference between metrics reported by default and with this flag? how they are calculated and where can i read about what do they mean?consumer-fetch-manager-metrics:fetch-latency-avg
metrics remains almost same. can you explain this? with more consumers pulling data fetch latency should go higher; similarly for given consuming rate if i reduce io and network thread parameters at broker shouldnt latency scale higher?here is the command i ran
[root@oak-clx17 kafka_2.12-2.5.0]# bin/kafka-consumer-perf-test.sh --topic topic_test8_cons_test1 --threads 1 --broker-list clx20:9092 --messages 500000000 --consumer.config config/consumer.properties --print-metrics
and results
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms,fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
WARNING: Exiting before consuming the expected number of messages: timeout (10000 ms) exceeded. You can use the --timeout option to increase the timeout.
2020-06-25 11:22:05:814, 2020-06-25 11:31:59:579, 435640.7686, 733.6922, 446096147, 751300.8463, 1593109326098, -1593108732333, -0.0003, -0.2800
Metric Name Value
consumer-coordinator-metrics:assigned-partitions:{client-id=consumer-perf-consumer-25533-1} : 0.000
consumer-coordinator-metrics:commit-latency-avg:{client-id=consumer-perf-consumer-25533-1} : 2.700
consumer-coordinator-metrics:commit-latency-max:{client-id=consumer-perf-consumer-25533-1} : 4.000
consumer-coordinator-metrics:commit-rate:{client-id=consumer-perf-consumer-25533-1} : 0.230
consumer-coordinator-metrics:commit-total:{client-id=consumer-perf-consumer-25533-1} : 119.000
consumer-coordinator-metrics:failed-rebalance-rate-per-hour:{client-id=consumer-perf-consumer-25533-1} : 0.000
consumer-coordinator-metrics:failed-rebalance-total:{client-id=consumer-perf-consumer-25533-1} : 1.000
consumer-coordinator-metrics:heartbeat-rate:{client-id=consumer-perf-consumer-25533-1} : 0.337
consumer-coordinator-metrics:heartbeat-response-time-max:{client-id=consumer-perf-consumer-25533-1} : 6.000
consumer-coordinator-metrics:heartbeat-total:{client-id=consumer-perf-consumer-25533-1} : 197.000
consumer-coordinator-metrics:join-rate:{client-id=consumer-perf-consumer-25533-1} : 0.000
consumer-coordinator-metrics:join-time-avg:{client-id=consumer-perf-consumer-25533-1} : NaN
consumer-coordinator-metrics:join-time-max:{client-id=consumer-perf-consumer-25533-1} : NaN
consumer-coordinator-metrics:join-total:{client-id=consumer-perf-consumer-25533-1} : 1.000
consumer-coordinator-metrics:last-heartbeat-seconds-ago:{client-id=consumer-perf-consumer-25533-1} : 2.000
consumer-coordinator-metrics:last-rebalance-seconds-ago:{client-id=consumer-perf-consumer-25533-1} : 593.000
consumer-coordinator-metrics:partition-assigned-latency-avg:{client-id=consumer-perf-consumer-25533-1} : NaN
consumer-coordinator-metrics:partition-assigned-latency-max:{client-id=consumer-perf-consumer-25533-1} : NaN
consumer-coordinator-metrics:partition-lost-latency-avg:{client-id=consumer-perf-consumer-25533-1} : NaN
consumer-coordinator-metrics:partition-lost-latency-max:{client-id=consumer-perf-consumer-25533-1} : NaN
consumer-coordinator-metrics:partition-revoked-latency-avg:{client-id=consumer-perf-consumer-25533-1} : 0.000
consumer-coordinator-metrics:partition-revoked-latency-max:{client-id=consumer-perf-consumer-25533-1} : 0.000
consumer-coordinator-metrics:rebalance-latency-avg:{client-id=consumer-perf-consumer-25533-1} : NaN
consumer-coordinator-metrics:rebalance-latency-max:{client-id=consumer-perf-consumer-25533-1} : NaN
consumer-coordinator-metrics:rebalance-latency-total:{client-id=consumer-perf-consumer-25533-1} : 83.000
consumer-coordinator-metrics:rebalance-rate-per-hour:{client-id=consumer-perf-consumer-25533-1} : 0.000
consumer-coordinator-metrics:rebalance-total:{client-id=consumer-perf-consumer-25533-1} : 1.000
consumer-coordinator-metrics:sync-rate:{client-id=consumer-perf-consumer-25533-1} : 0.000
consumer-coordinator-metrics:sync-time-avg:{client-id=consumer-perf-consumer-25533-1} : NaN
consumer-coordinator-metrics:sync-time-max:{client-id=consumer-perf-consumer-25533-1} : NaN
consumer-coordinator-metrics:sync-total:{client-id=consumer-perf-consumer-25533-1} : 1.000
consumer-fetch-manager-metrics:bytes-consumed-rate:{client-id=consumer-perf-consumer-25533-1, topic=topic_test8_cons_test1} : 434828205.989
consumer-fetch-manager-metrics:bytes-consumed-rate:{client-id=consumer-perf-consumer-25533-1} : 434828205.989
consumer-fetch-manager-metrics:bytes-consumed-total:{client-id=consumer-perf-consumer-25533-1, topic=topic_test8_cons_test1} : 460817319851.000
consumer-fetch-manager-metrics:bytes-consumed-total:{client-id=consumer-perf-consumer-25533-1} : 460817319851.000
consumer-fetch-manager-metrics:fetch-latency-avg:{client-id=consumer-perf-consumer-25533-1} : 58.870
consumer-fetch-manager-metrics:fetch-latency-max:{client-id=consumer-perf-consumer-25533-1} : 503.000
consumer-fetch-manager-metrics:fetch-rate:{client-id=consumer-perf-consumer-25533-1} : 48.670
consumer-fetch-manager-metrics:fetch-size-avg:{client-id=consumer-perf-consumer-25533-1, topic=topic_test8_cons_test1} : 9543108.526
consumer-fetch-manager-metrics:fetch-size-avg:{client-id=consumer-perf-consumer-25533-1} : 9543108.526
consumer-fetch-manager-metrics:fetch-size-max:{client-id=consumer-perf-consumer-25533-1, topic=topic_test8_cons_test1} : 11412584.000
consumer-fetch-manager-metrics:fetch-size-max:{client-id=consumer-perf-consumer-25533-1} : 11412584.000
consumer-fetch-manager-metrics:fetch-throttle-time-avg:{client-id=consumer-perf-consumer-25533-1} : 0.000
consumer-fetch-manager-metrics:fetch-throttle-time-max:{client-id=consumer-perf-consumer-25533-1} : 0.000
consumer-fetch-manager-metrics:fetch-total:{client-id=consumer-perf-consumer-25533-1} : 44889.000
Exception in thread "main" java.util.IllegalFormatConversionException: f != java.lang.Integer
at java.base/java.util.Formatter$FormatSpecifier.failConversion(Formatter.java:4426)
at java.base/java.util.Formatter$FormatSpecifier.printFloat(Formatter.java:2951)
at java.base/java.util.Formatter$FormatSpecifier.print(Formatter.java:2898)
at java.base/java.util.Formatter.format(Formatter.java:2673)
at java.base/java.util.Formatter.format(Formatter.java:2609)
at java.base/java.lang.String.format(String.java:2897)
at scala.collection.immutable.StringLike.format(StringLike.scala:354)
at scala.collection.immutable.StringLike.format$(StringLike.scala:353)
at scala.collection.immutable.StringOps.format(StringOps.scala:33)
at kafka.utils.ToolsUtils$.$anonfun$printMetrics$3(ToolsUtils.scala:60)
at kafka.utils.ToolsUtils$.$anonfun$printMetrics$3$adapted(ToolsUtils.scala:58)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at kafka.utils.ToolsUtils$.printMetrics(ToolsUtils.scala:58)
at kafka.tools.ConsumerPerformance$.main(ConsumerPerformance.scala:82)
at kafka.tools.ConsumerPerformance.main(ConsumerPerformance.scala)
Upvotes: 3
Views: 2372
Reputation: 122
Expectation according to this KIP is for rebalance.time.ms and fetch.time.ms to show total rebalance time for the consumer group and the total fetching time excluding the rebalance time. As far I can tell, as of Apache Kafka version 2.6.0 this is still a work in progress and currently the output are timestamps in Unix epoch time. fetch.MB.sec and fetch.nMsg.sec are intended to show the average quantity of messages consumed per second (in MB and as a count)
See https://kafka.apache.org/documentation/#consumer_group_monitoring for the consumer group metrics listed with --print-metrics flag
fetch-latency-avg (the average time taken for a fetch request) will vary but this depends a lot on the test setup.
Upvotes: 1