Reputation: 323
I would like to read all the messages from a Kafka topic at a scheduled interval to calculate some global index value. I am doing something like this:
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("group.id", "test")
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG,Int.MaxValue.toString)
val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(util.Collections.singletonList(TOPIC))
consumer.poll(10000)
consumer.seekToBeginning(consumer.assignment())
val records = consumer.poll(10000)
With this mechanism I get all the records, but is this an efficient way of doing it? There will be around 20,000,000 records (2.1 GB) per topic.
Upvotes: 0
Views: 460
Reputation: 20880
You might consider the Kafka Streams library for this. It supports different types of windows.
You can use tumbling windows to capture the events in a given interval and calculate your global index.
https://kafka.apache.org/20/documentation/streams/developer-guide/dsl-api.html#windowing
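For illustration, here is a minimal sketch using the kafka-streams-scala DSL (Kafka 2.1+ clients, so that TimeWindows accepts a Duration). The application id, bootstrap servers, and one-hour window size are placeholders, and count() stands in for whatever aggregation actually computes your index:

import java.time.Duration
import java.util.Properties
import org.apache.kafka.streams.kstream.TimeWindows
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.Serdes._
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}

val props = new Properties()
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "global-index-app") // placeholder app id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // your brokers

val builder = new StreamsBuilder()

// Group records per key into 1-hour tumbling (non-overlapping) windows
// and count them; swap count() for aggregate(...) to compute the real index.
builder
  .stream[String, String](TOPIC)
  .groupByKey
  .windowedBy(TimeWindows.of(Duration.ofHours(1)))
  .count()
  .toStream
  .foreach((windowedKey, count) => println(s"$windowedKey -> $count"))

val streams = new KafkaStreams(builder.build(), props)
streams.start()
sys.addShutdownHook(streams.close())

Because Streams maintains the windowed aggregate incrementally as records arrive, each interval's value is computed without re-reading the whole 2.1 GB topic every time, which is the main efficiency win over the poll-everything approach.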
Upvotes: 1