dname
dname

Reputation: 323

Kafka reading all the messages of the topic

I would like to read all the messages from a Kafka topic in a scheduled interval to calculate some global index value. I am doing something like this:

props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("group.id", "test")
  props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
  props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG,Int.MaxValue.toString)

  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(util.Collections.singletonList(TOPIC))
  consumer.poll(10000)
  consumer.seekToBeginning(consumer.assignment())
   val records = consumer.poll(10000)

with this mechanism, I get all the records but is this an efficient way of doing it? It will be around 20000000(2.1 GB) records per topic.

Upvotes: 0

Views: 460

Answers (1)

Nishu Tayal
Nishu Tayal

Reputation: 20880

You might probably consider Kafka Streams library to do this. It supports differrent type of windows.

  1. Tumbling time window
  2. Hopping time window
  3. Sliding time window
  4. Session window

You can use Tumbling windows to capture the events in the given internal and calculate your global index.

https://kafka.apache.org/20/documentation/streams/developer-guide/dsl-api.html#windowing

Upvotes: 1

Related Questions