zilcuanu

Reputation: 3715

Clarifications on Apache Kafka

I have a few questions about Apache Kafka.

  1. Can a single partition be assigned to more than one consumer in the same group?
  2. Where is the offset stored? Is it in the partition or at the consumer?
  3. The producer always posts records to the leader partition, and the records are then replicated to the other replicas. Does a Kafka consumer likewise read data from the leader partition?
  4. Let's say a consumer is reading from a partition and is running a long process. In this case, the producer writes to the partition faster than the consumer reads from it. Is there a way to speed up consumption from that partition?
  5. Can we create a checkpoint in the commit log of the partition so that a consumer can start processing from that specific checkpoint? This would be useful if I want to perform an audit from a specific checkpoint onward.

Upvotes: 0

Views: 197

Answers (1)

Michael Heil

Reputation: 18475

Can a single partition be assigned to more than one consumer from the same group?

No. Within a consumer group, a partition is consumed by at most one consumer, as described in the documentation: "This is achieved by assigning the partitions in the topic to the consumers in the consumer group so that each partition is consumed by exactly one consumer in the group." Consumers in different groups can still read the same partition independently.
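To illustrate the rule, here is a minimal sketch in plain Python that models a group assignment. It is not Kafka's actual assignor; `assign_round_robin` and the consumer names are made up for the example, and the round-robin strategy is only an assumption chosen for simplicity.

```python
from collections import defaultdict

def assign_round_robin(partitions, consumers):
    """Toy model: hand each partition to exactly one consumer in the group."""
    assignment = defaultdict(list)
    members = sorted(consumers)
    for i, partition in enumerate(sorted(partitions)):
        # Each partition gets a single owner; a consumer may own several.
        assignment[members[i % len(members)]].append(partition)
    # Consumers that received no partition stay idle (empty list).
    return {c: assignment.get(c, []) for c in members}

three_partitions = [0, 1, 2]
print(assign_round_robin(three_partitions, ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1]}
print(assign_round_robin(three_partitions, ["c1", "c2", "c3", "c4"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
```

The second call shows the practical consequence: with more consumers than partitions in a group, the extra consumers have nothing to read.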

Where is the offset stored? Is it in the partition or at the consumer?

Neither. The committed offsets of each consumer group are stored in an internal Kafka topic called __consumer_offsets, as described in the documentation: "The coordinator of each group is chosen from the leaders of the internal offsets topic __consumer_offsets, which is used to store committed offsets."
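Conceptually, a committed offset is tracked per consumer group, topic, and partition. The following is a rough in-memory sketch of that idea, not how the broker implements it; `OffsetStore` and the group and topic names are hypothetical.

```python
class OffsetStore:
    """Toy stand-in for the __consumer_offsets topic (hypothetical class)."""

    def __init__(self):
        # Key: (group, topic, partition) -> last committed offset
        self._committed = {}

    def commit(self, group, topic, partition, offset):
        self._committed[(group, topic, partition)] = offset

    def fetch(self, group, topic, partition):
        # None means this group has never committed for that partition.
        return self._committed.get((group, topic, partition))

store = OffsetStore()
store.commit("billing", "orders", 0, 42)
store.commit("audit", "orders", 0, 7)

print(store.fetch("billing", "orders", 0))    # 42
print(store.fetch("audit", "orders", 0))      # 7
print(store.fetch("reporting", "orders", 0))  # None
```

Because the key includes the group, two groups reading the same partition keep independent positions, and a restarted consumer can resume from its group's committed offset.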

Just like the producer always posts records to the leader partition and the records get replicated to the other replicas, does a Kafka consumer read data from the leader partition?

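Yes, by default it does. The leader is the only "client-facing" replica of a partition, as described in the documentation: "'leader' is the node responsible for all reads and writes for the given partition." Note that since Kafka 2.4 a consumer can optionally be configured to fetch from a follower replica, but reading from the leader remains the default.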

EDIT:

Is there a way we can speed up the consumption from that partition?

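A single partition cannot be read by more than one consumer in the same group, so the way to speed up consumption is to increase the number of partitions of the topic. You can then have more consumer threads reading from that topic and processing the data in parallel. At the same time, you need to make sure that your data is evenly distributed across the partitions; otherwise one partition remains the bottleneck.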
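To see why even distribution matters, here is a small sketch of how record keys map to partitions. It uses CRC32 as a stand-in hash purely for illustration (Kafka's default partitioner hashes keys with murmur2), and `partition_for`, `partition_load`, and the key names are hypothetical.

```python
import zlib
from collections import Counter

def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash for illustration; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def partition_load(keys, num_partitions):
    """Count how many records land on each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]

varied_keys = [f"user-{i}" for i in range(1000)]  # many distinct keys
hot_key = ["user-1"] * 1000                       # one key dominates

print(partition_load(varied_keys, 4))  # records spread over the partitions
print(partition_load(hot_key, 4))      # all 1000 records on one partition
```

With a single hot key, adding partitions does not help: every record for that key still goes to the same partition, and therefore to the same consumer.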

Upvotes: 1
