D Developer
D Developer

Reputation: 143

What do we mean by 'commit' data in Kafka broker?

In a Kafka cluster containing N brokers , for Topic T against a partition, producers publish data to Leader broker. By the term 'commit' in Kafka terminology , does it mean the data is committed in Leader broker or the data is committed to the Leader broker and also to the corresponding Followers available in the ISR list.

Upvotes: 8

Views: 5674

Answers (3)

Ankit Sahay
Ankit Sahay

Reputation: 2053

Commit of message means two different things from Kafka's point of view and from Producer point of view.

Because Kafka provides durability guarantees - for Kafka, a message is committed when the leader as well as all the InSyncReplicas have received the message. As example, say a topic is created with RF of 5 (1 leader and 4 followers) and out of those 4 follower replicas, say 2 are InSync. At this point when kafka receives a message, Kafka will consider it committed when the leader and 2 InSyncReplicas get that message.

From producer point of view, the producer application has the flexibility to define when they consider a message to be committed to Kafka.

acks = 0: means producer considers message to be committed without any confirmation (acknowledgement) from Kafka

acks = 1: means producer considers message to be committed with just leader confirming that it got the message

acks = all (default): means producer considers message to be committed when both leader and all the ISR confirms that they got the message.

The difference in the point of view of commit is because Kafka and Producer application might have different priorities. While for Kafka - not loosing a message after it has been received is the priority (and that's why it considers a message committed only when leader and all ISR receives the message); for producer throughput might be the priority and it cannot wait while all the ISRs get the message, so the moment leader gets the message, producer considers it committed safely enough.

Upvotes: 1

asdaraujo
asdaraujo

Reputation: 31

Regardless of the "acks" setting in the producer, from a broker perspective a message is considered "committed" when all in-sync replicas for that partition have applied it to their log.

Only committed messages can be read by consumers.

The "acks" property only tells the producer whether it should wait for the message to be committed (acks=all), written to the leader (acks=1), or not wait at all (acks=0)

Upvotes: 3

Thilo
Thilo

Reputation: 262834

This is controlled by the producer configuration setting called ack:

  • acks=0 If set to zero then the producer will not wait for any acknowledgment from the server at all. The record will be immediately added to the socket buffer and considered sent. No guarantee can be made that the server has received the record in this case, and the retries configuration will not take effect (as the client won't generally know of any failures). The offset given back for each record will always be set to -1.

  • acks=1 (default) This will mean the leader will write the record to its local log but will respond without awaiting full acknowledgement from all followers. In this case should the leader fail immediately after acknowledging the record but before the followers have replicated it then the record will be lost.

  • acks=all This means the leader will wait for the full set of in-sync replicas to acknowledge the record. This guarantees that the record will not be lost as long as at least one in-sync replica remains alive. This is the strongest available guarantee.

Upvotes: 5

Related Questions