Sequence of timestamps in Kafka partition

Question

I have a distributed system with a Kafka cluster. The Kafka topics have a replication factor of 2 or 3, as the docs suggest for production. Many similar producers can write data to the same partition.

I know of two ways to timestamp messages: on the producer side and on the broker side (message.timestamp.type). But there is no way to perfectly synchronize system time between machines, whether they are producers or brokers.

If I use message.timestamp.type=LogAppendTime, the data inside a partition will be ordered by timestamp within the scope of any single broker, but the consumer reads from 2 or 3 of them.

Summary:

1. Is there any way to build a system with many producers and RF > 1 such that the data inside a single partition (or at least the data under a single partition key) always has increasing (or rather, non-decreasing) timestamps from the consumer's point of view?
2. If the answer to question 1 is "no", does it at least work with offsets? Do the replicas of the same message on different brokers have the same offset value? Should we handle the "one broker fails" case explicitly in the consumer if the consumer relies on an always-increasing offset value for each partition? Or does Kafka handle it seamlessly for us?

Eugene Mitskevich · Accepted Answer

About offsets.

Yes, they are always monotonically increasing. Every partition has a leader replica, which assigns the offset. The follower brokers replicate the message with the already assigned offset. The consumer receives monotonically increasing offsets even if the leader broker fails.

It's described in the Kafka docs, section 4.7 Replication:

The logs on the followers are identical to the leader's log—all have the same offsets and messages in the same order.
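
If you want the consumer to verify this guarantee rather than trust it, a minimal sketch could look like the following. It is plain Python with no Kafka client involved. The function name and the (partition, offset) tuple shape are assumptions made for illustration, not part of any Kafka API.

```python
from typing import Dict, Iterable, Tuple


def offsets_strictly_increase(records: Iterable[Tuple[int, int]]) -> bool:
    """Return True if offsets strictly increase within every partition.

    `records` is a hypothetical stream of (partition, offset) pairs, in the
    order the consumer received them. Partitions are checked independently.
    """
    last_seen: Dict[int, int] = {}
    for partition, offset in records:
        previous = last_seen.get(partition)
        if previous is not None and offset <= previous:
            return False
        last_seen[partition] = offset
    return True
```

The check only requires that offsets grow, not that they are consecutive, because gaps are legitimate (for example on compacted topics).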

About timestamps with message.timestamp.type=LogAppendTime.

I cannot find in the Kafka docs whether the follower broker overrides the timestamp or uses the one from the leader. However, I found an improvement proposal that was implemented in Kafka 0.10.0.0:

When a leader broker receives a message, the timestamp in the message will be overwritten to current server time. When a follower broker receives a message, the timestamp of this message will be used to build time index.

So the timestamp is assigned by the leader and left unchanged on the follower. If the leader fails, the former follower that takes over starts assigning timestamps based on its own local time. Since the brokers' clocks cannot be perfectly in sync, timestamps may jump backwards at the moment partition leadership is transferred.
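
If the consumer needs non-decreasing timestamps anyway, one possible workaround (my own suggestion, not something Kafka does for you) is to clamp each record's timestamp to the highest value already seen in its partition. Below is a minimal sketch in plain Python. The `Record` class, the `MonotonicTimestamps` helper, and the sample log are all hypothetical and only model the leadership-change scenario described above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Record:
    partition: int
    offset: int
    timestamp_ms: int


class MonotonicTimestamps:
    """Clamp each partition's timestamps so they never decrease."""

    def __init__(self) -> None:
        # Highest effective timestamp handed out so far, per partition.
        self._high_water: Dict[int, int] = {}

    def adjust(self, record: Record) -> int:
        previous = self._high_water.get(record.partition, record.timestamp_ms)
        effective = max(previous, record.timestamp_ms)
        self._high_water[record.partition] = effective
        return effective


# Hypothetical log: the new leader's clock is slightly behind the old one's.
log = [
    Record(0, 10, 1_000),
    Record(0, 11, 1_020),  # last record stamped by the old leader
    Record(0, 12, 990),    # first record stamped by the new leader
    Record(0, 13, 1_030),
]
guard = MonotonicTimestamps()
adjusted = [guard.adjust(r) for r in log]  # [1000, 1020, 1020, 1030]
```

The trade-off is that the adjusted value no longer equals the broker's append time for records written just after the failover, so keep the original timestamp around if you need it.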
