Reputation: 41
I have a distributed system with a Kafka cluster. The Kafka topics have a replication factor of 2 or 3, as the docs suggest for production. Many similar producers can write data to the same partition.
I know of two ways to timestamp messages: on the producer side and on the broker side (message.timestamp.type). But there is no way to perfectly synchronize the system time between machines, whether they are producers or brokers.
If I use message.timestamp.type=LogAppendTime, the data inside the same partition will be ordered by timestamp within the scope of any specific broker, but the consumer reads from 2 or 3 of them.
Summary:
Upvotes: 0
Views: 1493
Reputation: 41
About offsets.
Yes, they are always monotonically increasing. Every partition has a leader replica, which assigns the offset. The follower brokers replicate each message with its already assigned offset. A consumer receives monotonically increasing offsets even if the leader broker fails.
This is described in the Kafka docs, section 4.7 Replication:
The logs on the followers are identical to the leader's log—all have the same offsets and messages in the same order.
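The offset behaviour described above can be illustrated with a toy model. This is only a sketch in plain Python, not Kafka client or broker code: the Replica class and its methods are hypothetical names made up for the illustration, and real replication (in-sync replicas, high watermark, leader election) is far more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy model of one replica's log for a single partition (hypothetical)."""
    log: list = field(default_factory=list)  # entries are (offset, payload)

    def append_as_leader(self, payload):
        # The leader assigns the next offset: one past the last entry in its log.
        offset = self.log[-1][0] + 1 if self.log else 0
        self.log.append((offset, payload))
        return offset

    def replicate_from(self, leader):
        # A follower copies the missing entries verbatim, keeping the leader's offsets.
        self.log.extend(leader.log[len(self.log):])


leader, follower = Replica(), Replica()
for payload in ["a", "b", "c"]:
    leader.append_as_leader(payload)
    follower.replicate_from(leader)

# Simulated failover: the caught-up follower becomes leader and continues numbering.
new_leader = follower
new_leader.append_as_leader("d")
offsets = [offset for offset, _ in new_leader.log]
```

Because the follower's log is a copy of the leader's, the new leader simply continues from the last replicated offset, so the sequence stays strictly increasing across the failover in this model.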
About timestamps in case of message.timestamp.type=LogAppendTime.
I could not find in the Kafka docs whether the follower broker overrides the timestamp or uses the one from the leader. However, I found an improvement proposal, which was implemented in Kafka 0.10.0.0:
When a leader broker receives a message, the timestamp in the message will be overwritten to current server time. When a follower broker receives a message, the timestamp of this message will be used to build time index.
So, the timestamp is assigned by the leader and is left unchanged on the follower. If the leader broker fails, the former follower that takes over as leader starts assigning timestamps based on its own local time. Since the local clocks of different brokers cannot be perfectly in sync, timestamps may shift backwards at the moment of partition leadership transfer.
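A small simulation may make the backwards shift concrete. It is a sketch under stated assumptions, not Kafka code: the broker names, the 50 ms clock skew, and the helper functions assign_log_append_times and timestamp_regressions are all invented for the illustration.

```python
def assign_log_append_times(events, clock_skew_ms):
    """Stamp each append with the current leader's (skewed) local clock.

    events: list of (true_time_ms, leader_id) in append order.
    clock_skew_ms: assumed per-broker deviation from true time, in milliseconds.
    Returns a list of (offset, timestamp_ms) records.
    """
    return [
        (offset, true_time_ms + clock_skew_ms[leader_id])
        for offset, (true_time_ms, leader_id) in enumerate(events)
    ]


def timestamp_regressions(records):
    """Return the offsets whose timestamp is lower than the previous record's."""
    return [cur[0] for prev, cur in zip(records, records[1:]) if cur[1] < prev[1]]


# Assumed skews: broker-2's clock runs 50 ms behind broker-1's.
clock_skew_ms = {"broker-1": 0, "broker-2": -50}

# Leadership moves from broker-1 to broker-2 between the second and third append.
events = [(1000, "broker-1"), (1010, "broker-1"), (1020, "broker-2"), (1030, "broker-2")]

records = assign_log_append_times(events, clock_skew_ms)
regressions = timestamp_regressions(records)
```

In this model the offsets keep increasing while the timestamp of the first record written by the new leader is lower than that of the record before it. A consumer that needs a strict order within a partition would therefore rely on offsets rather than on LogAppendTime timestamps.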
Upvotes: 1