Reputation: 3967
I am using the following code to write to Kafka:
String partitionKey = "" + System.currentTimeMillis();
KeyedMessage<String, String> data = new KeyedMessage<String, String>(topic, partitionKey, payload);
And we are using 0.8.1.1 version of Kafka.
Is it possible that when multiple threads write concurrently, some of them (with different payloads) end up using the same partition key, and Kafka then overwrites those messages because the keys collide?
The documentation that got us thinking in this direction is: http://kafka.apache.org/documentation.html#compaction
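Since `System.currentTimeMillis()` can return the same value for two threads producing in the same millisecond, one way to rule out key collisions is to append a process-wide counter to the timestamp. A minimal sketch (the key format and class name are my own, not from the question):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class UniqueKeyDemo {
    // A timestamp alone can repeat within the same millisecond, so two
    // threads may produce identical partition keys. Appending an atomic
    // counter makes each key unique within this process.
    private static final AtomicLong SEQ = new AtomicLong();

    static String uniqueKey() {
        return System.currentTimeMillis() + "-" + SEQ.incrementAndGet();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        Set<String> keys = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < 1000; i++) {
            pool.submit(() -> keys.add(uniqueKey()));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // All 1000 keys are distinct even under concurrent generation.
        System.out.println("distinct keys: " + keys.size());
    }
}
```

The returned string can then be passed as the `partitionKey` argument of `KeyedMessage` exactly as in the snippet above.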
Upvotes: 2
Views: 7453
Reputation: 3967
I found some more material at https://cwiki.apache.org/confluence/display/KAFKA/Log+Compaction
Salient points:
- Whether log compaction is enabled or not, Kafka eventually removes older records, but records in the head of the log are safe from both compaction and retention-based deletion.
- The missing-records problem will occur only when downstream consumers are unable to drain the Kafka queues for a very long time (long enough that the per-topic size or time limit is hit).
- I think this is expected behavior, since we cannot keep records forever; they have to be deleted at some point.
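The size/time limits mentioned above are governed by broker-side retention settings. A sketch of the relevant `server.properties` entries (values here are illustrative, not recommendations):

```properties
# Delete log segments older than this many hours
log.retention.hours=168
# ...or once a partition's log exceeds this many bytes
log.retention.bytes=1073741824
# "delete" is the default policy; compaction is opt-in via "compact"
log.cleanup.policy=delete
```

With the default `delete` policy, records with duplicate keys are retained until the time or size limit removes the whole segment.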
Upvotes: 6
Reputation: 5168
Sounds very possible. Compaction saves the last message for each key; if you have multiple messages sharing a key, only the last one will remain after compaction. Note that compaction only applies if the topic's cleanup policy is set to compact; with the default delete policy, duplicate keys are never collapsed. The normal use case is database replication, where only the latest state per key is interesting.
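The keep-the-latest semantics can be sketched as replaying the log into a map from key to last-seen value (a simplification; real compaction operates on log segments and preserves offsets):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CompactionSketch {
    // Replaying a log into a map keeps only the most recent value per key,
    // which is what a compacted topic's contents converge to over time.
    static Map<String, String> compact(String[][] log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] record : log) {
            latest.put(record[0], record[1]); // later writes overwrite earlier ones
        }
        return latest;
    }

    public static void main(String[] args) {
        String[][] log = {
            {"user-1", "addr=A"},
            {"user-2", "addr=B"},
            {"user-1", "addr=C"}, // same key: the earlier value is discarded
        };
        System.out.println(compact(log)); // {user-1=addr=C, user-2=addr=B}
    }
}
```

This is why, if distinct payloads are written under the same key to a compacted topic, only the most recent payload survives.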
Upvotes: 3