Reputation: 13
I got the idea of partitioning in general, but I can't realize how it indeed solves the ordering problem. Taking the Chris Richardson's book example if I have 3 events about a given Order with "shard-key" 1 (Order Created, Order Updated and Order Cancelled). If there's more than one instance per partition how can I ensure the events were processed in order? It's not a downsizing for the same problem?
I mean, in that example all messages goes to the first shard, but won't they round-robin between both instances?
Upvotes: 1
Views: 99
Reputation: 362
Kafka guarantees the order by partition. If you need to guarantee the order of their processing, your message producer must ensure that it sends the message flow to the same partition.
Note however that there cannot be more consumer instances than partitions (per consumer group).
Upvotes: 1
Reputation: 3173
If your records have a key, the default behaviour is for any given key to always be sent to the same partition.
Partitioning is a divide-and-conquer approach, but comes with some sacrifices that may be quite acceptable in any given problem domain. A topic with multiple partitions has no notion of 'order'; as you point out you can have multiple competing consumers which may run at different speeds.
Instead, each partition will only ever be assigned to one consumer in a consumer-group, and it is at this level that ordering is strict(ish). I say strict-ish because things can always go wrong, and records could be reprocessed so your ordering is never absolutely guaranteed by out-of-the-box Kafka.
When you say you need to process things in order, you my need to think how important this is. e.g. you could argue that a bank account's transactions should be processed in order (maybe), so all records for a specific account should be on the same partition, but the relative ordering of two different accounts' activities are unimportant.
With respect to partitioning strategy, up to V2.3, messages without a key will be sent to partitions in round-robin fashion. From v2.4 onwards, KIP-480 introduced a sticky partitioner to round-robin batches of records, instead of strictly one-at-a-time.
Upvotes: 2