anfab

Reputation: 1636

Message ordering and Kafka alternatives

The number of messages passed is not large, but we need strict ordering of messages per entity. For example, we may have a million messages across 200K entities. If a message for an entity fails, its subsequent messages should not be consumed, but messages for other entities can still be consumed.

With Kafka we get ordering within a partition, with the limitation that if a message in a partition is not consumed, then all subsequent messages in that partition are blocked, even if they belong to another entity. We can increase the number of partitions, but this has a limit.

What are the generic patterns for solving this class of problems?

Upvotes: 0

Views: 757

Answers (1)

oh54

Reputation: 498

I hope I understand the question correctly in that you want to ensure that messages for a given entity go to the same partition while still having a scalable solution.

The easiest way (in my opinion) to do this would be to specify the partition on the producer side.

new ProducerRecord<>(topicName, partitionId, messageKey, message)

If the specific topic in question comes from outside of your system and you thus can't control the producer logic, I would just add a consumer which re-produces the messages to another topic so that the partition is specified.

Continuing with your example, let's say you have some_topic with a million messages and 200K entities. You could have a high-throughput consumer which consumes everything and produces to some_topic_2 so that messages for a given entity are always produced to the same partition.
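The key property of that re-partitioning step is a deterministic entity-to-partition mapping. A minimal sketch of that mapping in plain Java, assuming string entity IDs and a fixed partition count (Kafka's default partitioner does the equivalent with a murmur2 hash of the key bytes; `String.hashCode()` here is just for brevity):

```java
// Sketch of deterministic entity -> partition routing, assuming string
// entity IDs and a fixed partition count (both illustrative, not from
// the question). The same entity always maps to the same partition, so
// per-entity ordering is preserved across produces.
public class EntityPartitioner {
    private final int numPartitions;

    public EntityPartitioner(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    public int partitionFor(String entityId) {
        // Mask off the sign bit so negative hash codes still yield a
        // valid partition index in [0, numPartitions).
        return (entityId.hashCode() & 0x7fffffff) % numPartitions;
    }
}
```

The re-partitioning consumer would then call `partitionFor(entityId)` and pass the result as the `partitionId` argument of the `ProducerRecord` constructor shown above.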

Then you could use another high-throughput consumer which consumes from some_topic_2 and would do the logic you described, i.e. keeping tabs on which entities should be ignored and processing the other ones.
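The "keeping tabs" logic for that second consumer could look like the following sketch, assuming string entity IDs and payloads (the failure condition in `handle` is a stand-in for your real business logic):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of per-entity failure handling: once a message for an entity
// fails, later messages for that entity are skipped, while messages for
// other entities keep flowing. Types and the failure rule are
// illustrative assumptions, not from the question.
public class EntitySkippingProcessor {
    private final Set<String> failedEntities = new HashSet<>();

    // Returns true if the message was processed, false if it was skipped
    // or failed.
    public boolean process(String entityId, String payload) {
        if (failedEntities.contains(entityId)) {
            return false; // an earlier message for this entity failed
        }
        try {
            handle(entityId, payload);
            return true;
        } catch (RuntimeException e) {
            failedEntities.add(entityId); // block subsequent messages
            return false;
        }
    }

    // Placeholder business logic: treat any payload containing "bad" as
    // a processing failure.
    private void handle(String entityId, String payload) {
        if (payload.contains("bad")) {
            throw new RuntimeException("processing failed for " + entityId);
        }
    }
}
```

In a real deployment the failed-entity set would need to survive restarts (e.g. be persisted or rebuilt from the topic), since the in-memory `HashSet` above is lost when the consumer dies.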

Of course, if you don't need a high-throughput system, you could use a Kafka topic with a single partition and do all the processing with a single consumer for that topic instead.

Relevant blogpost: http://www.javaworld.com/article/3066873/big-data/big-data-messaging-with-kafka-part-2.html

Additional thoughts:

Another way to do this, if you're using at least Kafka 0.10, would be to use Kafka Streams (http://kafka.apache.org/documentation/streams).

[...] being able to maintain state opens up many possibilities for sophisticated stream processing applications: you can join input streams, or group and aggregate data records.

I unfortunately have not worked with the Kafka Streams API yet, so I cannot suggest a concrete approach there.

Hopefully other answerers can provide some additional insight.

Upvotes: 1
