jakstack

Reputation: 2205

Kafka with Domain Events

In my event-driven project I have messages of type Command and, in response, messages of type Event.

These Command and Event messages express the domain, so they hold complex types from the domain.

Example:

RegisterClientCommand(Name, Email)

ClientRegisteredEvent(ClientId)

There are dozens more of these command/event pairs in the domain.

I was thinking of something like:

RawMessage(payloadMap, sequenceId, createdOn)

The payload would hold the message domain class type name and the message fields.
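
Roughly something like this, just as a sketch (field names and types are placeholders):

    import java.time.Instant;
    import java.util.Map;

    // Generic envelope that carries any domain command/event.
    // The payload map holds the domain class type name plus the message fields.
    public class RawMessage {
        private final Map<String, String> payload;
        private final long sequenceId;
        private final Instant createdOn;

        public RawMessage(Map<String, String> payload, long sequenceId, Instant createdOn) {
            this.payload = payload;
            this.sequenceId = sequenceId;
            this.createdOn = createdOn;
        }
        // getters omitted
    }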

I was also reading about the Avro format, but it seems like a lot of work to define the message format for each message.

What's the best practice in terms of the message format that's actually transmitted through the Kafka brokers?

Upvotes: 1

Views: 2319

Answers (1)

mjuarez

Reputation: 16844

There's no single "best" way to do it; it will all depend on the expertise in your team/organization and the specific requirements of your project.

Kafka itself is indifferent to what messages actually contain. Most of the time, it just sees message values and keys as opaque byte arrays.

However you end up defining RawMessage on the Java side, it will have to be serialized into a byte array to be produced into Kafka, because that's what KafkaProducer ultimately sends. Maybe that's a custom string serializer you already have, maybe you serialize a POJO to JSON using Jackson or something similar, or maybe you simply send a huge comma-delimited string as the message. It's completely up to you.
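
For instance, a minimal sketch of the Jackson-to-JSON route (the topic name, broker address and POJO are just placeholders, not a recommendation):

    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;

    public class JsonProducerExample {

        // Simple POJO standing in for one of the domain events.
        public static class ClientRegisteredEvent {
            private String clientId;
            public ClientRegisteredEvent() { }                  // needed by Jackson
            public ClientRegisteredEvent(String clientId) { this.clientId = clientId; }
            public String getClientId() { return clientId; }
            public void setClientId(String clientId) { this.clientId = clientId; }
        }

        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            // POJO -> JSON string; the StringSerializer turns it into bytes for the broker.
            ObjectMapper mapper = new ObjectMapper();
            ClientRegisteredEvent event = new ClientRegisteredEvent("client-42");
            String json = mapper.writeValueAsString(event);

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("domain-events", event.getClientId(), json));
            }
        }
    }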

What's important is that the consumer, when it pulls a message from the Kafka topic, is able to correctly and reliably read the data from every field of the message, without errors, version conflicts, etc. Most serde/schema mechanisms that exist, like Avro, Protobuf or Thrift, try to make this job easier for you, especially for the genuinely hard parts like making sure new messages stay backwards-compatible with previous versions of the same message.
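
To give you an idea of what one of those serde mechanisms looks like, here's a rough sketch using plain Avro's GenericRecord API (no code generation; the schema and names are illustrative, and a real setup would usually involve a schema registry, see below):

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.EncoderFactory;

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    public class AvroSerializeExample {

        // Avro schema for one event, declared inline for brevity.
        private static final String SCHEMA_JSON =
            "{\"type\":\"record\",\"name\":\"ClientRegisteredEvent\","
          + "\"fields\":[{\"name\":\"clientId\",\"type\":\"string\"}]}";

        public static byte[] serialize(String clientId) throws IOException {
            Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

            GenericRecord record = new GenericData.Record(schema);
            record.put("clientId", clientId);

            // Encode the record into Avro's compact binary form.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
            encoder.flush();

            return out.toByteArray();  // ready for a producer configured with ByteArraySerializer
        }
    }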

  • Most people end up with some combination of:
    • A serde mechanism for creating the byte arrays to produce into Kafka; popular ones are Avro, Protobuf and Thrift.
    • Raw JSON strings.
    • A huge string with some kind of internal/custom format that can be parsed/analyzed.
  • Some companies use a centralized schema service. This way your data consumers don't have to know ahead of time which schema a message contains; they just pull down the message and request the corresponding schema from the service. Confluent has its own schema registry, which has supported Avro for years and, as of a few weeks ago, officially supports Protobuf as well. This is not required, and if you own the producer/consumer end-to-end you might decide to handle the serialization yourself, but a lot of people are used to it.
  • Depending on the message type, sometimes you want compression, because the messages can be very repetitive and/or large, so you'd save quite a bit of storage and bandwidth by sending compressed messages, at the cost of some CPU usage and latency. You can handle this yourself on the producer/consumer side, compressing the byte arrays after they've been serialized, or you can request message compression directly on the producer side (look for compression.type in the Kafka docs); there's a configuration sketch after this list.
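
As a rough sketch of that last point, producer-side compression is just a configuration setting (codec choice and addresses are illustrative; the commented schema-registry lines are an assumption and would need Confluent's serializer dependency):

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.common.serialization.ByteArraySerializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;

    public class CompressedProducerConfig {

        public static KafkaProducer<String, byte[]> create() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", ByteArraySerializer.class.getName());

            // The producer compresses batches before sending them to the broker.
            // Valid values include none, gzip, snappy, lz4 and zstd.
            props.put("compression.type", "snappy");

            // If you go with Confluent's schema registry and their Avro serializer,
            // you'd point the serializer at it roughly like this (assumption,
            // requires the io.confluent:kafka-avro-serializer dependency):
            // props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
            // props.put("schema.registry.url", "http://localhost:8081");

            return new KafkaProducer<>(props);
        }
    }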

Upvotes: 2
