Val Bonn

Reputation: 1199

Apache Kafka Streams and Event Sourcing, CQRS and validation

We have several legacy apps, mainly composed of a GUI + service layer + RDBMS. Over time, batch jobs were added to synchronize/transfer data between the different databases, and so on. The usual spaghetti architecture :)

We are on the way to clean up this mess, and we are designing an architecture based on event sourcing and materialized views:

Step by step, existing apps will have to adopt this architecture. And that's where my concerns arise. How to deal with data validation?

With the legacy apps, when a user updates data in the UI, the service layer validates it (technical checks and business checks) before persisting the new state in the database. (By technical checks I mean checking field types, lengths, the existence of foreign keys, etc.; by business checks, rules like "if attr_A = xxx then attr_B cannot be null".)
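To make the two categories concrete, here is a minimal, self-contained Java sketch; the command and field names are invented for illustration, and the "Country" reference data stands in for what would normally be a table in the RDBMS:

```java
// Hypothetical command carrying user input; field names are illustrative only.
record UpdateContactCommand(String name, String countryCode, String attrA, String attrB) {}

final class CommandValidator {
    // Reference data that would normally live in the RDBMS "Country" table.
    private static final java.util.Set<String> KNOWN_COUNTRIES = java.util.Set.of("FR", "DE", "BE");

    // Technical checks: required fields, lengths, foreign-key existence.
    static java.util.List<String> technicalErrors(UpdateContactCommand cmd) {
        var errors = new java.util.ArrayList<String>();
        if (cmd.name() == null || cmd.name().isBlank()) errors.add("name is required");
        if (cmd.name() != null && cmd.name().length() > 100) errors.add("name exceeds 100 chars");
        if (!KNOWN_COUNTRIES.contains(cmd.countryCode())) errors.add("unknown country code: " + cmd.countryCode());
        return errors;
    }

    // Business check from the question: if attr_A = "xxx" then attr_B cannot be null.
    static java.util.List<String> businessErrors(UpdateContactCommand cmd) {
        var errors = new java.util.ArrayList<String>();
        if ("xxx".equals(cmd.attrA()) && cmd.attrB() == null)
            errors.add("attr_B is required when attr_A = xxx");
        return errors;
    }
}
```

Note that the technical checks only need the command itself plus static reference data, while the business check encodes a domain rule; this difference is exactly what makes the two categories feel at home in different places.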

For the new architecture, even though we committed to an event sourcing pattern, I realized I am currently designing something that looks more like a CQRS + Event Sourcing solution:

Service layer > Kafka topic "Commands" > Validation > Kafka topic "Events" > Consuming apps

(The service layer belongs to the producing app.) In this design, it's important to keep in mind that the producing app is also a consuming app: its DB will only be updated at the end of the cycle.

And I am not sure we are heading in the right direction. I foresee two or three ways to go further. None of them is 100% satisfying:

1. If we continue with this CQRS option

Keeping the "commands" topic:

Service layer > Kafka topic "Commands" > Validation > Kafka topic "Events" > Consuming apps
<PRODUCING APP> <---------------------- STREAMING PLATFORM ----------------><.CONSUM APP.>

I designed the validation phase to be handled by Kafka Streams app(s). In this case, it would not be too complex to implement what I called "technical checks" earlier. But I am really not sure the streaming platform is the right place to perform business checks.

2. If we continue with this CQRS option, without business validation

Keeping the "commands" topic:

Service layer   >  Kafka topic "Commands" > Kafka topic "Events"   >  Consuming apps
<.PRODUCING APP.> <---------------- STREAMING PLATFORM ------------> <..CONSUM APP..>

We may reach a point where apps generate invalid events, events that cannot even be stored in their own DB. (For instance, an app may push a command like "Create a new Address" containing a country code that does not exist in its "Country" table.) It's like a paradox: the "event" exists, it is a fact, but this fact is not accepted by its own parent.
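Without upstream validation, the consuming side has to deal with this paradox itself. A minimal, self-contained Java sketch of what that looks like (class names and the dead-letter handling are hypothetical, not a Kafka API):

```java
// Hypothetical event, as it would arrive from the unvalidated "Events" topic.
record AddressCreated(String street, String countryCode) {}

final class AddressConsumer {
    // Stand-in for the consumer's own "Country" reference table.
    private final java.util.Set<String> countryTable = java.util.Set.of("FR", "DE");

    final java.util.List<AddressCreated> db = new java.util.ArrayList<>();
    final java.util.List<AddressCreated> deadLetter = new java.util.ArrayList<>();

    void onEvent(AddressCreated evt) {
        if (countryTable.contains(evt.countryCode())) {
            db.add(evt);          // fact accepted: materialized view updated
        } else {
            deadLetter.add(evt);  // the paradox: a recorded fact the app cannot store
        }
    }
}
```

Every consumer (including the producing app consuming its own events) would need a policy like this, which is one argument for validating commands before any event is published.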

3. If we use Kafka to store the "Events" topic, not the "Commands":

Without "commands" :

Service layer   >  Kafka topic "Events"   >  Consuming apps

Here again, how do we prevent the producing apps from publishing invalid events?

What would you suggest?

Regards,

Upvotes: 0

Views: 1064

Answers (1)

CPerson

Reputation: 1222

Is there a concept I missed in my design?

There are other questions on SO regarding Kafka Streams and CQRS. I would recommend taking a look to see whether Kafka Streams is the right tool for providing a transactional event store.

Does it make sense to work with CQRS?

I don't know that CQRS will be the magic bullet to clean up the spaghetti mess you currently have. There is a learning curve associated with CQRS, including learning to choose the right Domain and Aggregate boundaries. If no one on your team has expertise in CQRS, the journey can be quite difficult and could simply introduce a new class of problems that you would have to deal with. Better the devil you know, as they say.

Should the "streaming platform" perform no validation, trust the producing apps and accept all the events in the topic?

Commands should be validated by the domain layer and that validation can be modeled into the domain. But you should also maintain some level of validation when accepting user input. If a field is required, make sure it is not empty when the user provides it, for example.
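As a minimal sketch of that idea, assuming a hand-rolled aggregate rather than any particular CQRS framework (all names here are invented): the domain layer either turns a command into an event or rejects it, so no invalid event is ever recorded.

```java
// Hypothetical command and event types; not a framework API.
record CreateAddress(String street, String countryCode) {}
record AddressCreatedEvt(String street, String countryCode) {}

final class AddressAggregate {
    private final java.util.Set<String> knownCountries;

    AddressAggregate(java.util.Set<String> knownCountries) {
        this.knownCountries = knownCountries;
    }

    // Returns the resulting event, or throws: an invalid command never becomes a fact.
    AddressCreatedEvt handle(CreateAddress cmd) {
        if (cmd.street() == null || cmd.street().isBlank())
            throw new IllegalArgumentException("street is required");
        if (!knownCountries.contains(cmd.countryCode()))
            throw new IllegalArgumentException("unknown country: " + cmd.countryCode());
        return new AddressCreatedEvt(cmd.street(), cmd.countryCode());
    }
}
```

The key property is the ordering: validation happens on the command, before the event exists, rather than on an event that is already a fact.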

Once an event is recorded, it is considered fact. The event stream is what tells you the history until now, and you have no choice but to trust it.

Before emitting Commands or Events, should the apps validate the data against the data they already manage?

Typically not. What would the apps validate against? If there are events in the queue that have not yet been applied to your data source at the time of validation, you might incorrectly reject a command or event. Depending on your synchronization guarantees, you might get away with querying your Domain Layer in order to make decisions about which commands to issue next. Typically, your Aggregate or Saga will know enough to make decisions as needed.
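To illustrate that last point, here is a self-contained sketch (again with invented names) of an aggregate replayed from its own event stream; once rehydrated, it has enough state to validate the next command without querying any external view:

```java
// Hypothetical event types for a toy event-sourced account.
interface AccountEvent {}
record Deposited(long amount) implements AccountEvent {}
record Withdrawn(long amount) implements AccountEvent {}

final class Account {
    private long balance = 0;

    // Rehydrate the aggregate by replaying its recorded facts.
    static Account replay(java.util.List<AccountEvent> history) {
        var acc = new Account();
        for (AccountEvent e : history) {
            if (e instanceof Deposited d) acc.balance += d.amount();
            else if (e instanceof Withdrawn w) acc.balance -= w.amount();
        }
        return acc;
    }

    // Command handling: the replayed state is the only source of truth needed.
    Withdrawn handleWithdraw(long amount) {
        if (amount > balance) throw new IllegalStateException("insufficient funds");
        balance -= amount;
        return new Withdrawn(amount);
    }
}
```

This is why the aggregate does not suffer from the stale-view problem described above: it validates against the event stream itself, not against a materialized view that may lag behind.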

Take some time to read through http://www.cqrs.nu. This was helpful for me to establish a baseline understanding of CQRS and Event Sourcing before thinking about the actual implementation.

Cheers and good luck on your exciting journey.

Upvotes: 2
