Val Bonn

Reputation: 1199

Apache Kafka Streams and Event Sourcing, CQRS and validation

We have several legacy apps, mainly composed of a GUI + service layer + RDBMS. Over time, batch jobs were added to synchronize/transfer data between the different databases, and so on. The usual spaghetti architecture :)

We are on the way to clean up this mess, and we are designing an architecture based on event sourcing and materialized views:

Step by step, existing apps will have to adopt this architecture. And that's where my concerns arise. How to deal with data validation?

With the legacy apps, when a user updates data in the UI, the service layer validates it (technical checks and business checks) before persisting the new state in the database. (By technical checks I mean checking field types, lengths, the existence of foreign keys, etc.; by business checks, rules like "if attr_A = xxx then attr_B cannot be null".)
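To make the two categories concrete, here is a minimal, self-contained Java sketch; the command and field names are invented for illustration, and the "Country" reference data stands in for what would normally be a table in the RDBMS:

```java
// Hypothetical command carrying user input; field names are illustrative only.
record UpdateContactCommand(String name, String countryCode, String attrA, String attrB) {}

final class CommandValidator {
    // Reference data that would normally live in the RDBMS "Country" table.
    private static final java.util.Set<String> KNOWN_COUNTRIES = java.util.Set.of("FR", "DE", "BE");

    // Technical checks: required fields, lengths, foreign-key existence.
    static java.util.List<String> technicalErrors(UpdateContactCommand cmd) {
        var errors = new java.util.ArrayList<String>();
        if (cmd.name() == null || cmd.name().isBlank()) errors.add("name is required");
        if (cmd.name() != null && cmd.name().length() > 100) errors.add("name exceeds 100 chars");
        if (!KNOWN_COUNTRIES.contains(cmd.countryCode())) errors.add("unknown country code: " + cmd.countryCode());
        return errors;
    }

    // Business check from the question: if attr_A = "xxx" then attr_B cannot be null.
    static java.util.List<String> businessErrors(UpdateContactCommand cmd) {
        var errors = new java.util.ArrayList<String>();
        if ("xxx".equals(cmd.attrA()) && cmd.attrB() == null)
            errors.add("attr_B is required when attr_A = xxx");
        return errors;
    }
}
```

Note that the technical checks only need the command itself plus static reference data, while the business check encodes a domain rule; this difference is exactly what makes the two categories feel at home in different places.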

For the new architecture, even though we committed to an event sourcing pattern, I realized I am currently designing something that looks more like a CQRS + Event Sourcing solution:

Service layer > Kafka topic "Commands" > Validation > Kafka topic "Events" > Consuming apps

(The service layer belongs to the producing app.) In this design, it's important to keep in mind that the producing app is also a consuming app: its DB will only be updated at the end of the cycle.

And I am not sure we are heading in the right direction. I foresee two or three ways to go further. None of them is 100% satisfying:

1. If we continue with this CQRS option

Keeping the "commands" topic:

Service layer > Kafka topic "Commands" > Validation > Kafka topic "Events" > Consuming apps
<PRODUCING APP> <---------------------- STREAMING PLATFORM ----------------><.CONSUM APP.>

I designed the validation phase to be handled by Kafka Streams app(s). In this case, it would not be too complex to implement what I called "technical checks" earlier. But I am really not sure the streaming platform is the right place to perform business checks.

2. If we continue with this CQRS option, without business validation

Keeping the "commands" topic:

Service layer   >  Kafka topic "Commands" > Kafka topic "Events"   >  Consuming apps
<.PRODUCING APP.> <---------------- STREAMING PLATFORM ------------> <..CONSUM APP..>

We may reach a point where apps generate invalid events, events that cannot even be stored in their own DB. (For instance, an app may push a command like "Create a new Address" containing a country code that does not exist in its "Country" table.) It's like a paradox: the "event" exists, it is a fact, but this fact is not accepted by its own parent.
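Without upstream validation, the consuming side has to deal with this paradox itself. A minimal, self-contained Java sketch of what that looks like (class names and the dead-letter handling are hypothetical, not a Kafka API):

```java
// Hypothetical event, as it would arrive from the unvalidated "Events" topic.
record AddressCreated(String street, String countryCode) {}

final class AddressConsumer {
    // Stand-in for the consumer's own "Country" reference table.
    private final java.util.Set<String> countryTable = java.util.Set.of("FR", "DE");

    final java.util.List<AddressCreated> db = new java.util.ArrayList<>();
    final java.util.List<AddressCreated> deadLetter = new java.util.ArrayList<>();

    void onEvent(AddressCreated evt) {
        if (countryTable.contains(evt.countryCode())) {
            db.add(evt);          // fact accepted: materialized view updated
        } else {
            deadLetter.add(evt);  // the paradox: a recorded fact the app cannot store
        }
    }
}
```

Every consumer (including the producing app consuming its own events) would need a policy like this, which is one argument for validating commands before any event is published.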

3. If we use Kafka to store the "Events" topic, not the "Commands":

Without "commands" :

Service layer   >  Kafka topic "Events"   >  Consuming apps

Here again, how do we prevent the producing apps from publishing invalid events?

What would you suggest?

Regards,

Upvotes: 0

Views: 1064

Answers (1)

CPerson

Reputation: 1222

Is there a concept I missed in my design?

There are other questions on SO regarding Kafka Streams and CQRS. I would recommend taking a look to see whether Kafka Streams is the right tool for providing a transactional event store.

Does it make sense to work with CQRS?

I don't know that CQRS will be the magic bullet to clean up the spaghetti mess you currently have. There is a learning curve associated with CQRS, including learning to choose the right Domain and Aggregate boundaries. If no one on your team has expertise in CQRS, the journey can be quite difficult and could simply introduce a new class of problems that you would have to deal with. Better the devil you know, as they say.

Should the "streaming platform" perform no validation, trust the producing apps and accept all the events in the topic?

Commands should be validated by the domain layer and that validation can be modeled into the domain. But you should also maintain some level of validation when accepting user input. If a field is required, make sure it is not empty when the user provides it, for example.
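As a minimal sketch of that idea, assuming a hand-rolled aggregate rather than any particular CQRS framework (all names here are invented): the domain layer either turns a command into an event or rejects it, so no invalid event is ever recorded.

```java
// Hypothetical command and event types; not a framework API.
record CreateAddress(String street, String countryCode) {}
record AddressCreatedEvt(String street, String countryCode) {}

final class AddressAggregate {
    private final java.util.Set<String> knownCountries;

    AddressAggregate(java.util.Set<String> knownCountries) {
        this.knownCountries = knownCountries;
    }

    // Returns the resulting event, or throws: an invalid command never becomes a fact.
    AddressCreatedEvt handle(CreateAddress cmd) {
        if (cmd.street() == null || cmd.street().isBlank())
            throw new IllegalArgumentException("street is required");
        if (!knownCountries.contains(cmd.countryCode()))
            throw new IllegalArgumentException("unknown country: " + cmd.countryCode());
        return new AddressCreatedEvt(cmd.street(), cmd.countryCode());
    }
}
```

The key property is the ordering: validation happens on the command, before the event exists, rather than on an event that is already a fact.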

Once an event is recorded, it is considered fact. The event stream is what tells you the history until now, and you have no choice but to trust it.

Before emitting Commands or Events, should the apps validate the data against the data they already manage?

Typically not. What would the apps validate against? If there are events in the queue that have not yet been applied to your data source at the time of validation, you might incorrectly reject a command or event. Depending on your synchronization guarantees, you might get away with querying your Domain Layer in order to make decisions about which commands to issue next. Typically, your Aggregate or Saga will know enough to make decisions as needed.
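To illustrate that last point, here is a self-contained sketch (again with invented names) of an aggregate replayed from its own event stream; once rehydrated, it has enough state to validate the next command without querying any external view:

```java
// Hypothetical event types for a toy event-sourced account.
interface AccountEvent {}
record Deposited(long amount) implements AccountEvent {}
record Withdrawn(long amount) implements AccountEvent {}

final class Account {
    private long balance = 0;

    // Rehydrate the aggregate by replaying its recorded facts.
    static Account replay(java.util.List<AccountEvent> history) {
        var acc = new Account();
        for (AccountEvent e : history) {
            if (e instanceof Deposited d) acc.balance += d.amount();
            else if (e instanceof Withdrawn w) acc.balance -= w.amount();
        }
        return acc;
    }

    // Command handling: the replayed state is the only source of truth needed.
    Withdrawn handleWithdraw(long amount) {
        if (amount > balance) throw new IllegalStateException("insufficient funds");
        balance -= amount;
        return new Withdrawn(amount);
    }
}
```

This is why the aggregate does not suffer from the stale-view problem described above: it validates against the event stream itself, not against a materialized view that may lag behind.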

Take some time to read through http://www.cqrs.nu. This was helpful for me to establish a baseline understanding of CQRS and Event Sourcing before thinking about the actual implementation.

Cheers and good luck on your exciting journey.

Upvotes: 2
