SimonAx

Reputation: 1378

When is event sourcing useful on command side

Trying to teach myself some DDD, I'm reading the book Learning Domain-Driven Design by Vlad Khononov, and I've come to the part about CQRS with Event Sourcing. In the past, I've seen multiple examples of DDD, see e.g. Vladimir Khorikov's DDD in Practice repository. The linked page shows an aggregate whose methods all end with a domain event being appended to a list of domain events, which can subsequently be published to any interested parties using a messaging queue.

The great thing about this approach is that if at some stage a new view has to be created (query side), one can simply replay all these events and build the new view. I am here assuming that the message queue stores all events indefinitely, or a subscriber archives all these events such that they can later be replayed.

When introducing event sourcing, these events no longer need to be archived, as they are stored in the event store. This comes at a cost: saving and rehydrating aggregates becomes more cumbersome. What, then, are the benefits of being able to replay the events on the command side? Is there a scenario where it is more beneficial to replay events on the command side than on the query side?

Edit: I just read that debuggability could be a reason. For instance, if the aggregate contains invalid data, you can simply replay the events one by one and figure out what went wrong. I doubt that the benefit of extra debugging capability outweighs the extra hassle of using an event store.

Upvotes: 2

Views: 557

Answers (1)

Levi Ramsey

Reputation: 20591

Consider a situation where an aggregate can be held in a process's memory and there is some concurrency control guaranteeing that conflicting commands against that aggregate are not being executed simultaneously.

The primary benefit of this is that we can replace a cycle of

  1. read from datastore
  2. process command 1
  3. write to datastore
  4. read from datastore
  5. process command 2
  6. write to datastore
  7. read from datastore
  8. process command 3
  9. write to datastore

with

  1. read from datastore
  2. process command 1
  3. write to datastore
  4. process command 2
  5. write to datastore
  6. process command 3
  7. write to datastore

We can apply this regardless of the persistence model. It is interesting to note that from one perspective, we've just made our application a domain-specific cache, with the datastore serving simply to make our cache durable (Adya, Myers, Qin, and Grandl (2019) refer to this architecture as a LInK store).
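The two cycles above can be sketched as follows. This is a minimal illustration with hypothetical names (`CounterAggregate`, `InMemoryStore`, etc.): the point is only that keeping the aggregate in memory lets every command after the first skip the read step.

```python
class CounterAggregate:
    """Toy aggregate; a real one would hold richer domain state."""
    def __init__(self, state=0):
        self.state = state

    def handle(self, command):
        # Process a command by mutating in-memory state.
        self.state += command


class InMemoryStore:
    """Stand-in datastore that counts reads and writes."""
    def __init__(self):
        self.data = {}
        self.reads = 0
        self.writes = 0

    def read(self, key):
        self.reads += 1
        return self.data.get(key, 0)

    def write(self, key, state):
        self.writes += 1
        self.data[key] = state


def process_read_modify_write(store, key, commands):
    # First cycle: read, process, write -- one read per command.
    for cmd in commands:
        agg = CounterAggregate(store.read(key))
        agg.handle(cmd)
        store.write(key, agg.state)


def process_cached(store, key, commands):
    # Second cycle: read once, then process and write repeatedly.
    agg = CounterAggregate(store.read(key))
    for cmd in commands:
        agg.handle(cmd)
        store.write(key, agg.state)
```

Running three commands through each version ends in the same state, but the cached version issues one read instead of three.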

In the read-modify-write approach, the datastore handles at least as many reads as writes (assuming there are more than zero commands that end up not updating state, which would include queries encoded as commands), and therefore will tend to be optimized for fast reads even at the expense of slowing down writes (e.g. indexing may be used, which requires writes to update the index).

In the ...-modify-write approach, conversely, we can have a small number of reads relative to the number of commands processed, while the number of writes is about the same. We therefore will want to do more optimization for writes rather than reads.

In an update-in-place persistence model, we read the entire latest state and (especially in a more key-value approach) update the entire state (in some of these models it's possible to send only the changed fields over the wire, but the datastore will typically still expend effort updating those fields). In the event-sourced persistence model, assuming we're performing some snapshotting, in the worst case we read an entire state as a snapshot and then read some number of events since that snapshot; when writing, we only write something small (assuming that we're modeling the domain more richly than simple CRUD operations) and atomic to be appended.
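A minimal sketch of that read/write asymmetry, with hypothetical names throughout: rehydration loads a snapshot plus the events appended since it, while a write is just a single small append.

```python
class EventStore:
    """Toy event store: an append-only log per aggregate plus
    optional snapshots recording (version, state)."""
    def __init__(self):
        self.events = {}     # aggregate id -> list of events
        self.snapshots = {}  # aggregate id -> (version, state)

    def append(self, key, event):
        # Writing is cheap: append one small event.
        self.events.setdefault(key, []).append(event)

    def load(self, key):
        # Reading is heavier: fetch snapshot, then the event tail.
        version, state = self.snapshots.get(key, (0, 0))
        tail = self.events.get(key, [])[version:]
        return state, tail


def apply_event(state, event):
    # Toy event shape: ("added", n).
    kind, n = event
    return state + n if kind == "added" else state


def rehydrate(store, key):
    """Rebuild current state: start from the snapshot, replay the tail."""
    state, tail = store.load(key)
    for event in tail:
        state = apply_event(state, event)
    return state
```

With a snapshot at version 2, rehydration only replays events appended after that point.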

Here, the event-sourced model, compared to the update-in-place models, can generally be expected to have more expensive (though generally less frequent) reads and somewhat less expensive writes. Imagine that in our "cache-but-durable" example, the update-in-place read takes 20ms and the writes take 20ms while the event-sourced read takes 40ms, but the writes take 8ms; command processing itself is sub-1ms. These times are accurate relative to each other, in my experience.

Then processing 4 commands update-in-place takes 20 + 4x20 = 100ms (a mean of 25ms per command), while processing 4 commands event-sourced takes 40 + 4x8 = 72ms (a mean of 18ms per command).
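The arithmetic, spelled out with the illustrative timings from above (the numbers are this answer's example figures, not benchmarks):

```python
# Illustrative latencies from the worked example, in milliseconds.
uip_read, uip_write = 20, 20   # update-in-place
es_read, es_write = 40, 8      # event-sourced (snapshot + events)
n_commands = 4

# One read to load the aggregate, then one write per command.
uip_total = uip_read + n_commands * uip_write   # 100 ms
es_total = es_read + n_commands * es_write      # 72 ms

uip_mean = uip_total / n_commands               # 25 ms per command
es_mean = es_total / n_commands                 # 18 ms per command
```

The longer the aggregate stays cached, the more commands amortize that single (more expensive) event-sourced read.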

NB: in distributed systems latency measurement, the mean is perhaps-surprisingly effective at characterizing latency, largely because it's disproportionately affected by outliers (and negative latency is impossible).

One way to provide the concurrency control I mentioned is to associate a queue of incoming commands for an aggregate with that aggregate. Instead of directly calling a method on an aggregate to process a command, you put the command (along with a means of dealing with the result of the command, e.g. perhaps a callback or another queue) into the queue; meanwhile the aggregate is pulling commands off the queue. This is sometimes called the "active object" pattern in OO, while others may recognize it as the actor model.
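The active-object idea can be sketched with a per-aggregate inbox and a single worker thread; since only that thread touches the state, no two commands for the aggregate ever run concurrently. Names here (`ActiveAggregate`, `tell`) are hypothetical, not from any of the frameworks mentioned below.

```python
import queue
import threading


class ActiveAggregate:
    """Commands go into a queue; one worker thread drains it,
    so command processing for this aggregate is serialized."""

    def __init__(self):
        self.state = 0
        self._inbox = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            command, reply = self._inbox.get()
            if command is None:          # poison pill to shut down
                break
            self.state += command        # process the command serially
            reply.put(self.state)        # hand the result back

    def tell(self, command):
        # Enqueue the command along with a means of returning the result.
        reply = queue.Queue(maxsize=1)
        self._inbox.put((command, reply))
        return reply.get()

    def stop(self):
        self._inbox.put((None, None))
        self._worker.join()
```

An actor framework adds much more on top (supervision, location transparency, persistence), but the core concurrency-control mechanism is this queue-per-aggregate shape.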

There are a number of frameworks and toolkits which provide or support the actor model: Erlang/OTP, Microsoft Orleans, Thespian (Python), Akka, Akka.Net, and ProtoActor are some notable implementations.

Disclaimer: I am employed by Lightbend, which maintains and offers professional services/support around one of the frameworks mentioned above.

Upvotes: 1
